LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title The ISLE Corpus of Non-Native Spoken English
Authors Menzel Wolfgang (Universität Hamburg, Fachbereich Informatik, Vogt-Kölln-Strasse 30, 22527 Hamburg, Germany, menzel@informatik.uni-hamburg.de)
Atwell Eric (School of Computer Studies, University of Leeds, Woodhouse Lane, Leeds LS2 9JT, United Kingdom, eric@scs.leeds.ac.uk)
Bonaventura Patrizia (Universität Hamburg, Fachbereich Informatik, Vogt-Kölln-Strasse 30, 22527 Hamburg, Germany, pbonaven@informatik.uni-hamburg.de)
Herron Daniel (Universität Hamburg, Fachbereich Informatik, Vogt-Kölln-Strasse 30, 22527 Hamburg, Germany, herron@informatik.uni-hamburg.de)
Howarth Peter (University of Leeds, Woodhouse Lane, Leeds LS2 9JT, Great Britain, p.a.howarth@leeds.ac.uk)
Morton Rachel (Entropic Cambridge Research Labs, Compass House, 80-82 Newmarket Road, Cambridge, CB1 4LD, Great Britain, rim@entropic.co.uk)
Souter Clive (School of Computer Studies, University of Leeds, Woodhouse Lane, Leeds LS2 9JT, United Kingdom, cs@scs.leeds.ac.uk)
Keywords Non-Native Speech, Pronunciation Training, Speech Corpus Annotation, Speech Corpus Design, Speech Recognition
Session Session SP3 - Spoken Language Resources' Projects
Full Paper 313.ps, 313.pdf
Abstract To support the development of pronunciation training tools for second language learning, a corpus of non-native speech data has been collected. It consists of almost 18 hours of annotated speech signals spoken by Italian and German learners of English. The corpus is based on 250 utterances selected from typical second language learning exercises. It has been annotated at the word and the phone level to highlight pronunciation errors such as phone realisation problems and misplaced word stress. The data has been used to develop and evaluate several diagnostic components, which can be used to produce corrective feedback of unprecedented detail for a language learner.