Title |
Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System |
Authors |
Na-Rae Han, Joel Tetreault, Soo-Hwa Lee and Jin-Young Ha |
Abstract |
This paper presents research on building a model of grammatical error correction, for preposition errors in particular, in English text produced by language learners. Unlike most previous work, which trains a statistical classifier exclusively on well-formed text written by native speakers, we train a classifier on a large-scale, error-tagged corpus of English essays written by ESL learners, relying on contextual and grammatical features surrounding preposition usage. First, we show that such a model can achieve high performance values: 93.3% precision and 14.8% recall for error detection, and 81.7% precision and 13.2% recall for error detection and correction, when tested on preposition replacement errors. Second, we show that this model outperforms models trained on well-edited text produced by native speakers of English. We discuss the implications of our approach for language error modeling and the issues that stem from working with a noisy data set whose error annotations are not exhaustive. |
Topics |
Authoring tools, proofing, Grammar and Syntax, Language modelling |
Full paper |
Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System |
Slides |
Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System |
Bibtex |
@InProceedings{HAN10.821,
  author = {Na-Rae Han and Joel Tetreault and Soo-Hwa Lee and Jin-Young Ha},
  title = {Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |