Title | Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus |
Author(s) |
Na-Rae Han (1), Martin Chodorow (2), Claudia Leacock (3)
(1) University of Pennsylvania; (2) Hunter College of the City University of New York; (3) Educational Testing Service |
Session | O38-EW |
Abstract | One of the most difficult challenges faced by non-native speakers of English is mastering the system of English articles. We trained a maximum entropy classifier to select among a/an, the, and the zero article for each noun phrase, based on a set of features extracted from the noun phrase's local context. When trained on 6 million noun phrases, the classifier selected the correct article about 88% of the time. We also used the classifier to detect article errors in the TOEFL essays of native speakers of Chinese, Japanese, and Russian. Agreement with human annotators was about 88% (kappa = 0.36). Many of the disagreements were due to the classifier's lack of discourse information. Agreement rose to 94% (kappa = 0.47) when the system accepted noun phrases as correct in cases where its own confidence was low. |
Keyword(s) | English determiner, English article, second-language learning, second-language teaching, large corpus, maximum entropy |
Language(s) | English |
Full Paper | 695.pdf |