Title |
Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task |
Authors |
Merlo Paola (University of Geneva, Department of Linguistics, 2 rue de Candolle, 1211 Genève 4, Switzerland, merlo@lettres.unige.ch); Stevenson Suzanne (Department of Computer Science and Center for Cognitive Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019 USA, suzanne@cs.rutgers.edu) |
Keywords |
Argument Structure, Automatic Lexical Acquisition, Inter-Judge Agreement, Verb Classification |
Session |
Session EP1 - Evaluation and Written Area |
Full Paper |
233.ps, 233.pdf |
Abstract |
Detailed knowledge about verbs is critical in many NLP and IR tasks, yet manual determination of such knowledge for large numbers of verbs is difficult, time-consuming, and resource-intensive. Recent responses to this problem have attempted to classify verbs automatically, as a first step toward building lexical resources automatically. In order to estimate the upper bound of a verb classification task, which appears to be difficult and subject to variability among experts, we investigated the performance of human experts in controlled classification experiments. We report here the results of two experiments (one using a forced-choice task, the other a non-forced-choice task), which measure human expert accuracy (compared to a gold standard) in classifying verbs into three pre-defined classes, as well as inter-expert agreement. To preview, we find that the highest expert accuracy is 86.5% (agreement with the gold standard), and that inter-expert agreement is not very high (K between .53 and .66). The two experiments show comparable results. |
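
The two quantities reported in the abstract, accuracy against a gold standard and chance-corrected agreement between judges, can be illustrated with a minimal sketch. It assumes the pairwise form of the kappa statistic (Cohen's kappa); the abstract does not say which variant of K was computed, so this may differ from the authors' calculation. The class labels `A`, `B`, `C`, the judge lists, and the function names are hypothetical placeholders, not data or code from the paper.

```python
from collections import Counter


def accuracy(labels, gold):
    """Fraction of items on which a judge's label matches the gold standard."""
    if len(labels) != len(gold):
        raise ValueError("label sequences must have the same length")
    return sum(a == b for a, b in zip(labels, gold)) / len(gold)


def cohen_kappa(judge_a, judge_b):
    """Pairwise Cohen's kappa: (P_o - P_e) / (1 - P_e).

    P_o is the observed proportion of agreement; P_e is the agreement
    expected by chance given each judge's own label distribution.
    """
    if len(judge_a) != len(judge_b):
        raise ValueError("label sequences must have the same length")
    n = len(judge_a)
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        # Both judges used a single identical label; kappa is undefined,
        # and this sketch treats the case as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: six verbs, three placeholder classes.
gold = ["A", "A", "B", "B", "C", "C"]
judge1 = ["A", "A", "B", "C", "C", "C"]
judge2 = ["A", "B", "B", "B", "C", "A"]

print("judge1 accuracy:", accuracy(judge1, gold))
print("judge2 accuracy:", accuracy(judge2, gold))
print("kappa(judge1, judge2):", cohen_kappa(judge1, judge2))
```

Because kappa subtracts chance agreement, two judges can each score reasonably well against the gold standard while their mutual kappa stays modest, which is the pattern the abstract describes.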