LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Establishing the Upper Bound and Inter-judge Agreement of a Verb Classification Task
Authors Merlo Paola (University of Geneva, Department of Linguistics, 2 rue de Candolle, 1211 Genève 4, Switzerland, merlo@lettres.unige.ch)
Stevenson Suzanne (Department of Computer Science, and Center for Cognitive Science, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019 USA, suzanne@cs.rutgers.edu)
Keywords Argument Structure, Automatic Lexical Acquisition, Inter-Judge Agreement, Verb Classification
Session Session EP1 - Evaluation and Written Area
Full Paper 233.ps, 233.pdf
Abstract Detailed knowledge about verbs is critical in many NLP and IR tasks, yet manually determining such knowledge for large numbers of verbs is difficult, time-consuming, and resource-intensive. Recent responses to this problem have attempted to classify verbs automatically, as a first step toward automatically building lexical resources. In order to estimate the upper bound of a verb classification task, which appears to be difficult and subject to variability among experts, we investigated the performance of human experts in controlled classification experiments. We report here the results of two experiments (one using a forced-choice task, the other a non-forced-choice task) that measure human expert accuracy, compared to a gold standard, in classifying verbs into three pre-defined classes, as well as inter-expert agreement. To preview, we find that the highest expert accuracy is 86.5% agreement with the gold standard, and that inter-expert agreement is not very high (K between .53 and .66). The two experiments show comparable results.
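The K values reported above are kappa scores, the standard chance-corrected measure of inter-judge agreement. As a minimal illustration (not the authors' own tooling), the sketch below computes Cohen's kappa for two judges whose verb-class labels are given as aligned lists; the function name and the toy labels are assumptions for the example.

```python
from collections import Counter

def cohen_kappa(judge_a, judge_b):
    """Cohen's kappa for two aligned sequences of category labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each judge's
    marginal label distribution.
    """
    assert len(judge_a) == len(judge_b) and judge_a
    n = len(judge_a)

    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(judge_a, judge_b)) / n

    # Chance agreement: product of the judges' marginal proportions,
    # summed over all labels either judge used.
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)

    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: two judges assign verbs to classes 0 and 1.
# They agree on 3 of 4 items; chance agreement is 0.5, so kappa = 0.5.
print(cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0]))  # -> 0.5
```

Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why the reported range of .53 to .66 is characterized as only moderate agreement.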