Title |
Comparative Evaluation of Collocation Extraction Metrics |
Authors |
Aristomenis Thanopoulos (Wire Communications Laboratory, Electrical & Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece); Nikos Fakotakis (Wire Communications Laboratory, Electrical & Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece); George Kokkinakis (Wire Communications Laboratory, Electrical & Computer Engineering Dept., University of Patras, 265 00 Rion, Patras, Greece) |
Session |
EP1: Evaluation |
Abstract |
Corpus-based automatic extraction of collocations is typically carried out using a statistic that measures co-occurrence, in order to identify words that co-occur more often than expected by chance. In this paper we consider several typical measures, namely the t-score, Pearson’s χ² (chi-square) test, the log-likelihood ratio and pointwise mutual information, together with a novel information-theoretic measure, mutual dependency. Apart from some theoretical discussion of their correlation, we perform comparative evaluation experiments, judging the measures by their ability to identify lexically associated bigrams. We use two different gold standards: WordNet and lists of named entities. Besides finding that a frequency-biased version of mutual dependency performs best, followed closely by the log-likelihood ratio, we point out some implications of using available electronic dictionaries such as WordNet to evaluate collocation extraction. |
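As a rough illustration of the measures named in the abstract, the sketch below scores a single bigram from raw corpus counts. The abstract itself gives no formulas, so everything here is an assumption: the t-score, χ², log-likelihood ratio and pointwise mutual information follow their common textbook forms over a 2×2 contingency table, while mutual dependency is assumed to be log2(P(xy)² / (P(x)P(y))) and its frequency-biased variant is assumed to add a further log2 P(xy) term. The function name `association_scores` and its parameters are hypothetical.

```python
import math


def association_scores(n_xy, n_x, n_y, n):
    """Score one bigram (x, y) from raw counts.

    n_xy: count of the bigram "x y"
    n_x:  count of bigrams whose first word is x
    n_y:  count of bigrams whose second word is y
    n:    total number of bigrams in the corpus
    """
    p_xy, p_x, p_y = n_xy / n, n_x / n, n_y / n

    # Pointwise mutual information (standard definition).
    pmi = math.log2(p_xy / (p_x * p_y))

    # t-score, using the common approximation that the variance
    # of the bigram count is close to the observed count itself.
    expected_xy = n_x * n_y / n
    t_score = (n_xy - expected_xy) / math.sqrt(n_xy)

    # ASSUMPTION: mutual dependency taken as log2(P(xy)^2 / (P(x) P(y))).
    md = math.log2(p_xy ** 2 / (p_x * p_y))
    # ASSUMPTION: frequency-biased variant adds another log2 P(xy) term.
    lfmd = md + math.log2(p_xy)

    # 2x2 contingency table: observed counts, row totals, column totals.
    observed = [[n_xy, n_x - n_xy],
                [n_y - n_xy, n - n_x - n_y + n_xy]]
    rows = [n_x, n - n_x]
    cols = [n_y, n - n_y]

    chi_square = 0.0
    log_likelihood = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n  # expected count under independence
            chi_square += (o - e) ** 2 / e
            if o > 0:  # a zero cell contributes nothing to the sum
                log_likelihood += o * math.log(o / e)
    log_likelihood *= 2

    return {
        "t_score": t_score,
        "chi_square": chi_square,
        "log_likelihood": log_likelihood,
        "pmi": pmi,
        "md": md,
        "lfmd": lfmd,
    }


if __name__ == "__main__":
    # Toy counts chosen only for illustration, not taken from the paper.
    for name, value in association_scores(16, 32, 64, 1024).items():
        print(f"{name}: {value:.4f}")
```

Under these assumed definitions, mutual dependency equals pointwise mutual information plus log2 P(xy), so it penalises rare bigrams that pointwise mutual information tends to overrate; the frequency-biased variant strengthens that penalty.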
Keywords |
Collocation extraction, Automatic evaluation, WordNet |
Full Paper |