Title | The Effect of Bias on an Automatically-built Word Sense Corpus |
Author(s) |
David Martinez, Eneko Agirre
IXA Group, University of the Basque Country |
Session | O40-W |
Abstract | The goal of this paper is to explore the large-scale automatic acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We applied the "monosemous relatives" method to the Web in order to build such a resource for all nouns in WordNet. An analysis of several parameters revealed that the distribution of word senses (bias) in the training and test corpora is a determining factor. Provided there is a method to approximate the bias for each word sense, the results we obtained for English are comparable to those obtained with hand-tagged data (Semcor), which is a promising prospect for less-studied languages. |
Keyword(s) | Word Sense Disambiguation, Automatic Corpus Acquisition, Bootstrapping |
Language(s) | English |
Full Paper | 648.pdf |