LREC 2012 Proceedings

Summary of the paper

Title	Addressing polysemy in bilingual lexicon extraction from comparable corpora
Authors	Darja Fišer, Nikola Ljubešić and Ozren Kubelka
Abstract	This paper presents an approach to extract translation equivalents from comparable corpora for polysemous nouns. As opposed to the standard approaches that build a single context vector for all occurrences of a given headword, we first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. Since state-of-the-art word sense disambiguation tools are still far from perfect, we also tried to improve the results by combining the sense assignments provided by two different sense taggers. Evaluation of the results shows that we outperform the baseline (0.473) in all the settings we experimented with, even when using only one sense tagger, and that the best-performing results are indeed obtained by taking into account the intersection of both sense taggers (0.720).
Topics	Lexicon, lexical database, Information Extraction, Information Retrieval, Word Sense Disambiguation
Bibtex	@InProceedings{FIER12.601,
  author    = {Darja Fišer and Nikola Ljubešić and Ozren Kubelka},
  title     = {Addressing polysemy in bilingual lexicon extraction from comparable corpora},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year      = {2012},
  month     = {may},
  date      = {23-25},
  address   = {Istanbul, Turkey},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-7-7},
  language  = {english}
}
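The per-sense approach summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and is not taken from the authors' implementation. This includes the occurrence format, the two-token window, raw co-occurrence counts and cosine similarity. The sense labels (bank#1, bank#2) and the toy Slovene candidates (banka, breg) are also hypothetical, and the candidate vectors are assumed to be already mapped into the source-language vocabulary. The sketch shows how keeping only the occurrences on which both taggers agree yields one context vector per sense. Each vector can then be matched against translation candidates on its own.

```python
import math
from collections import Counter, defaultdict


def sense_context_vectors(occurrences, window=2):
    """Build one context vector (raw co-occurrence counts) per headword sense.

    Each occurrence is (tokens, position, sense_a, sense_b): a tokenized
    sentence, the index of the headword, and the sense labels assigned by two
    sense taggers (hypothetical format). Only occurrences on which both
    taggers agree (their intersection) contribute to a vector.
    """
    vectors = defaultdict(Counter)
    for tokens, pos, sense_a, sense_b in occurrences:
        if sense_a != sense_b:
            continue  # taggers disagree: discard this occurrence
        start = max(0, pos - window)
        context = tokens[start:pos] + tokens[pos + 1:pos + 1 + window]
        vectors[sense_a].update(context)
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(word, 0) for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(sense_vector, candidates):
    """Rank translation candidates by similarity to one sense vector.

    Assumes each candidate vector has already been mapped into the
    source-language feature space (e.g. via a seed bilingual lexicon);
    that step is not shown here.
    """
    scored = sorted(
        ((cosine(sense_vector, vec), word) for word, vec in candidates.items()),
        reverse=True,
    )
    return [word for _, word in scored]


# Toy data: hypothetical sense labels and pre-translated candidate vectors.
occurrences = [
    (["deposit", "money", "in", "bank", "account"], 3, "bank#1", "bank#1"),
    (["walk", "along", "river", "bank", "today"], 3, "bank#2", "bank#2"),
    (["the", "bank", "closed"], 1, "bank#1", "bank#2"),  # disagreement, dropped
]
candidates = {
    "banka": Counter({"money": 2, "account": 1}),
    "breg": Counter({"river": 1, "along": 1}),
}

vectors = sense_context_vectors(occurrences)
for sense, vec in sorted(vectors.items()):
    print(sense, rank_candidates(vec, candidates))
# prints:
# bank#1 ['banka', 'breg']
# bank#2 ['breg', 'banka']
```

A single vector built over all occurrences of "bank" would mix the financial and river contexts. Splitting the occurrences by sense lets each sense rank a different candidate first. Dropping the occurrences where the taggers disagree trades coverage for cleaner sense vectors.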