Title |
Harvesting Multi-Word Expressions from Parallel Corpora |
Authors |
Špela Vintar and Darja Fišer |
Abstract |
The paper presents a set of approaches to extend the automatically created Slovene wordnet with nominal multi-word expressions. In the first approach, multi-word expressions from Princeton WordNet are translated with a technique based on word alignment and lexico-syntactic patterns. This is followed by extracting new terms from a monolingual corpus using keywordness ranking and contextual patterns. Finally, the multi-word expressions are assigned a hypernym and added to our wordnet. Manual evaluation and comparison of the results show that the translation approach is the most straightforward and accurate. However, it is successfully complemented by the two monolingual approaches, which identify additional term candidates in the corpus that would otherwise go unnoticed. Some weaknesses of the proposed wordnet extension techniques are also addressed. |
Language |
Multiple languages |
Topics |
Lexicon, lexical database, Semantics, MultiWord Expressions & Collocations |
Full paper |
Harvesting Multi-Word Expressions from Parallel Corpora |
Slides |
- |
Bibtex |
@InProceedings{VINTAR08.281,
author = {Špela Vintar and Darja Fišer},
title = {Harvesting Multi-Word Expressions from Parallel Corpora},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = {may},
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |