Title |
Clustering of Terms from Translation Dictionaries and Synonyms Lists to Automatically Build more Structured Linguistic Resources |
Authors |
Maria Teresa Pazienza and Armando Stellato |
Abstract |
Building a Linguistic Resource (LR) is a task requiring a huge quantitative of means, human resources and funds. Though finalization of the development phase and assessment of the produced resource, necessarily require human involvement, a computer aided process for building the resources initial structure would greatly reduce the overall effort to be undertaken. We present here a novel approach for automatizing the process of building structured (possibly multilingual) LRs, starting from already available LRs and exploiting simple vocabularies of synonyms and/or translations for different languages. A simple algorithm for clustering terms, according to their shared senses, is presented in two versions, both for separating flat list of synonyms and flat lists of translations. The algorithm is then motivated against two possible exploitations: reducing the cost for producing new LRs, and linguistically enriching the content of existing semantic resources, like SW ontologies and knowledge bases. Empirical results are provided for two experimental setups: automatic term clustering for English synonyms list, and for Italian translations of English terms |
Language |
Multiple languages |
Topics |
Language modelling, Lexicon, lexical database, Validation of LRs |
Full paper |
Clustering of Terms from Translation Dictionaries and Synonyms Lists to Automatically Build more Structured Linguistic Resources |
Slides |
- |
Bibtex |
@InProceedings{PAZIENZA08.629,
author = {Maria Teresa Pazienza and Armando Stellato},
title = {Clustering of Terms from Translation Dictionaries and Synonyms Lists to Automatically Build more Structured Linguistic Resources},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = {may},
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |