Title
Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR
Authors
Xabier Saralegi and Maddalen Lopez de Lacalle
Abstract
This paper deals with the main problems that arise in the query translation process in dictionary-based Cross-lingual Information Retrieval (CLIR): translation selection, the presence of Out-Of-Vocabulary (OOV) terms, and the translation of Multi-Word Expressions (MWEs). We analyse to what extent each problem affects retrieval performance for the Basque-English language pair, and the improvement obtained when each is addressed with methods that do not require parallel corpora. To tackle translation selection, we provide novel extensions of an existing method based on co-occurrences in a monolingual target-language corpus; OOV terms are handled by a cognate detection-based method; and MWEs are translated with a naïve matching technique. The error analysis, measured in terms of Mean Average Precision (MAP), shows significant differences in how much each problem degrades performance, with translation selection causing most of the errors. On the other hand, the proposed combined strategy performs well in tackling all three problems.
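The three strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their extensions of the co-occurrence method. Instead it uses generic stand-ins chosen here: greedy longest-match dictionary lookup for MWEs, the Longest Common Subsequence Ratio (LCSR) with an assumed 0.7 threshold as the cognate score for OOV terms, and an exhaustive search for the translation combination with the highest summed pairwise association. The function names (`match_mwes`, `lcsr`, `translate_oov`, `select_translations`, `translate_query`), the toy Basque-English dictionary and the association scores are all hypothetical. The scores are imagined as precomputed from an English monolingual corpus, and the input tokens are assumed to be already lowercased and lemmatized.

```python
# Illustrative sketch only: all names, thresholds and data are hypothetical.
from itertools import product


def match_mwes(tokens, dictionary, max_len=3):
    """Greedy longest match: group consecutive tokens that form a dictionary entry."""
    units, i = [], 0
    while i < len(tokens):
        # Try the longest span first; a single token is always accepted.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in dictionary:
                units.append(candidate)
                i += n
                break
    return units


def lcsr(a, b):
    """Longest Common Subsequence Ratio: LCS length divided by the longer word's length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b):
            cur.append(prev[j] + 1 if ca == cb else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def translate_oov(word, target_vocab, threshold=0.7):
    """Map an OOV word to its most similar target word, or keep it unchanged."""
    best = max(target_vocab, key=lambda t: lcsr(word, t), default=None)
    if best is not None and lcsr(word, best) >= threshold:
        return best
    return word  # no close cognate found: keep the source form as-is


def select_translations(candidates, assoc):
    """Choose one translation per unit, maximising summed pairwise association."""
    def coherence(combo):
        return sum(assoc.get(frozenset((x, y)), 0.0)
                   for i, x in enumerate(combo) for y in combo[i + 1:])
    return list(max(product(*candidates), key=coherence))


def translate_query(tokens, bilingual, target_vocab, assoc):
    """Group MWEs, look up candidates (cognates for OOV units), then select."""
    units = match_mwes(tokens, bilingual)
    candidates = [bilingual.get(u) or [translate_oov(u, target_vocab)] for u in units]
    return select_translations(candidates, assoc)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    bilingual = {"arrain": ["fish"], "sare": ["net", "network"],
                 "nazio batuak": ["united nations"]}
    vocab = ["information", "fish", "net", "network"]
    assoc = {frozenset(("fish", "net")): 0.8, frozenset(("fish", "network")): 0.1}
    print(translate_query(["arrain", "sare"], bilingual, vocab, assoc))  # ['fish', 'net']
```

Here "sare" is ambiguous between "net" and "network", and the co-occurrence score with "fish" selects "net". The exhaustive search over `itertools.product` grows exponentially with query length. That is tolerable for short queries, but a practical system would probably need a greedy or iterative approximation.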
Topics
Information Extraction, Information Retrieval, Multilinguality, Machine Translation, Speech-to-Speech Translation
BibTeX
@InProceedings{SARALEGI10.63,
  author    = {Xabier Saralegi and Maddalen Lopez de Lacalle},
  title     = {Dictionary and Monolingual Corpus-based Query Translation for Basque-English CLIR},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}