Title |
Bilingual dictionaries for all EU languages |
Authors |
Ahmet Aker, Monica Paramita, Marcis Pinnis and Robert Gaizauskas |
Abstract |
Bilingual dictionaries can be automatically generated using the GIZA++ tool. However, these dictionaries contain a lot of noise, because of which the quality of outputs of tools relying on the dictionaries are negatively affected. In this work we present three different methods for cleaning noise from automatically generated bilingual dictionaries: LLR, pivot and translation based approach. We have applied these approaches on the GIZA++ dictionaries -- dictionaries covering official EU languages -- in order to remove noise. Our evaluation showed that all methods help to reduce noise. However, the best performance is achieved using the transliteration based approach. We provide all bilingual dictionaries (the original GIZA++ dictionaries and the cleaned ones) free for download. We also provide the cleaning tools and scripts for free download. |
Topics |
Lexicon, Lexical Database, Acquisition |
Full paper |
Bilingual dictionaries for all EU languages |
Bibtex |
@InProceedings{AKER14.803,
author = {Ahmet Aker and Monica Paramita and Marcis Pinnis and Robert Gaizauskas}, title = {Bilingual dictionaries for all EU languages}, booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)}, year = {2014}, month = {may}, date = {26-31}, address = {Reykjavik, Iceland}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-8-4}, language = {english} } |