Title |
The Role of Parallel Corpora in Bilingual Lexicography |
Authors |
Enikő Héja |
Abstract |
This paper describes an approach based on word alignment of parallel corpora that aims to facilitate the lexicographic work of dictionary building. Although this method has been widely used in the machine translation (MT) community for at least 16 years, to our knowledge it has not been applied to the creation of bilingual dictionaries for human use. The proposed corpus-driven technique, in particular the exploitation of parallel corpora, proved helpful in creating such dictionaries for several reasons. Most importantly, a parallel corpus of appropriate size guarantees that the most relevant translations are included in the dictionary. Moreover, translation candidates can be ranked by their translational probabilities, which ensures that the most frequently used translation variants come first within an entry. A further advantage is that all the relevant example sentences from the parallel corpora are easily accessible, which facilitates selecting the most appropriate translations from among the candidates. Owing to these properties, the method is particularly well suited to the production of active or encoding dictionaries. |
Topics |
Lexicon, lexical database, Multilinguality, Endangered languages |
Bibtex |
@InProceedings{HJA10.559,
  author = {Enikő Héja},
  title = {The Role of Parallel Corpora in Bilingual Lexicography},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |