Title |
Bilingual Dictionary Construction with Transliteration Filtering |
Authors |
John Richardson, Toshiaki Nakazawa and Sadao Kurohashi |
Abstract |
In this paper, we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine translation approach. We demonstrate that the extracted dictionary is accurate and of high recall (F1 score 0.8). Our lexicon contains not only single words but also multi-word expressions, and is freely available. Our experiments focus on Katakana-English lexicon construction; however, the proposed methods could be applied to transliteration extraction for a variety of language pairs. |
Topics |
Machine Translation, Speech-to-Speech Translation, Other |
Full paper |
Bilingual Dictionary Construction with Transliteration Filtering |
Bibtex |
@InProceedings{RICHARDSON14.102,
  author = {John Richardson and Toshiaki Nakazawa and Sadao Kurohashi},
  title = {Bilingual Dictionary Construction with Transliteration Filtering},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |