Title |
Enhanced Infrastructure for Creation and Collection of Translation Resources |
Authors |
Zhiyi Song, Stephanie Strassel, Gary Krug and Kazuaki Maeda |
Abstract |
Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at Linguistic Data Consortium to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged approach to parallel text corpus development (acquisition of existing parallel text from known repositories, harvesting and aligning of potential parallel documents from the web, and manual creation of parallel text by professional translators), and describe recent adaptations that have enabled significant expansions in the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC. |
Topics |
Machine Translation, Speech-to-Speech Translation, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures |
Full paper |
Enhanced Infrastructure for Creation and Collection of Translation Resources |
Slides |
Enhanced Infrastructure for Creation and Collection of Translation Resources |
Bibtex |
@InProceedings{SONG10.798,
  author = {Zhiyi Song and Stephanie Strassel and Gary Krug and Kazuaki Maeda},
  title = {Enhanced Infrastructure for Creation and Collection of Translation Resources},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |