LREC 2010 Proceedings

Summary of the paper

Title	Enhanced Infrastructure for Creation and Collection of Translation Resources
Authors	Zhiyi Song, Stephanie Strassel, Gary Krug and Kazuaki Maeda
Abstract	Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system training and development. This paper describes recent efforts at the Linguistic Data Consortium (LDC) to create linguistic resources for MT, including corpora, specifications and resource infrastructure. We review LDC's three-pronged approach to parallel text corpus development (acquisition of existing parallel text from known repositories, harvesting and alignment of potential parallel documents from the web, and manual creation of parallel text by professional translators), and describe recent adaptations that have enabled significant expansions in the scope, variety, quality, efficiency and cost-effectiveness of translation resource creation at LDC.
Topics	Machine Translation, SpeechToSpeech Translation, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures
Bibtex	@InProceedings{SONG10.798,
  author = {Zhiyi Song and Stephanie Strassel and Gary Krug and Kazuaki Maeda},
  title = {Enhanced Infrastructure for Creation and Collection of Translation Resources},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}