Title
Active Learning and Crowd-Sourcing for Machine Translation
Authors
Vamshi Ambati, Stephan Vogel and Jaime Carbonell
Abstract
Large-scale parallel data generation for new language pairs requires intensive human effort and the availability of experts. Because expert translators who could provide parallel data are scarce, it is immensely difficult and costly to build Statistical Machine Translation (SMT) systems for most languages. Even where experts are available, the effort can appear infeasible because of the costs involved. In this paper we propose Active Crowd Translation (ACT), a new paradigm in which active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims to reduce the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowd to make up for the lack of expensive language experts. We experiment with our active learning strategies, compare them against strong baselines, and observe significant improvements in translation quality. Similarly, our crowd-sourcing experiments on Mechanical Turk show that it is possible to create parallel corpora using non-experts, and that with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.
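The abstract does not spell out which selection strategy or quality-assurance procedure is used. Purely as an illustration of the two ideas it names, the Python sketch below assumes (1) a simple n-gram novelty heuristic that greedily picks untranslated source sentences whose n-grams are least covered by the already-translated data, and (2) a pairwise-agreement vote that keeps, among several crowd translations of one sentence, the one most similar on average to the others. Both techniques and all names (`select_batch`, `consensus_translation`, `novelty_score`) are hypothetical stand-ins, not the authors' method.

```python
from difflib import SequenceMatcher
from typing import Iterable


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of contiguous n-grams of length n in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def all_ngrams(sentence: str, max_n: int) -> set[tuple[str, ...]]:
    """Collect every 1..max_n-gram of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    grams: set[tuple[str, ...]] = set()
    for n in range(1, max_n + 1):
        grams |= ngrams(tokens, n)
    return grams


def novelty_score(sentence: str, seen: set[tuple[str, ...]], max_n: int = 2) -> float:
    """Fraction of the sentence's n-grams not yet covered by the labeled data
    (hypothetical informativeness proxy, assumed for this sketch)."""
    grams = all_ngrams(sentence, max_n)
    if not grams:
        return 0.0
    return sum(g not in seen for g in grams) / len(grams)


def select_batch(pool: list[str], labeled: Iterable[str], k: int, max_n: int = 2) -> list[str]:
    """Greedily pick k pool sentences to send for translation, updating the
    coverage set after each pick so the batch avoids redundant n-grams."""
    seen: set[tuple[str, ...]] = set()
    for s in labeled:
        seen |= all_ngrams(s, max_n)
    remaining = list(pool)
    batch: list[str] = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: novelty_score(s, seen, max_n))
        batch.append(best)
        remaining.remove(best)
        seen |= all_ngrams(best, max_n)
    return batch


def consensus_translation(candidates: list[str]) -> str:
    """Keep the crowd translation with the highest average string similarity
    to the other workers' translations (a simple agreement vote)."""
    if len(candidates) == 1:
        return candidates[0]

    def avg_agreement(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(SequenceMatcher(None, candidates[i], c).ratio() for c in others) / len(others)

    return candidates[max(range(len(candidates)), key=avg_agreement)]


if __name__ == "__main__":
    pool = ["the cat sat", "the cat sat down", "a dog ran fast"]
    print(select_batch(pool, ["the cat sat"], k=2))
    print(consensus_translation(["the house is red", "the house is red", "house red the is"]))
```

Here the fully novel sentence is chosen first, and the sentence that only repeats already-covered n-grams is never chosen. Greedy re-scoring after each pick is a design choice meant to keep one batch from spending budget on overlapping material; a real system would also weigh how representative a sentence is of the pool, which this sketch ignores.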
Topics
Machine Translation, Speech-to-Speech Translation, Statistical and machine learning methods, Corpus (creation, annotation, etc.)
Full paper
Active Learning and Crowd-Sourcing for Machine Translation
Slides
Active Learning and Crowd-Sourcing for Machine Translation
Bibtex
@InProceedings{AMBATI10.244,
  author = {Vamshi Ambati and Stephan Vogel and Jaime Carbonell},
  title = {Active Learning and Crowd-Sourcing for Machine Translation},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}