Title |
A Multi-Genre SMT System for Arabic to French |
Authors |
Saša Hasan and Hermann Ney |
Abstract |
This work presents improvements to a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for different domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting with an early prototype in 2005 that participated in the second CESTA evaluation, the system was further upgraded to achieve favorable BLEU scores of 44.8% for the text and 41.1% for the audio setting. These results are compared to a system based on the freely available Moses toolkit. We show significant gains both in terms of translation quality (up to +1.2% BLEU absolute) and translation speed (up to 16 times faster) for comparable configuration settings. |
Language |
Multiple languages |
Topics |
Machine Translation, Speech-to-Speech Translation, Corpus (creation, annotation, etc.), Tools, systems, applications |
Full paper |
A Multi-Genre SMT System for Arabic to French |
Slides |
A Multi-Genre SMT System for Arabic to French |
Bibtex |
@InProceedings{HASAN08.549,
author = {Saša Hasan and Hermann Ney},
title = {A Multi-Genre SMT System for Arabic to French},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = {may},
date = {28-30},
address = {Marrakech, Morocco},
editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |