Title |
Automatic Translation of Scientific Documents in the HAL Archive |
Authors |
Patrik Lambert, Holger Schwenk and Frédéric Blain |
Abstract |
This paper describes the development of a statistical machine translation system between French and English for scientific papers. This system will be closely integrated into the French HAL open archive, a collection of more than 100,000 scientific papers. We describe the creation of in-domain parallel and monolingual corpora, the development of a domain-specific translation system with the created resources, and its adaptation using monolingual resources only. These techniques allowed us to improve a generic system by more than 10 BLEU points. |
Topics |
Machine Translation, Speech-to-Speech Translation, Corpus (creation, annotation, etc.), Statistical and machine learning methods |
Full paper |
Automatic Translation of Scientific Documents in the HAL Archive |
Bibtex |
@InProceedings{PATRIK12.703,
  author    = {Patrik Lambert and Holger Schwenk and Frédéric Blain},
  title     = {Automatic Translation of Scientific Documents in the HAL Archive},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year      = {2012},
  month     = {may},
  date      = {23-25},
  address   = {Istanbul, Turkey},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-7-7},
  language  = {english}
} |