Title |
Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT |
Authors |
Jesús González-Rubio, Jorge Civera, Alfons Juan and Francisco Casacuberta |
Abstract |
Considerable effort is currently being devoted to the digitisation of large historical document collections for preservation purposes. The documents in these collections are usually written in ancient languages, such as Latin or Greek, so the language barrier limits the general public's access to their content. Therefore, digital libraries aim not only to store raw images of digitised documents, but also to annotate them with their corresponding text transcriptions and translations into modern languages. Unfortunately, few electronic resources are available for ancient languages that natural language processing techniques can exploit. This paper describes the compilation of a novel Latin-Catalan parallel corpus as a new task for statistical machine translation (SMT). Preliminary experimental results obtained with a state-of-the-art phrase-based SMT system are also reported. These results reveal the complexity of the task and its challenging but interesting nature for future development. |
Topics |
Corpus (creation, annotation, etc.), Machine Translation, SpeechToSpeech Translation, Statistical and machine learning methods |
Full paper |
Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT |
Slides |
- |
Bibtex |
@InProceedings{GONZLEZRUBIO10.541,
  author = {Jesús González-Rubio and Jorge Civera and Alfons Juan and Francisco Casacuberta},
  title = {Saturnalia: A Latin-Catalan Parallel Corpus for Statistical MT},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |