Summary of the paper

Title Same domain different discourse style - A case study on Language Resources for data-driven Machine Translation
Authors Monica Gavrila, Walther v. Hahn and Cristina Vertan
Abstract Data-driven machine translation (MT) approaches have become very popular in recent years, especially for language pairs for which it is difficult to find specialists to develop transfer rules. Statistical (SMT) or example-based (EBMT) systems can provide reasonable translation quality for assimilation purposes, as long as a large amount of training data is available. SMT systems in particular rely on parallel aligned corpora, which have to be statistically relevant for the given language pair. The construction of large domain-specific parallel corpora is time-consuming and costly; current practice relies on one or two such large corpora per language pair. Recently developed strategies ensure a certain portability to other domains through specialized lexicons or small domain-specific corpora. In this paper we discuss the influence of different discourse styles on statistical machine translation systems. We investigate how a pure SMT system performs when training and test data belong to the same domain but the discourse style varies.
Topics Machine Translation, SpeechToSpeech Translation, Tools, systems, applications, Evaluation methodologies
Bibtex @InProceedings{GAVRILA12.1003,
  author = {Monica Gavrila and Walther v. Hahn and Cristina Vertan},
  title = {Same domain different discourse style - A case study on Language Resources for data-driven Machine Translation},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }