Title
Collection of a Simultaneous Translation Corpus for Comparative Analysis
Authors
Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda and Satoshi Nakamura
Abstract
This paper describes the collection of an English-Japanese/Japanese-English simultaneous interpretation corpus. The corpus has two main features. First, professional simultaneous interpreters with different amounts of experience participated in the collection. By comparing the simultaneous interpretations produced by each interpreter, it is possible to contrast better interpretations with weaker ones. Second, translation data are already available for part of the corpus, which makes it possible to compare translation data with simultaneous interpretation data. We recorded interpretations of lectures and news and created time-aligned transcriptions. In total, 387k words of transcribed data were collected. The corpus will be helpful for analyzing differences in interpretation styles and for constructing simultaneous interpretation systems.
Topics
Machine Translation, SpeechToSpeech Translation, Speech Resource/Database
BibTeX
@InProceedings{SHIMIZU14.162,
  author    = {Hiroaki Shimizu and Graham Neubig and Sakriani Sakti and Tomoki Toda and Satoshi Nakamura},
  title     = {Collection of a Simultaneous Translation Corpus for Comparative Analysis},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year      = {2014},
  month     = {may},
  date      = {26-31},
  address   = {Reykjavik, Iceland},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-8-4},
  language  = {english}
}