Summary of the paper

Title Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop
Authors Atsushi Fujii, Masao Utiyama, Mikio Yamamoto and Takehito Utsuro
Abstract In aiming at research and development on machine translation, we produced a test collection for Japanese-English machine translation in the seventh NTCIR Workshop. This paper describes details of our test collection. From patent documents published in Japan and the United States, we extracted patent families as a parallel corpus. A patent family is a set of patent documents for the same or related invention and these documents are usually filed to more than one country in different languages. In the parallel corpus, we aligned Japanese sentences with their counterpart English sentences. Our test collection, which includes approximately 2,000,000 sentence pairs, can be used to train and test machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval and the contribution of machine translation to a patent retrieval task can also be evaluated. Our test collection will be available to the public for research purposes after the NTCIR final meeting.
Language
Topics Machine Translation, SpeechToSpeech Translation, Evaluation methodologies, Corpus (creation, annotation, etc.)
Full paper Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop
Slides -
Bibtex @InProceedings{FUJII08.458,
  author = {Atsushi Fujii and Masao Utiyama and Mikio Yamamoto and Takehito Utsuro},
  title = {Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
