Title

Proposal of a very-large-corpus acquisition method by cell-formed registration

Authors

Fumiaki Sugaya (ATR Spoken Language Translation Research Laboratories)

Toshiyuki Takezawa (ATR Spoken Language Translation Research Laboratories)

Genichiro Kikui (ATR Spoken Language Translation Research Laboratories)

Seiichi Yamamoto (ATR Spoken Language Translation Research Laboratories)

Session

SP1: Speech Resources

Abstract

One promising way to improve the performance of a speech translation system is to collect a large volume of data in the target tasks and domains. However, naïvely scaling up the traditional data collection scheme consumes valuable resources. Advanced speech recognition technology can provide a highly accurate recognizer if a machine-friendly speaking style is permitted. We propose a new data collection scheme that relies on this speaking style. Preliminary data collection results show that the proposed scheme has a three-digit efficiency.

Keywords

Spoken language, Speech translation, ATR-MATRIX, Paraphrase, Corpus acquisition

Full Paper

309.pdf