Title |
Proposal of a very-large-corpus acquisition method by cell-formed registration |
Authors |
Fumiaki Sugaya (ATR Spoken Language Translation Research Laboratories); Toshiyuki Takezawa (ATR Spoken Language Translation Research Laboratories); Genichiro Kikui (ATR Spoken Language Translation Research Laboratories); Seiichi Yamamoto (ATR Spoken Language Translation Research Laboratories) |
Session |
SP1: Speech Resources |
Abstract |
One promising way to improve the performance of a speech translation system is to collect a large volume of data in the target tasks and domains. However, a naïve expansion of the traditional data collection scheme consumes valuable resources. Advanced speech recognition technology can provide a highly accurate recognizer if machine-friendly speech is permitted. We propose a new data collection scheme that relies on this speaking style. Preliminary data collection results show that the proposed scheme has a three-digit efficiency. |
Keywords |
Spoken language, Speech translation, ATR-MATRIX, Paraphrase, Corpus acquisition |
Full Paper |