Title | Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment |
Author(s) |
Georges Fafiotte (1), Christian Boitet (1), Mark Seligman (1), Chengqing Zong (2)
(1) GETA, CLIPS, IMAG-campus (UJF - Grenoble 1), 385 rue de la Bibliothèque, BP 53, F-38041 Grenoble cedex 9, France, georges.fafiotte@imag.fr, christian.boitet@imag.fr, mark.seligman@spokentranslation.com; (2) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, P.O.Box 2728 Beijing 100080, China, cqzong@nlpr.ia.ac.cn |
Session | P9-SE |
Abstract | We describe here the three main platforms in the ERIM family of Web-based environments for human interpreting: ERIM-Interp and ERIM-Collect in more detail, then ERIM-Aid. Each platform supports an aspect of the collection or study of spontaneous bilingual dialogues translated by an interpreter. ERIM-Interp is the core environment, providing mediated communication between speakers and human interpreters over the network. Using ERIM-Collect, French-Chinese interpreting data have been collected within the three-year "ChinFaDial" project supported by LIAMA, a French-Chinese laboratory in Beijing. These "raw" speech data will be made available in the spring of 2004 on an open-access basis, using the DistribDial server, on a CLIPS-GETA website. Our goal is to extend such corpora through a collaborative scheme, allowing other research groups to contribute to the site whatever annotations they may have created, and to share them under the same conditions (GPL). An ERIM-Aid variant is intended to provide focused machine aids to human interpreters working over the Web, or possibly to distant monolingual speakers conversing in different languages. |
Keyword(s) | Data collection, spontaneous speech, dialogue, speech corpora, interpreter, interpreting, free distribution, freeware |
Language(s) | Any language, for the generic software resource; French-Chinese, for the ChinFaDial corpora |
Full Paper | 715.pdf |