SUMMARY : Session O4-S Speech Corpora and Dialogue

 

Title Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News
Authors S. Galliano, E. Geoffrois, G. Gravier, J. Bonastre, D. Mostefa, K. Choukri
Abstract This paper presents the audio corpus developed in the framework of the ESTER evaluation campaign of French broadcast news transcription systems. This corpus includes 100 hours of manually annotated recordings and 1,677 hours of non-transcribed data. The manual annotations include the detailed verbatim orthographic transcription, the speaker turns and identities, information about acoustic conditions, and named entities. Additional resources generated by automatic speech processing systems, such as phonetic alignments and word graphs, are also described.
Keywords corpus, transcription, broadcast news, evaluation
Full paper Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News