Summary of the paper

Title DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain
Authors Doris Baum, Daniel Schneider, Rolf Bardeli, Jochen Schwenninger, Barbara Samlowski, Thomas Winkler and Joachim Köhler
Abstract Typical broadcast material contains not only studio-recorded texts read by trained speakers, but also spontaneous and dialect speech, debates with cross-talk, voice-overs, and on-site reports with difficult acoustic environments. Standard approaches to speech and speaker recognition usually deteriorate under such conditions. This paper reports on the design, construction, and experimental analysis of DiSCo, a German corpus for the evaluation of speech and speaker recognition on challenging material from the broadcast domain. One of the key requirements for the design of this corpus was a good coverage of different types of serious programmes beyond clean speech and planned speech broadcast news. Corpus annotation encompasses manual segmentation, an orthographic transcription, and labelling with speech mode, dialect, and noise type. We indicate typical use cases for the corpus by reporting results from ASR, speech search, and speaker recognition on the new corpus, thereby obtaining insights into the difficulty of audio recognition on the various classes.
Topics Corpus (creation, annotation, etc.), Speech Recognition/Understanding, Speech resource/database
Full paper DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain
Slides -
Bibtex @InProceedings{BAUM10.355,
  author = {Doris Baum and Daniel Schneider and Rolf Bardeli and Jochen Schwenninger and Barbara Samlowski and Thomas Winkler and Joachim Köhler},
  title = {DiSCo - A German Evaluation Corpus for Challenging Problems in the Broadcast Domain},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA