Summary of the paper

Title RUNDKAST: an Annotated Norwegian Broadcast News Speech Corpus
Authors Ingunn Amdal, Ole Morten Strand, Jørn Almberg and Torbjørn Svendsen
Abstract This paper describes the Norwegian broadcast news speech corpus RUNDKAST. The corpus contains recordings of approximately 77 hours of broadcast news shows from the Norwegian broadcasting company NRK. The corpus covers both read and spontaneous speech as well as spontaneous dialogues and multipart discussions, including frequent occurrences of non-speech material (e.g. music, jingles). The recordings have large variations in speaking styles, dialect use and recording/transmission quality. RUNDKAST has been annotated for research in speech technology. The entire corpus has been manually segmented and transcribed using hierarchical levels. A subset of one hour of read and spontaneous speech from 10 different speakers has been manually annotated using broad phonetic labels. We provide a description of the database content, the annotation tools and strategies, and the conventions used for the different levels of annotation. A corpus of this kind has up to this point not been available for Norwegian, but is considered a necessary part of the infrastructure for language technology research in Norway. The RUNDKAST corpus is planned to be included in a future national Norwegian language resource bank.
Language Single language
Topics Corpus (creation, annotation, etc.), Speech resource/database, Phonetic Databases, Phonology
Full paper RUNDKAST: an Annotated Norwegian Broadcast News Speech Corpus
Slides RUNDKAST: an Annotated Norwegian Broadcast News Speech Corpus
Bibtex @InProceedings{AMDAL08.486,
  author = {Ingunn Amdal and Ole Morten Strand and Jørn Almberg and Torbjørn Svendsen},
  title = {RUNDKAST: an Annotated Norwegian Broadcast News Speech Corpus},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
