LREC 2008 Proceedings

Summary of the paper

Title	Quick Rich Transcriptions of Arabic Broadcast News Speech Data
Authors	Chomicha Bendahman, Meghan Glenn, Djamel Mostefa, Niklas Paulsson and Stephanie Strassel
Abstract	This paper describes the collection and transcription of a large set of Arabic broadcast news speech data. A total of more than 2000 hours of data was transcribed. The transcription factor for the broadcast news data was reduced by using a method such as Quick Rich Transcription (QRTR) and by reducing the number of quality controls performed on the data. The data was collected from several Arabic TV and radio sources and covers both Modern Standard Arabic and dialectal Arabic. The orthographic transcriptions include segmentation, speaker turns, topics, sentence unit types and minimal noise mark-up. The transcripts were produced as part of the GALE project.
Language
Topics	Corpus (creation, annotation, etc.)
Full paper	Quick Rich Transcriptions of Arabic Broadcast News Speech Data
Slides	Quick Rich Transcriptions of Arabic Broadcast News Speech Data
Bibtex	@InProceedings{BENDAHMAN08.915,
  author = {Chomicha Bendahman and Meghan Glenn and Djamel Mostefa and Niklas Paulsson and Stephanie Strassel},
  title = {Quick Rich Transcriptions of Arabic Broadcast News Speech Data},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
}