Title |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
Authors |
Vera Cabarrão, Helena Moniz, Fernando Batista, Ricardo Ribeiro, Nuno Mamede, Hugo Meinedo, Isabel Trancoso, Ana Isabel Mata and David Martins De Matos |
Abstract |
This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process focused mainly on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The revised data is now used extensively and has been extremely important for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system and all subsequent modules. The revised data has also recently been used in disfluency studies across domains. |
Topics |
Language Modelling, Metadata |
Full paper |
Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach |
Bibtex |
@InProceedings{CABARRO14.1020,
  author    = {Vera Cabarrão and Helena Moniz and Fernando Batista and Ricardo Ribeiro and Nuno Mamede and Hugo Meinedo and Isabel Trancoso and Ana Isabel Mata and David Martins De Matos},
  title     = {Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year      = {2014},
  month     = {may},
  date      = {26-31},
  address   = {Reykjavik, Iceland},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-8-4},
  language  = {english}
} |