Title |
Semi-Automatic Annotation of the UCU Accents Speech Corpus |
Authors |
Rosemary Orr, Marijn Huijbregts, Roeland Van Beek, Lisa Teunissen, Kate Backhouse and David Van Leeuwen |
Abstract |
Annotation and labeling of speech tasks in large multitask speech corpora is a necessary part of preparing a corpus for distribution. We address three approaches to annotation and labeling (manual, semi-automatic, and automatic) as applied to the UCU Accent Project speech data, a multilingual, multitask, longitudinal speech corpus. Accuracy and minimal time investment are the priorities in assessing the efficacy of each procedure. While manual labeling based on aural and visual input should produce the most accurate results, this approach is error-prone because of its repetitive nature. A semi-automatic event detection system, which requires manual rejection of false alarms and manual location and labeling of misses, provided the best results. A fully automatic system could not be applied to entire speech recordings because of the variety of tasks and genres. However, it could be used to annotate separate sentences within a specific task. Acoustic confidence measures can correctly detect sentences that do not match the text with an equal error rate (EER) of 3.3%. |
Topics |
Speech Resource/Database |
Full paper |
Semi-Automatic Annotation of the UCU Accents Speech Corpus |
Bibtex |
@InProceedings{ORR14.511,
  author = {Rosemary Orr and Marijn Huijbregts and Roeland Van Beek and Lisa Teunissen and Kate Backhouse and David Van Leeuwen},
  title = {Semi-Automatic Annotation of the UCU Accents Speech Corpus},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |