Title |
CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis |
Authors |
Carmen Garcia-Mateo, Antonio Cardenal, Xose Luis Regueira, Elisa Fernández Rei, Marta Martinez, Roberto Seara, Rocío Varela and Noemí Basanta |
Abstract |
This paper describes CORILGA (Corpus Oral Informatizado da Lingua Galega), a large, high-quality corpus of spoken Galician from the 1960s to the present day. It includes both formal and informal spoken language from standard and non-standard varieties, across different generations and social levels. The corpus will be available to the research community upon completion. Galician is one of the EU languages that need further research before highly effective language technology solutions can be implemented. A software repository for speech resources in Galician is also described. The repository includes a structured database, a graphical interface and processing tools. The database makes it possible to search quickly and simply according to a number of different criteria. The web-based user interface facilitates user access to the different materials. Finally, a set of transcription-based modules for automatic speech recognition has been developed, facilitating the orthographic labelling of the recordings. |
Topics |
Speech Resource/Database, Corpus (Creation, Annotation, etc.) |
Full paper |
CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis |
Bibtex |
@InProceedings{GARCIAMATEO14.739,
  author    = {Carmen Garcia-Mateo and Antonio Cardenal and Xose Luis Regueira and Elisa Fernández Rei and Marta Martinez and Roberto Seara and Rocío Varela and Noemí Basanta},
  title     = {CORILGA: a Galician Multilevel Annotated Speech Corpus for Linguistic Analysis},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year      = {2014},
  month     = {may},
  date      = {26-31},
  address   = {Reykjavik, Iceland},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-8-4},
  language  = {english}
} |