LREC 2016 Proceedings

Summary of the paper

Title	A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
Authors	Patricia Braunger, Hansjörg Hofmann, Steffen Werner and Maria Schmidt
Abstract	Recent spoken dialog systems have been able to recognize freely spoken user input in restricted domains thanks to statistical methods in automatic speech recognition. These methods require a large number of natural language utterances to train the speech recognition engine and to assess the quality of the system. Since human speech offers many variants associated with a single intent, a large number of user utterances has to be elicited. Developers are therefore turning to crowdsourcing to collect this data. This paper compares three methods of eliciting multiple utterances for given semantics via crowdsourcing: with pictures, with text, and with semantic entities. Specifically, we compare the methods with regard to the amount of valid data and linguistic variance, proposing both a quantitative and a qualitative approach. In our study, the text-based method led to high variance in the utterances and a relatively low rate of invalid data.
Topics	Crowdsourcing, Corpus (Creation, Annotation, etc.), Speech Recognition/Understanding
Full paper	A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems
Bibtex	@InProceedings{BRAUNGER16.333,
  author = {Patricia Braunger and Hansjörg Hofmann and Steffen Werner and Maria Schmidt},
  title = {A Comparative Analysis of Crowdsourced Natural Language Corpora for Spoken Dialog Systems},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}