Title |
Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems |
Authors |
Kallirroi Georgila, Alan Black, Kenji Sagae and David Traum |
Abstract |
The current practice in virtual human dialogue systems is to use professional human recordings or limited-domain speech synthesis. Both approaches lead to good performance but at a high cost. To determine the best trade-off between performance and cost, we perform a systematic evaluation of human and synthesized voices with regard to naturalness, conversational aspect, and likability. We vary the type (in-domain vs. out-of-domain), length, and content of utterances, and take into account the age and native language of raters as well as their familiarity with speech synthesis. We present detailed results from two studies, a pilot one and one run on Amazon's Mechanical Turk. Our results suggest that a professional human voice can supersede both an amateur human voice and synthesized voices. Also, a high-quality general-purpose voice or a good limited-domain voice can perform better than amateur human recordings. We do not find any significant differences between the performance of a high-quality general-purpose voice and a limited-domain voice, both trained with speech recorded by actors. As expected, the high-quality general-purpose voice is rated higher than the limited-domain voice for out-of-domain sentences and lower for in-domain sentences. There is also a trend for long or negative-content utterances to receive lower ratings. |
Topics |
Speech Synthesis, Dialogue, Evaluation methodologies |
Full paper |
Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems |
Bibtex |
@InProceedings{GEORGILA12.562,
author = {Kallirroi Georgila and Alan Black and Kenji Sagae and David Traum}, title = {Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems}, booktitle = {Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} } |