Title | Issues in Annotation of the Czech Spontaneous Speech Corpus in the MALACH Project |
Author(s) |
Josef Psutka (1), Pavel Ircing (1), Jan Hajič (2), Vlasta Radová (1), Josef V. Psutka (1), William J. Byrne (3), Samuel Gustman (4)
(1) Department of Cybernetics and Center for Computational Linguistics, University of West Bohemia, Plzen, Czech Republic; (2) UFAL and Center for Computational Linguistics, Charles University, Praha, Czech Republic; (3) Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA; (4) Survivors of the Shoah Visual History Foundation, Los Angeles, CA, USA |
Session | P9-SE |
Abstract | The paper presents the issues encountered in processing spontaneous Czech speech in the MALACH project. Specific problems connected with the frequent occurrence of colloquial words in spontaneous Czech are analyzed; a partial solution is proposed and experimentally evaluated. |
Keyword(s) | MALACH project, Czech, corpus annotation, automatic speech recognition, colloquial words |
Language(s) | Czech |
Full Paper | 630.pdf |