SUMMARY : Session P30-S
Title | Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech |
---|---|
Authors | P. Ircing, J. Hoidekr, J. Psutka |
Abstract | In this paper, we present a method for incorporating available linguistic information into a statistical language model used in an ASR system for transcribing spontaneous speech. We employ the class-based language model paradigm and use morphological tags as the basis for the word-to-class mapping. Since the number of distinct tags is at least one order of magnitude lower than the number of words, even in tasks with moderately sized vocabularies, the tag-based model can be estimated rather robustly even from relatively small text corpora. Unfortunately, this robustness goes hand in hand with the restricted predictive ability of the class-based model. Hence we apply a two-pass recognition strategy: the first pass is performed with a standard word-based n-gram model, and the resulting lattices are rescored in the second pass using the class-based model. Using this decoding scenario, we achieved a moderate improvement in word error rate in our ASR experiments. |
Keywords | Speech recognition; language modeling; class-based language models |
Full paper | Exploiting Linguistic Knowledge in Language Modeling of Czech Spontaneous Speech |
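
The two-pass idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: each word arrives with a single, already known morphological tag; the class-based model is the common bigram factorization P(w_i | c_i) · P(c_i | c_{i-1}) estimated by relative frequency; and an n-best list stands in for the lattices rescored in the paper. The names `train_class_bigram`, `class_logprob`, `rescore`, and the `weight` and `floor` parameters are hypothetical.

```python
import math
from collections import Counter


def train_class_bigram(tagged_sentences):
    """Collect relative-frequency counts for P(word | tag) and P(tag | prev tag).

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Assumption: every word comes with exactly one known tag.
    """
    emit = Counter()        # (tag, word) counts
    tag_count = Counter()   # tag counts
    trans = Counter()       # (previous tag, tag) counts
    prev_count = Counter()  # counts of each tag used as history
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            trans[(prev, tag)] += 1
            prev_count[prev] += 1
            prev = tag
    return emit, tag_count, trans, prev_count


def class_logprob(sent, model, floor=1e-6):
    """Log-probability of a tagged sentence under the class-based bigram.

    Unseen events are floored at `floor` (a crude stand-in for smoothing).
    """
    emit, tag_count, trans, prev_count = model
    logp = 0.0
    prev = "<s>"
    for word, tag in sent:
        p_emit = emit[(tag, word)] / tag_count[tag] if tag_count[tag] else 0.0
        p_trans = trans[(prev, tag)] / prev_count[prev] if prev_count[prev] else 0.0
        logp += math.log(max(p_emit, floor)) + math.log(max(p_trans, floor))
        prev = tag
    return logp


def rescore(nbest, model, weight=0.5):
    """Second pass: pick the hypothesis with the best interpolated log-score.

    nbest: list of (first_pass_log_score, tagged_sentence) pairs, where the
    first-pass score would come from the word-based n-gram decoding.
    """
    return max(
        nbest,
        key=lambda h: (1.0 - weight) * h[0] + weight * class_logprob(h[1], model),
    )


# Toy training data (invented for illustration; tags N = noun, V = verb).
train = [
    [("pes", "N"), ("spi", "V")],
    [("kocka", "N"), ("spi", "V")],
]
model = train_class_bigram(train)

# Two first-pass hypotheses; the first has the better word-based score,
# but its tag sequence V N was never observed in training.
nbest = [
    (-1.0, [("spi", "V"), ("pes", "N")]),
    (-1.2, [("pes", "N"), ("spi", "V")]),
]
best = rescore(nbest, model)
```

In this toy setup the second pass prefers the hypothesis whose tag sequence was observed in training, even though its first-pass score is slightly worse. A real system would operate on lattices, handle words with several possible tags, and use proper smoothing instead of a fixed floor.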