Summary of the paper

Title Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System
Authors Kwanchiva Saykham, Ananlada Chotimongkol and Chai Wutiwiwatchai
Abstract This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follows: first, an initial language model is trained with broadcast news transcriptions available during the development period. Then the language model is adapted over time with more recent broadcast news transcriptions and online news articles available during deployment, especially the data from the same time period as the broadcast news speech being recognized. We found that data that are closer in time are more similar in terms of perplexity and are more suitable for language model adaptation. The LMs that are adapted over time with more recent news data are better, both in terms of perplexity and word error rate (WER), than the static LM trained from only the initial set of broadcast news data. Adaptation data from broadcast news transcriptions improved perplexity by 38.3% and WER by 7.1% relative. Although online news articles achieved less improvement, they are still a useful resource as they can be obtained automatically. Better data pre-processing techniques and data selection techniques based on text similarity could be applied to the news articles to obtain further improvement from this promising result.
Topics Language modelling, Speech Recognition/Understanding, Tools, systems, applications
Bibtex @InProceedings{SAYKHAM10.249,
  author = {Kwanchiva Saykham and Ananlada Chotimongkol and Chai Wutiwiwatchai},
  title = {Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }