Summary of the paper

Title Resource Creation for Training and Testing of Transliteration Systems for Indian Languages
Authors Sowmya V. B., Monojit Choudhury, Kalika Bali, Tirthankar Dasgupta and Anupam Basu
Abstract Machine transliteration is used in a number of NLP applications ranging from machine translation and information retrieval to input mechanisms for non-roman scripts. Many popular Input Method Editors for Indian languages, like Baraha, Akshara, Quillpad etc, use back-transliteration as a mechanism to allow users to input text in a number of Indian language. The lack of a standard dataset to evaluate these systems makes it difficult to make any meaningful comparisons of their relative accuracies. In this paper, we describe the methodology for the creation of a dataset of ~2500 transliterated sentence pairs each in Bangla, Hindi and Telugu. The data was collected across three different modes from a total of 60 users. We believe that this dataset will prove useful not only for the evaluation and training of back-transliteration systems but also help in the linguistic analysis of the process of transliterating Indian languages from native scripts to Roman.
Topics Corpus (creation, annotation, etc.), Other
Full paper Resource Creation for Training and Testing of Transliteration Systems for Indian Languages
Slides Resource Creation for Training and Testing of Transliteration Systems for Indian Languages
Bibtex @InProceedings{VB10.182,
  author = {Sowmya V. B. and Monojit Choudhury and Kalika Bali and Tirthankar Dasgupta and Anupam Basu},
  title = {Resource Creation for Training and Testing of Transliteration Systems for Indian Languages},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA