Title |
English-Hindi Transliteration using Multiple Similarity Metrics |
Authors |
Niraj Aswani and Robert Gaizauskas |
Abstract |
In this paper, we present an approach to measuring the transliteration similarity of English-Hindi word pairs. Our approach has two components. First, we propose a bi-directional mapping between one or more characters in the Devanagari script and one or more characters in the Roman script (pronounced as in English). This allows a given Hindi word written in Devanagari to be transliterated into the Roman script and vice versa. Second, we present an algorithm for computing a similarity measure that is a variant of Dice's coefficient and the LCSR measure, and that also takes into account the constraints needed to match English-Hindi transliterated words. Finally, by evaluating various similarity metrics individually and together under a multiple-measure agreement scenario, we show that it is possible to achieve an F-measure of 0.92 in identifying English-Hindi word pairs that are transliterations. To assess the portability of our approach to other similar languages, we adapt our system to the Gujarati language. |
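For readers unfamiliar with the two measures named in the abstract, the following is a minimal sketch of the textbook forms of Dice's coefficient (over character bigrams) and the longest common subsequence ratio (LCSR), applied to two Roman-script strings. It is not the variant proposed in the paper, and it does not include the Devanagari-to-Roman mapping or the English-Hindi matching constraints. The function names and the `agree` helper, including its `threshold` and `min_votes` parameters, are hypothetical and only illustrate one possible reading of "multiple-measure agreement".

```python
from collections import Counter


def bigrams(word):
    """Return the list of adjacent character pairs in a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice_coefficient(a, b):
    """Textbook Dice coefficient over character bigrams (not the paper's variant)."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both words are too short to have bigrams; fall back to equality.
        return 1.0 if a == b else 0.0
    overlap = sum((x & y).values())  # multiset intersection of bigrams
    return 2.0 * overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """LCSR: LCS length divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def agree(a, b, threshold=0.5, min_votes=2):
    """Hypothetical agreement rule: accept the pair only if at least
    min_votes of the measures score it at or above threshold."""
    scores = [dice_coefficient(a, b), lcsr(a, b)]
    return sum(score >= threshold for score in scores) >= min_votes
```

For the pair `night` / `nacht`, the two measures disagree noticeably (one shared bigram out of eight, but a common subsequence of three characters out of five), which is the kind of case where combining several measures can matter.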
Topics |
Phonetic Databases, Phonology, Machine Translation, Speech-to-Speech Translation, Tools, systems, applications |
Full paper |
English-Hindi Transliteration using Multiple Similarity Metrics |
Slides |
- |
Bibtex |
@InProceedings{ASWANI10.694,
  author = {Niraj Aswani and Robert Gaizauskas},
  title = {English-Hindi Transliteration using Multiple Similarity Metrics},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |