LREC 2010 Proceedings

Summary of the paper

Title	English-Hindi Transliteration using Multiple Similarity Metrics
Authors	Niraj Aswani and Robert Gaizauskas
Abstract	In this paper, we present an approach to measuring the transliteration similarity of English-Hindi word pairs. Our approach has two components. First, we propose a bi-directional mapping between one or more characters in the Devanagari script and one or more characters in the Roman script (pronounced as in English). This allows a given Hindi word written in Devanagari to be transliterated into the Roman script and vice versa. Second, we present an algorithm for computing a similarity measure that is a variant of Dice's coefficient and of the Longest Common Subsequence Ratio (LCSR), and that also takes into account the constraints needed to match English-Hindi transliterated words. Finally, by evaluating various similarity metrics individually and in combination under a multiple-measure agreement scenario, we show that it is possible to achieve an F-measure of 0.92 in identifying English-Hindi word pairs that are transliterations. To assess the portability of our approach to other similar languages, we adapt our system to Gujarati.
Topics	Phonetic Databases, Phonology, Machine Translation, Speech-to-Speech Translation, Tools, systems, applications
Full paper	English-Hindi Transliteration using Multiple Similarity Metrics
Slides	-
Bibtex	@InProceedings{ASWANI10.694, author = {Niraj Aswani and Robert Gaizauskas}, title = {English-Hindi Transliteration using Multiple Similarity Metrics}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} }
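The abstract combines two standard string-similarity measures, Dice's coefficient and LCSR, and accepts a pair when several measures agree. The sketch below shows only the standard textbook forms of these measures, not the authors' variant. The extra English-Hindi matching constraints mentioned in the abstract are not described here and are omitted. It assumes both words are already in Roman script, for example after Devanagari-to-Roman transliteration. The `agree` function and its thresholds are hypothetical illustrations of a multiple-measure agreement rule, not values from the paper.

```python
from collections import Counter


def bigrams(word: str) -> list[str]:
    """Return the character bigrams of a lower-cased word."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def dice(a: str, b: str) -> float:
    """Standard Dice coefficient over character-bigram multisets."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 0.0
    shared = sum((ba & bb).values())  # multiset intersection of bigrams
    return 2 * shared / total


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: |LCS| divided by the longer word's length."""
    longest = max(len(a), len(b))
    return lcs_length(a.lower(), b.lower()) / longest if longest else 0.0


def agree(a: str, b: str,
          thresholds: tuple[float, float] = (0.5, 0.6),
          min_votes: int = 2) -> bool:
    """Hypothetical agreement rule: accept the pair if at least
    `min_votes` of the metrics reach their (illustrative) thresholds."""
    scores = (dice(a, b), lcsr(a, b))
    votes = sum(s >= t for s, t in zip(scores, thresholds))
    return votes >= min_votes
```

For example, "night" and "nacht" share one of eight bigrams, giving a Dice score of 0.25. Their longest common subsequence "nht" gives an LCSR of 3/5 = 0.6. Requiring both measures to agree rejects this pair, which shows how combining measures can trade recall for precision.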