Title	Automatic Generation of Compound Word Lexicon for Hindi Speech Synthesis
Author(s)	Deepa S.R. (1), Kalika Bali (2), Partha Pratim Talukdar (2), A.G. Ramakrishnan (3) (1) Birla Institute of Technology & Science, Pilani, Rajasthan, India, f2000073@bits-pilani.ac.in; (2) Hewlett-Packard Labs, 24 Salarpuria Arena, Hosur Road, Bangalore, India, {kalika.bali, partha.talukdar}@hp.com; (3) Department of Electrical Engineering, Indian Institute of Science, Bangalore, India, ramkiag@ee.iisc.ernet.in
Session	P27-SE
Abstract	This paper addresses the problem of splitting Hindi compound words and its relevance to developing a good-quality phonetizer for Hindi speech synthesis. The constituents of a Hindi compound word are not separated by a space or hyphen, so most existing compound-splitting algorithms cannot be applied to Hindi. We propose a new technique for automatically extracting compound words from a Hindi corpus. Preliminary tests show that the algorithm splits 92 to 96% of the input compound words, and around 83 to 87% of these splits are correct. We also suggest a few modifications that are expected to improve the accuracy of the splits. Finally, we observe a 1.6% improvement in Hindi grapheme-to-phoneme (G2P) conversion when using a phonetized compound word lexicon created with this technique.
Keyword(s)	Compound word lexicon, speech synthesis, Hindi Grapheme-to-Phoneme (G2P) conversion, schwa deletion
Language(s)	Hindi
Full Paper	501.pdf
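The abstract notes that Hindi compounds are written without a space or hyphen, so any splitter must find constituent boundaries inside an unbroken string. The Python sketch below shows one common baseline for this task: dictionary-based segmentation with dynamic programming, which picks the split with the fewest lexicon entries. It is not the technique proposed in the paper, which is described only in the full paper. The helpers `split_compound` and `split_metrics`, the `min_len` parameter, and the toy lexicon are hypothetical. The segmenter does not undo sandhi at morpheme boundaries. Following one reading of the abstract, `split_metrics` treats the split rate as split inputs over all inputs, and the split accuracy as correct splits over split inputs.

```python
from __future__ import annotations

from functools import lru_cache


def split_compound(word: str, lexicon: set[str], min_len: int = 2) -> list[str] | None:
    """Split `word` into the fewest lexicon entries (at least two), or return None.

    Hypothetical baseline, not the paper's method. Boundaries are tried at
    every code point, and a piece is accepted only if it is a lexicon entry,
    so a piece that starts with a dependent vowel sign or virama is naturally
    rejected. Sandhi changes at boundaries are NOT reversed.
    """

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[str, ...] | None:
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + min_len, len(word) + 1):
            # Skip the whole word as a single piece: a compound needs 2+ parts.
            if start == 0 and end == len(word):
                continue
            piece = word[start:end]
            if piece in lexicon:
                rest = best(end)
                if rest is not None:
                    candidates.append((piece,) + rest)
        if not candidates:
            return None
        # Prefer fewer constituents; break ties toward a longer first piece.
        return min(candidates, key=lambda c: (len(c), -len(c[0])))

    parts = best(0)
    return list(parts) if parts else None


def split_metrics(predicted: dict[str, list[str] | None],
                  gold: dict[str, list[str]]) -> tuple[float, float]:
    """Return (split rate, split accuracy) under the reading stated above."""
    split = [w for w in gold if predicted.get(w) is not None]
    correct = [w for w in split if predicted[w] == gold[w]]
    rate = len(split) / len(gold) if gold else 0.0
    accuracy = len(correct) / len(split) if split else 0.0
    return rate, accuracy
```

For example, with a toy lexicon containing "रसोई" and "घर", `split_compound("रसोईघर", lexicon)` returns `["रसोई", "घर"]`. Because the abstract reports accuracy relative to the splits rather than to all inputs, multiplying the reported ranges gives the share of all input compounds that are split correctly: roughly 0.92 × 0.83 ≈ 76% up to 0.96 × 0.87 ≈ 84%.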