Arabic is written as a sequence of consonants and long vowels, with short vowels normally omitted. Diacritization attempts to recover short vowels and is an essential step for Text-to-Speech (TTS) systems. Though automatic diacritization of Modern Standard Arabic (MSA) has received significant attention, limited research has been conducted on dialectal Arabic (DA) diacritization. Phonemic patterns of DA vary greatly from MSA and even from one dialect to another, which accounts for the noted difficulty with mutual intelligibility between dialects. With the recent advent of spoken dialog systems (or intelligent personal assistants), dialect vowel restoration is crucial to allow systems to speak back to users in their own language variant. In this paper, we present our research and benchmark results on the automatic diacritization of Tunisian and Moroccan using linear Conditional Random Fields.
@InProceedings{DARWISH18.20,
  author = {Kareem Darwish and Ahmed Abdelali and Hamdy Mubarak and Younes Samih and Mohammed Attia},
  title = {Diacritization of Arabic Dialects},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Hend Al-Khalifa and Walid Magdy and Kareem Darwish and Tamer Elsayed},
  editoraffiliation = {King Saud University, KSA; University of Edinburgh, UK; Qatar Computing Research Institute, Qatar; Qatar University, Qatar},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-25-2},
  language = {english}
}