Title |
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects |
Authors |
David Graff and Mohamed Maamouri |
Abstract |
The Linguistic Data Consortium and Georgetown University Press are collaborating to create updated editions of bilingual diction- aries that had originally been published in the 1960's for English-speaking learners of Moroccan, Syrian and Iraqi Arabic. In their first editions, these dictionaries used ad hoc Latin-alphabet orthography for each colloquial Arabic dialect, but adopted some proper- ties of Arabic-based writing (collation order of Arabic headwords, clitic attachment to word forms in example phrases); despite their common features, there are notable differences among the three books that impede comparisons across the dialects, as well as com- parisons of each dialect to Modern Standard Arabic. In updating these volumes, we use both Arabic script and International Pho- netic Alphabet orthographies; the former provides a common basis for word recognition across dialects, while the latter provides dialect-specific pronunciations. Our goal is to preserve the full content of the original publications, supplement the Arabic headword inventory with new usages, and produce a uniform lexicon structure expressible via the Lexical Markup Framework (LMF, ISO 24613). To this end, we developed a relational database schema that applies consistently to each dialect, and HTTP-based tools for searching, editing, workflow, review and inventory management. |
Topics |
Lexicon, lexical database, LR Infrastructures and Architectures, Tools, systems, applications |
Full paper |
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects |
Bibtex |
@InProceedings{GRAFF12.461,
author = {David Graff and Mohamed Maamouri}, title = {Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects}, booktitle = {Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} } |