Summary of the paper

Title: A Large Scale Comprehensive Lexical Inventory for Modern Standard Arabic
Authors: Sawsan Alqahtani, Mona Diab and Wajdi Zaghouani
Abstract: This paper introduces a lexical resource for Modern Standard Arabic (MSA) that explicitly lists the ambiguity of each token at the lexical and syntactic levels. Arabic orthography is known to be underspecified for short vowels and for other markers such as letter doubling and glottal stops, which are collectively known as diacritics. This underspecification adds further ambiguity to the orthography. It has a real impact on natural language processing (NLP) applications, and also on readability and human language processing. We specifically target listing alternative ambiguous forms of words within and across parts of speech (POS), namely cases where an undiacritized token (i.e., a token with no specified diacritics) may have multiple possible diacritized alternatives. The entries in this dictionary are constrained to five POS tags: verbs, nouns, adjectives, adverbs, and prepositions. A morphological analyzer and disambiguator is leveraged to generate the linguistic properties listed in the resource. The result is a large-scale, comprehensive inventory of words that records their degree of ambiguity at various levels, along with example usages. This resource could be useful for various NLP applications, as well as for pedagogical applications and socio- and psycho-linguistic studies.
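The core idea is to group diacritized forms under the undiacritized token they share, then record how ambiguous that token is within and across POS tags. The sketch below illustrates that idea. It is not the authors' pipeline. The paper derives its entries with a morphological analyzer and disambiguator, whereas this sketch assumes hand-supplied (form, POS) pairs. The function names, the lowercase POS labels, and the choice to strip only the marks U+064B to U+0652 are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: only tanween, fatha, damma, kasra, shadda and sukun
# (U+064B-U+0652) are treated as diacritics in this illustration.
DIACRITICS = re.compile("[\u064B-\u0652]")

# The five POS tags named in the abstract; these label strings are hypothetical.
ALLOWED_POS = {"verb", "noun", "adj", "adv", "prep"}


def strip_diacritics(word):
    """Return the undiacritized (written) form of a diacritized word."""
    return DIACRITICS.sub("", word)


def build_inventory(entries):
    """Group (diacritized form, POS) pairs under their undiacritized token,
    skipping entries whose POS is outside the five allowed tags."""
    inventory = defaultdict(set)
    for form, pos in entries:
        if pos in ALLOWED_POS:
            inventory[strip_diacritics(form)].add((form, pos))
    return inventory


def ambiguity_profile(analyses):
    """Summarize how ambiguous one undiacritized token is."""
    by_pos = defaultdict(set)
    for form, pos in analyses:
        by_pos[pos].add(form)
    return {
        "num_forms": len({form for form, _ in analyses}),
        "num_pos": len(by_pos),
        # Several forms under the same tag: within-POS ambiguity.
        "within_pos": {p: sorted(f) for p, f in by_pos.items() if len(f) > 1},
        # Forms spread over more than one tag: across-POS ambiguity.
        "across_pos": len(by_pos) > 1,
    }


# Illustrative entries (hypothetical, not taken from the resource).
entries = [
    ("كَتَبَ", "verb"),   # kataba, 'he wrote'
    ("كُتِبَ", "verb"),   # kutiba, 'it was written'
    ("كَتَّبَ", "verb"),  # kattaba, 'he made (someone) write'
    ("كُتُب", "noun"),    # kutub, 'books'
    ("وَ", "conj"),       # wa, 'and': filtered out (not one of the five tags)
]

inventory = build_inventory(entries)
profile = ambiguity_profile(inventory["كتب"])
print(profile["num_forms"], profile["num_pos"], profile["across_pos"])  # 4 2 True
```

In this toy example, the written token كتب shows within-POS ambiguity (three verb readings) and across-POS ambiguity (verb versus noun). The conjunction is dropped by the five-tag filter.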
Full paper: A Large Scale Comprehensive Lexical Inventory for Modern Standard Arabic
BibTeX:
@InProceedings{ALQAHTANI18.10,
  author = {Sawsan Alqahtani and Mona Diab and Wajdi Zaghouani},
  title = {A Large Scale Comprehensive Lexical Inventory for Modern Standard Arabic},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Hend Al-Khalifa and Walid Magdy and Kareem Darwish and Tamer Elsayed},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-25-2},
  language = {english}
}