Summary of the paper

Title Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic
Authors Majdi Sawalha and Eric Atwell
Abstract Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy and the performance of NLP applications. We are constructing a broad-coverage lexical resource to improve the accuracy of morphological analyzers and part-of-speech taggers of Arabic text. Over the past 1200 years, many different kinds of Arabic language lexicons were constructed; these lexicons are different in ordering, size and aim or goal of construction. We collected 23 machine-readable lexicons, which are freely available on the web. We combined lexical resources into one large broad-coverage lexical resource by extracting information from disparate formats and merging traditional Arabic lexicons. To evaluate the broad-coverage lexical resource we computed coverage over the Qur’an, the Corpus of Contemporary Arabic, and a sample from the Arabic Web Corpus, using two methods. Counting exact word matches between test corpora and lexicon scored about 65-68%; Arabic has a rich morphology with many combinations of roots, affixes and clitics, so about a third of words in the corpora did not have an exact match in the lexicon. The second approach is to compute coverage in terms of use in a lemmatizer program, which strips clitics to look for a match for the underlying lexeme; this scored about 82-85%.
Topics Lexicon, lexical database, Morphology, Evaluation methodologies
Full paper Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic
Slides -
Bibtex @InProceedings{SAWALHA10.287,
  author = {Majdi Sawalha and Eric Atwell},
  title = {Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }
Powered by ELDA © 2010 ELDA/ELRA