Summary of the paper

Title Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier
Authors Souhir Gahbiche-Braham, Hélène Bonneau-Maynard, Thomas Lavergne and François Yvon
Abstract Arabic is a morphologically rich language, and Arabic texts abound in complex word forms built by concatenation of multiple subparts, corresponding for instance to prepositions, articles, roots, prefixes, or suffixes. The development of Arabic Natural Language Processing applications, such as Machine Translation (MT) tools, thus requires some kind of morphological analysis. In this paper, we compare various strategies for performing such preprocessing, using generic machine learning techniques. The resulting tool is compared with two open domain alternatives in the context of a statistical MT task and is shown to be faster than its competitors, with no significant difference in MT quality.
Topics Statistical and machine learning methods, Morphology, Part of speech tagging
Full paper Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier
Bibtex @InProceedings{GAHBICHEBRAHAM12.855,
  author = {Souhir Gahbiche-Braham and Hélène Bonneau-Maynard and Thomas Lavergne and François Yvon},
  title = {Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }