Title |
Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon |
Authors |
Mona Diab, Mohamed Albadrashiny, Maryam Aminian, Mohammed Attia, Heba Elfardy, Nizar Habash, Abdelati Hawwari, Wael Salloum, Pradeep Dasigi and Ramy Eskander |
Abstract |
We introduce an electronic three-way lexicon, Tharwa, comprising Dialectal Arabic, Modern Standard Arabic and English correspondents. The paper focuses on Egyptian Arabic as the first pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with various elements of linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is based on a compilation of information from existing monolingual and bilingual resources, such as paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both Theoretical Linguistics and Computational Linguistics research. |
Topics |
Multilinguality, Semantics |
Full paper |
Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon |
Bibtex |
@InProceedings{DIAB14.1161,
  author    = {Mona Diab and Mohamed Albadrashiny and Maryam Aminian and Mohammed Attia and Heba Elfardy and Nizar Habash and Abdelati Hawwari and Wael Salloum and Pradeep Dasigi and Ramy Eskander},
  title     = {Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year      = {2014},
  month     = {may},
  date      = {26-31},
  address   = {Reykjavik, Iceland},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-8-4},
  language  = {english}
} |