Sentiment analysis is a subjective and challenging task, and it becomes harder still when applied to Arabic, given the language's morphological complexity and the widespread use of its unstandardized dialects. While many datasets have been released to train sentiment classifiers in Arabic, most of these datasets contain shallow annotations that mark only the sentiment of the text unit, be it a word, a sentence, or a document. In this paper, we present the Arabic Sentiment Twitter Dataset for the Levant (ArSenTD-LEV), which is enriched with additional annotations. First, we conducted a manual analysis of tweets from the region to identify the elements that most affect sentiment. Based on the findings of this analysis, we annotated 4,000 tweets by specifying, for each tweet, its overall sentiment, the target toward which the sentiment was expressed, how the sentiment was expressed, and the topic being discussed. Experimental results confirm the importance of these annotations in improving the performance of a baseline sentiment classifier. They also confirm the performance gap that results from applying sentiment models to tweets from entirely different domains. We believe that such a corpus will open doors for more advanced research on sentiment analysis, such as exploring cross-topic and cross-dialect solutions.
@InProceedings{BALY18.23,
  author = {Ramy Baly and Hazem Hajj and Wassim El-Hajj and Khaled Shaban},
  title = {ArSentD-LEV: A Multi-Topic Corpus for Target-based Sentiment Analysis in Arabic Levantine Tweets},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Hend Al-Khalifa and Walid Magdy and Kareem Darwish and Tamer Elsayed},
  editoraffiliation = {Hend Al-Khalifa (King Saud University, KSA); Walid Magdy (University of Edinburgh, UK); Kareem Darwish (Qatar Computing Research Institute, Qatar); Tamer Elsayed (Qatar University, Qatar)},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-25-2},
  language = {english}
}