Title |
An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis |
Authors |
Eshrag Refaee and Verena Rieser |
Abstract |
We present a newly collected data set of 8,868 gold-standard annotated Arabic feeds. The corpus is manually labelled for subjectivity and sentiment analysis (SSA) (κ = 0.816). In addition, the corpus is annotated with a variety of motivated feature-sets that have previously shown positive impact on performance. The paper highlights issues posed by Twitter as a genre, such as mixture of language varieties and topic-shifts. Our next step is to extend the current corpus, using online semi-supervised learning. A first sub-corpus will be released via the ELRA repository as part of this submission. |
Topics |
Social Media Processing, Opinion Mining / Sentiment Analysis |
Full paper |
An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis |
Bibtex |
@InProceedings{REFAEE14.317,
  author = {Eshrag Refaee and Verena Rieser},
  title = {An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |