Resource-poor languages such as Odia lack the resources and tools that sentiment analysis needs to give promising results. With more user-generated raw data readily available today, it is of prime importance to have annotated corpora from various domains. This paper is a first attempt towards building an annotated corpus of Odia poetry with sentiment labels. Our annotation scheme combines a polarity identification questionnaire with a taxonomy of emotions. The annotated corpus is further used to build baseline sentiment classification models using machine learning techniques. Stylistic variations and structural differences between poetic and non-poetic texts make sentiment classification more challenging for poetry. Using the annotated corpus of poems, we obtained comparable accuracy across various classification models. Linear-SVM outperformed the other classifiers with an F1-score of 0.734. The annotated corpus contains a total of 730 Odia poems of various genres with a vocabulary of more than 23k words. A Fleiss' kappa score of 0.83 was obtained, which corresponds to near-perfect agreement among the annotators.
@InProceedings{MOHANTY18.15,
  author    = {Gaurav Mohanty and Pruthwik Mishra and Radhika Mamidi},
  title     = {Kabithaa: An Annotated Corpus of Odia Poems with Sentiment Polarity Information},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {may},
  date      = {7-12},
  location  = {Miyazaki, Japan},
  editor    = {Girish Nath Jha and Kalika Bali and Sobha L and Atul Kr. Ojha},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {979-10-95546-09-2},
  language  = {english}
}