Summary of the paper

Title Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages
Authors Soumil Mandal, Sainik Kumar Mahata and Dipankar Das
Abstract Analysis of informative contents and sentiments of social users has been attempted quite intensively in the recent past. Most of the systems are usable only for monolingual data and fail or give poor results when used on data with the code-mixing property. To gather attention and encourage researchers to work on this crisis, we prepared gold standard Bengali-English code-mixed data with language and polarity tags for sentiment analysis purposes. In this paper, we discuss the systems we prepared to collect and filter raw Twitter data. In order to reduce manual work during annotation, hybrid systems combining rule-based and supervised models were developed for both language and sentiment tagging. The final corpus was annotated by a group of annotators following a few guidelines. The gold standard corpus thus obtained has impressive inter-annotator agreement in terms of Kappa values. Various metrics like Code-Mixed Index (CMI), Code-Mixed Factor (CF) along with various aspects (language and emotion) also qualitatively polled the code-mixed and sentiment properties of the corpus.
Full paper Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages
Bibtex @InProceedings{MANDAL18.27,
  author = {Soumil Mandal and Sainik Kumar Mahata and Dipankar Das},
  title = {Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Kiyoaki Shirai},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-24-5},
  language = {english}
  }