Summary of the paper

Title Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
Authors Mona Diab, Mahmoud Ghoneim, Abdelati Hawwari, Fahad AlGhamdi, Nada AlMarwani and Mohamed Al-Badrashiny
Abstract We present our effort to create a large multi-layered representational repository of linguistic code-switched Arabic data. The process involves developing clear annotation standards and guidelines, streamlining the annotation process, and implementing quality control measures. We used two main annotation protocols: in-lab gold annotation and crowdsourced annotation. We developed a web-based annotation tool to facilitate management of the annotation process. The current version of the repository contains a total of 886,252 tokens, each tagged with one of sixteen code-switching tags. The data exhibits code switching between Modern Standard Arabic and Egyptian Dialectal Arabic and covers three genres: tweets, commentaries, and discussion fora. The overall inter-annotator agreement is 93.1%.
Topics Corpus (Creation, Annotation, etc.), LR Infrastructures and Architectures, Standards for LRs
Full paper Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data
Bibtex @InProceedings{DIAB16.1161,
  author = {Mona Diab and Mahmoud Ghoneim and Abdelati Hawwari and Fahad AlGhamdi and Nada AlMarwani and Mohamed Al-Badrashiny},
  title = {Creating a Large Multi-Layered Representational Repository of Linguistic Code Switched Arabic Data},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}