Summary of the paper

Title A Taxonomy for In-depth Evaluation of Normalization for User Generated Content
Authors Rob Van der Goot, Rik Van Noord and Gertjan Van Noord
Abstract In this work we present a taxonomy of error categories for lexical normalization, which is the task of translating user generated content to canonical language. We annotate a recent normalization dataset to test the practical use of the taxonomy and reach a near-perfect agreement. This annotated dataset is then used to evaluate how an existing normalization model performs on the different categories of the taxonomy. The results of this evaluation reveal that some of the problematic categories only include minor transformations, whereas most regular transformations are solved quite well.
Topics Social Media Processing, Evaluation Methodologies, Corpus (Creation, Annotation, Etc.)
Bibtex @InProceedings{VANDERGOOT18.306,
  author = {Rob Van der Goot and Rik Van Noord and Gertjan Van Noord},
  title = "{A Taxonomy for In-depth Evaluation of Normalization for User Generated Content}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
  }