Title |
Momresp: A Bayesian Model for Multi-Annotator Document Labeling |
Authors |
Paul Felt, Robbie Haertel, Eric Ringger and Kevin Seppi |
Abstract |
Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MomResp, a model that incorporates information from both natural data clusters and annotations from multiple annotators to infer ground-truth labels and annotator reliability for the document classification task. We implement this model and show dramatic improvements over majority vote in situations where annotations are scarce and annotation quality is low, as well as in situations where annotators disagree consistently. Because MomResp predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings using only information available to the model at inference time. Although MomResp does not perform well in annotation-rich situations, we show evidence suggesting how this shortcoming may be overcome in future work. |
Topics |
Crowdsourcing, Document Classification, Text categorisation |
Full paper |
Momresp: A Bayesian Model for Multi-Annotator Document Labeling |
Bibtex |
@InProceedings{FELT14.1153,
  author = {Paul Felt and Robbie Haertel and Eric Ringger and Kevin Seppi},
  title = {Momresp: A Bayesian Model for Multi-Annotator Document Labeling},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |