Summary of the paper

Title Approximating Learning Curves for Active-Learning-Driven Annotation
Authors Tomanek Katrin and Udo Hahn
Abstract Active learning (AL) is getting more and more popular as a methodology to considerably reduce the annotation effort when building training material for statistical learning methods for various NLP tasks. A crucial issue rarely addressed, however, is when to actually stop the annotation process to profit from the savings in efforts. This question is tightly related to estimating the classifier performance after a certain amount of data has already been annotated. While learning curves are the default means to monitor the progress of the annotation process in terms of classifier performance, this requires a labeled gold standard which - in realistic annotation settings, at least - is often unavailable. We here propose a method for committee-based AL to approximate the progression of the learning curve based on the disagreement among the committee members. This method relies on a separate, unlabeled corpus and is thus well suited for situations where a labeled gold standard is not available or would be too expensive to obtain. Considering named entity recognition as a test case we provide empirical evidence that this approach works well under simulation as well as under real-world annotation conditions.
Language Language-independent
Topics Corpus (creation, annotation, etc.), Acquisition, Machine Learning
Full paper Approximating Learning Curves for Active-Learning-Driven Annotation
Slides Approximating Learning Curves for Active-Learning-Driven Annotation
Bibtex @InProceedings{KATRIN08.335,
  author = {Tomanek Katrin and Udo Hahn},
  title = {Approximating Learning Curves for Active-Learning-Driven Annotation},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }

Powered by ELDA © 2008 ELDA/ELRA