Summary of the paper

Title Annotation Time Stamps ― Temporal Metadata from the Linguistic Annotation Process
Authors Katrin Tomanek and Udo Hahn
Abstract We describe the re-annotation of selected types of named entities (persons, organizations, locations) from the Muc7 corpus. The focus of this annotation initiative is on recording the time needed for the linguistic process of named entity annotation. Annotation times are measured on two basic annotation units -- sentences vs. complex noun phrases. We gathered evidence that decision times are non-uniformly distributed over the annotation units, while they do not substantially deviate among annotators. This data seems to support the hypothesis that annotation times very much depend on the inherent "hardness" of each single annotation decision. We further show how such time-stamped information can be used for empirically grounded studies of selective sampling techniques, such as Active Learning. We directly compare Active Learning costs on the basis of token-based vs. time-based measurements. The data reveals that Active Learning keeps its competitive advantage over random sampling in both scenarios, though the difference is less marked for the time metric than for the token metric.
Topics Metadata, Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval
Bibtex @InProceedings{TOMANEK10.652,
  author = {Katrin Tomanek and Udo Hahn},
  title = {Annotation Time Stamps ― Temporal Metadata from the Linguistic Annotation Process},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
 }