Summary of the paper

Title: Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign
Authors: Olivier Galibert, Sophie Rosset, Cyril Grouin, Pierre Zweigenbaum and Ludovic Quintard
Abstract: Within the framework of the Quaero project, we proposed a new definition of named entities, based on an extension of both their coverage and their structure. Under this definition, extended named entities are both hierarchical and compositional. In this paper, we focus on the annotation of a corpus of press archives, OCRed from French newspapers of December 1890. We present the methodology we used to produce the corpus and the characteristics of the corpus in terms of named entity annotation. This annotated corpus was used in an evaluation campaign. We present this evaluation, the metrics we used and the results obtained by the participants.
Topics: Corpus (creation, annotation, etc.), Evaluation methodologies, Named Entity recognition
BibTeX:
@InProceedings{GALIBERT12.343,
  author = {Olivier Galibert and Sophie Rosset and Cyril Grouin and Pierre Zweigenbaum and Ludovic Quintard},
  title = {Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }
Powered by ELDA © 2012 ELDA/ELRA