Title |
Annotation of Specialized Corpora using a Comprehensive Entity and Relation Scheme |
Authors |
Louise Deleger, Anne-Laure Ligozat, Cyril Grouin, Pierre Zweigenbaum and Aurelie Neveol |
Abstract |
Annotated corpora are essential resources for many applications in Natural Language Processing. They provide insight into the linguistic and semantic characteristics of the genre and domain covered, and can be used for the training and evaluation of automatic tools. In the biomedical domain, annotated corpora of English texts have become available for several genres and subfields. However, very few similar resources are available for languages other than English. In this paper, we present an effort to produce a high-quality corpus of clinical documents in French, annotated with a comprehensive scheme of entities and relations. We present the annotation scheme as well as the results of a pilot annotation study covering 35 clinical documents in a variety of subfields and genres. We show that high inter-annotator agreement can be achieved using a complex annotation scheme. |
Topics |
Information Extraction, Information Retrieval, Named Entity Recognition |
Full paper |
Annotation of Specialized Corpora using a Comprehensive Entity and Relation Scheme |
Bibtex |
@InProceedings{DELEGER14.552,
  author    = {Louise Deleger and Anne-Laure Ligozat and Cyril Grouin and Pierre Zweigenbaum and Aurelie Neveol},
  title     = {Annotation of Specialized Corpora using a Comprehensive Entity and Relation Scheme},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year      = {2014},
  month     = {may},
  date      = {26-31},
  address   = {Reykjavik, Iceland},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-8-4},
  language  = {english}
} |