Summary of the paper

Title A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty
Authors Ramona Bongelli, Carla Canestrari, Ilaria Riccioni, Andrzej Zuczkowski, Cinzia Buldorini, Ricardo Pietrobon, Alberto Lavelli and Bernardo Magnini
Abstract Uncertainty language permeates biomedical research and is fundamental for the computer interpretation of unstructured text. And yet, a coherent, cognitive-based theory to interpret Uncertainty language and guide Natural Language Processing is, to our knowledge, nonexistent. The aim of our project was therefore to detect and annotate Uncertainty markers, which play a significant role in building knowledge or beliefs in readers' minds, in a biomedical research corpus. Our corpus includes 80 manually annotated articles from the British Medical Journal randomly sampled from a 168-year period. Uncertainty markers have been classified according to a theoretical framework based on a combined linguistic and cognitive theory. The corpus was manually annotated according to such principles. We performed preliminary experiments to assess the manually annotated corpus and establish a baseline for the automatic detection of Uncertainty markers. The results of the experiments show that most of the Uncertainty markers can be recognized with good accuracy.
Topics Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval, Lexicon, lexical database
Bibtex @InProceedings{BONGELLI12.823,
  author = {Ramona Bongelli and Carla Canestrari and Ilaria Riccioni and Andrzej Zuczkowski and Cinzia Buldorini and Ricardo Pietrobon and Alberto Lavelli and Bernardo Magnini},
  title = {A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }