Title |
A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty |
Authors |
Ramona Bongelli, Carla Canestrari, Ilaria Riccioni, Andrzej Zuczkowski, Cinzia Buldorini, Ricardo Pietrobon, Alberto Lavelli and Bernardo Magnini |
Abstract |
Uncertainty language permeates biomedical research and is fundamental for the computer interpretation of unstructured text. Yet, to our knowledge, no coherent, cognitively based theory exists to interpret Uncertainty language and guide Natural Language Processing. The aim of our project was therefore to detect and annotate Uncertainty markers, which play a significant role in building knowledge or beliefs in readers' minds, in a biomedical research corpus. Our corpus includes 80 manually annotated articles from the British Medical Journal, randomly sampled from a 168-year period. Uncertainty markers have been classified according to a theoretical framework based on a combined linguistic and cognitive theory, and the corpus was manually annotated according to these principles. We performed preliminary experiments to assess the manually annotated corpus and to establish a baseline for the automatic detection of Uncertainty markers. The results of the experiments show that most of the Uncertainty markers can be recognized with good accuracy. |
Topics |
Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval, Lexicon, lexical database |
Full paper |
A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty |
Bibtex |
@InProceedings{BONGELLI12.823,
  author = {Ramona Bongelli and Carla Canestrari and Ilaria Riccioni and Andrzej Zuczkowski and Cinzia Buldorini and Ricardo Pietrobon and Alberto Lavelli and Bernardo Magnini},
  title = {A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |