|
Summary of the paper
Title |
An End-to-End PDF Toolchain for Marking Up Scientific Documents |
Authors |
Sanna Hulkkonen, Oliver Ray |
Abstract |
This paper proposes a system for making sentence-level semantic enrichment of scientific publications more user-friendly by developing an end-to-end toolchain for augmenting PDFs with automatically-determined textual annotations and visual highlights. The aim is to categorise each sentence according to a given classification scheme and display the labels in a visually appealing way that preserves document structure and formatting while allowing users to work with standard PDF tools they are already accustomed to. This is in contrast to existing approaches which provide an XML representation of document content obtained by abstracting away formatting and structural details in order to focus on the raw text. In particular, we present a toolchain that automatically marks up each sentence in the body of a PDF with a Core Scientific Concept category using a classifier trained with a corpus of papers on social insect biology that we manually labelled ourselves. Preliminary testing with domain experts provides anecdotal evidence that end-users do find such automatically derived sentence classifications useful and that they prefer to work directly with marked up PDFs. |
Full paper |
An End-to-End PDF Toolchain for Marking Up Scientific Documents
|
Bibtex |
@InProceedings{HULKKONEN18.15,
  author = {Sanna Hulkkonen and Oliver Ray},
  title = {An End-to-End PDF Toolchain for Marking Up Scientific Documents},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-20-7},
  language = {english}
} |