| Title | Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction | 
  
  | Authors | Tilia Ellendorff, Fabio Rinaldi and Simon Clematide | 
  
  | Abstract | We show how large biomedical databases can be used to obtain a gold standard for training a machine learning system over a corpus of biomedical text. As an example, we use the Comparative Toxicogenomics Database (CTD) and describe, by means of a short case study, how the obtained data can be applied. We explain how we exploit the structure of the database to compile training material and a test set. Using a Naive Bayes document classification approach based on words, stem bigrams and MeSH descriptors, we achieve a macro-average F-score of 61% on a subset of 8 action terms. This outperforms a baseline system based on a lookup of stemmed keywords by more than 20%. Furthermore, we present directions for future work, taking the described system as a starting point. Future work will aim at a weakly supervised system capable of discovering complete biomedical interactions and events. | 
  
  | Topics | Text Mining, Metadata | 
  
  | Full paper  | Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction | 
  
  | Bibtex | @InProceedings{ELLENDORFF14.1156, author =  {Tilia Ellendorff and Fabio Rinaldi and Simon Clematide},
 title =  {Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction},
 booktitle =  {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
 year =  {2014},
 month =  {may},
 date =  {26-31},
 address =  {Reykjavik, Iceland},
 editor =  {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
 publisher =  {European Language Resources Association (ELRA)},
 isbn =  {978-2-9517408-8-4},
 language =  {english}
 }
 |