Title |
Hybrid Citation Extraction from Patents |
Authors |
Olivier Galibert, Sophie Rosset, Xavier Tannier and Fanny Grandry |
Abstract |
The Quaero project organized a set of evaluations of Named Entity recognition systems in 2009. One of the sub-tasks consisted of extracting citations from English-language patents, i.e. references to other documents, either other patents or general literature. In this paper we present the participation of LIMSI in this evaluation, with a complete system description and the evaluation results. The corpus showed that patent and non-patent citations are very different in nature. We therefore handled references to other patents and references to general literature separately and built a hybrid system. For patent citations, the system uses rule-based expert knowledge in the form of regular expressions. The system for detecting non-patent citations, on the other hand, is purely stochastic (machine learning with CRF++). We then merged both approaches to provide a single output. Four teams participated in this task and our system obtained the best results of this evaluation campaign, although the difference between the top two systems is only weakly significant. |
Topics |
Named Entity recognition, Information Extraction, Information Retrieval, Tools, systems, applications |
Full paper |
Hybrid Citation Extraction from Patents |
Slides |
- |
Bibtex |
@InProceedings{GALIBERT10.81,
  author    = {Olivier Galibert and Sophie Rosset and Xavier Tannier and Fanny Grandry},
  title     = {Hybrid Citation Extraction from Patents},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
} |