Title |
Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain |
Authors |
Nelleke Oostdijk, Suzan Verberne and Cornelis Koster |
Abstract |
For mining intellectual property texts (patents), a broad-coverage lexicon that covers general English words together with terminology from the patent domain is indispensable. The patent domain is very diffuse, as it comprises a variety of technical domains (e.g. Human Necessities, Chemistry & Metallurgy, and Physics in the International Patent Classification). As a result, collecting a lexicon that covers the language used in patent texts is not a straightforward task. In this paper we describe the approach that we have developed for the semi-automatic construction of a broad-coverage lexicon for classification and information retrieval in the patent domain; the approach combines information from multiple sources. Our contribution is twofold. First, we provide insight into the difficulties of developing lexical resources for information retrieval and text mining in the patent domain, a research and development field that is expanding quickly. Second, we create a broad-coverage lexicon annotated with rich lexical information and containing both general English word forms and domain terminology for various technical domains. |
Topics |
Lexicon, lexical database, MultiWord Expressions & Collocations, Morphology |
Full paper |
Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain |
Bibtex |
@InProceedings{OOSTDIJK10.378,
  author = {Nelleke Oostdijk and Suzan Verberne and Cornelis Koster},
  title = {Constructing a Broad-coverage Lexicon for Text Mining in the Patent Domain},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |