Title |
The ConceptMapper Approach to Named Entity Recognition |
Authors |
Michael Tanenblatt, Anni Coden and Igor Sominsky |
Abstract |
ConceptMapper is an open source tool we created for classifying mentions in an unstructured text document based on concept terminologies (dictionaries) and yielding named entities as output. It is implemented as a UIMA (Unstructured Information Management Architecture) annotator and is highly configurable: concepts can come from standardised or proprietary terminologies; arbitrary attributes can be associated with dictionary entries, and those attributes can then be associated with the named entities in the output; numerous search strategies and search options can be specified; any tokenizer packaged as a UIMA annotator can be used to tokenize the dictionary, so the same tokenization can be guaranteed for the input and dictionary, minimising tokenization mismatch errors; and the types and features of UIMA annotations used as input and generated as output can also be controlled. We describe ConceptMapper and its configuration parameters and their trade-offs, then describe the results of an experiment wherein some of these parameters are varied and precision and recall are subsequently measured in the task of in identifying concepts in a collection English-language clinical reports (colon cancer pathology). ConceptMapper is available from the Apache UIMA Sandbox, covered by the Apache Open Source license. |
Topics |
Named Entity recognition, Tools, systems, applications, Lexicon, lexical database |
Full paper |
The ConceptMapper Approach to Named Entity Recognition |
Slides |
- |
Bibtex |
@InProceedings{TANENBLATT10.448,
author = {Michael Tanenblatt and Anni Coden and Igor Sominsky}, title = {The ConceptMapper Approach to Named Entity Recognition}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} } |