Title |
Using Similarity Metrics For Terminology Recognition |
Authors |
Jonathan Butters and Fabio Ciravegna |
Abstract |
In this paper we present an approach to terminology recognition in which a sublanguage term (e.g. an aircraft engine component term extracted from a maintenance log) is matched to its corresponding term in a pre-defined list (such as a taxonomy representing the official breakdown of the engine). Terminology recognition is addressed as a classification task in which the extracted term is associated with one or more candidate terms in the official description list by applying string similarity metrics. The solution described in the paper uses similarity cut-off thresholds that are computed dynamically by modeling a noise curve: dissimilar string matches form a Gaussian-distributed noise curve that can be identified and removed, leaving mostly similar string matches. Dynamically calculated thresholds are preferable to fixed similarity thresholds because fixed thresholds are inherently imprecise; that is, there is no similarity boundary beyond which any two strings always describe the same concept. |
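The idea of a dynamically computed cut-off can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which similarity metrics are used or how the noise curve is fitted, so the sketch makes two loudly labeled assumptions. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity metric, and it approximates the Gaussian noise curve by the mean and standard deviation of all candidate scores, placing the cut-off `k` standard deviations above the mean. The function names and the parameter `k` are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Assumption: SequenceMatcher stands in for whichever metric is used.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dynamic_threshold(scores, k=2.0):
    """Cut-off placed k standard deviations above the mean score.

    Assumption: most candidates are dissimilar, so the mean and standard
    deviation of all scores roughly describe the Gaussian noise curve.
    """
    return mean(scores) + k * pstdev(scores)


def match_term(extracted, official_terms, k=2.0):
    """Return (term, score) pairs above the dynamic cut-off, best first."""
    scored = [(term, similarity(extracted, term)) for term in official_terms]
    cutoff = dynamic_threshold([score for _, score in scored], k)
    kept = [(term, score) for term, score in scored if score > cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical taxonomy entries, for illustration only.
    taxonomy = ["fuel pump", "oil filter", "combustion chamber",
                "exhaust nozzle", "gearbox", "starter motor"]
    print(match_term("Fuel Pump", taxonomy, k=1.5))
```

Because the cut-off is recomputed for every extracted term, a query with generally low scores gets a lower threshold than one with generally high scores, which is the property that a single fixed threshold lacks.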
Language |
Single language |
Topics |
Statistical methods, Information Extraction, Information Retrieval, Tools, systems, applications |
Full paper |
Using Similarity Metrics For Terminology Recognition |
Slides |
Using Similarity Metrics For Terminology Recognition |
Bibtex |
@InProceedings{BUTTERS08.717,
author = {Jonathan Butters and Fabio Ciravegna},
title = {Using Similarity Metrics For Terminology Recognition},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = may,
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |