LREC 2000 2nd International Conference on Language Resources & Evaluation
 

Title Automatically Augmenting Terminological Lexicons from Untagged Text
Authors Demetriou George (Department of Computer Science, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, G.Demetriou@dcs.shef.ac.uk)
Gaizauskas Robert (Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK, R.Gaizauskas@dcs.shef.ac.uk)
Keywords Bootstrapping Methods, Information Extraction, Terminology Lexicons
Session Session TO1 - Terminology
Full Paper 320.ps, 320.pdf
Abstract Lexical resources play a crucial role in language technology, but lexical acquisition can often be a time-consuming, laborious and costly exercise. In this paper, we describe a method for the automatic acquisition of technical terminology from domain-restricted texts without the need for sophisticated natural language processing tools, such as taggers or parsers, or text corpora annotated with labelled cases. The method uses prior (seed) knowledge to discover co-occurrence patterns for the terms in the texts. A bootstrapping algorithm has been developed that identifies patterns and new terms iteratively. Experiments with scientific journal abstracts in the biology domain indicate an accuracy rate for the extracted terms ranging from 58% to 71%. The new terms have been found useful for improving the coverage of a system used for terminology identification tasks in the biology domain.
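
The iterative idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say what form the co-occurrence patterns take or how candidates are filtered. The sketch assumes single-word terms, takes a pattern to be the pair of words immediately to the left and right of a known term, and trusts a pattern once it has been seen around at least `min_support` distinct known terms. The function name `bootstrap_terms`, its parameters, and the toy sentences are all hypothetical.

```python
from collections import defaultdict


def bootstrap_terms(sentences, seeds, min_support=2, max_iters=5):
    """Grow a term set from seeds, using (left word, right word) contexts as patterns.

    Assumption: terms are single whitespace-delimited tokens; no tagger or parser is used.
    """
    tokenized = [s.lower().split() for s in sentences]
    terms = set(seeds)
    patterns = set()
    for _ in range(max_iters):
        # Step 1: record which known terms occur inside each (left, right) context.
        support = defaultdict(set)
        for toks in tokenized:
            for i in range(1, len(toks) - 1):
                if toks[i] in terms:
                    support[(toks[i - 1], toks[i + 1])].add(toks[i])
        # Trust only contexts shared by enough distinct known terms.
        patterns = {p for p, found in support.items() if len(found) >= min_support}

        # Step 2: an unknown word that fills a trusted context becomes a new term.
        new_terms = set()
        for toks in tokenized:
            for i in range(1, len(toks) - 1):
                if (toks[i - 1], toks[i + 1]) in patterns and toks[i] not in terms:
                    new_terms.add(toks[i])

        if not new_terms:
            break  # fixed point reached: nothing new was discovered
        terms |= new_terms
    return terms, patterns


# Toy data, invented for illustration only.
sentences = [
    "the protein lysozyme binds to the substrate",
    "the protein trypsin binds to the substrate",
    "the protein pepsin binds to the inhibitor",
    "the enzyme pepsin cleaves the peptide",
    "the enzyme trypsin cleaves the peptide",
    "the enzyme papain cleaves the peptide",
]
terms, patterns = bootstrap_terms(sentences, seeds={"lysozyme", "trypsin"})
print(sorted(terms))
```

On this toy input the first pass trusts only the context shared by both seeds and picks up one new term. That new term then gives a second context enough support, which yields a further term on the next pass. This is the sense in which patterns and terms reinforce each other across iterations. A realistic system would also need to handle multi-word terms and filter out noisy candidates, which this sketch does not attempt.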