Title |
Automatically Augmenting Terminological Lexicons from Untagged Text |
Authors |
Demetriou George (Department of Computer Science, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, United Kingdom, G.Demetriou@dcs.shef.ac.uk); Gaizauskas Robert (Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK, R.Gaizauskas@dcs.shef.ac.uk) |
Keywords |
Bootstrapping Methods, Information Extraction, Terminology Lexicons |
Session |
Session TO1 - Terminology |
Full Paper |
320.ps, 320.pdf |
Abstract |
Lexical resources play a crucial role in language technology, but lexical acquisition can often be a time-consuming, laborious and costly exercise. In this paper, we describe a method for the automatic acquisition of technical terminology from domain-restricted texts without the need for sophisticated natural language processing tools, such as taggers or parsers, or text corpora annotated with labelled cases. The method is based on the idea of using prior or seed knowledge in order to discover co-occurrence patterns for the terms in the texts. A bootstrapping algorithm has been developed that identifies patterns and new terms in an iterative manner. Experiments with scientific journal abstracts in the biology domain indicate an accuracy rate for the extracted terms ranging from 58% to 71%. The new terms have been found useful for improving the coverage of a system used for terminology identification tasks in the biology domain. |
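The bootstrapping idea summarised in the abstract (seed terms lead to co-occurrence patterns, which in turn lead to new terms) can be illustrated with a minimal sketch. The abstract does not specify how patterns are represented or scored, so everything below is an assumption made for illustration: terms are single whitespace-delimited tokens, a pattern is the (left word, right word) pair surrounding a known term, and a pattern is kept if it occurs at least `min_pattern_count` times. The function name `bootstrap_terms`, its parameters, and the example sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def bootstrap_terms(sentences, seed_terms, iterations=2, min_pattern_count=2):
    """Toy bootstrapping loop: seed terms -> context patterns -> new terms.

    Assumptions (not from the paper): single-token terms, whitespace
    tokenization, and (left word, right word) pairs as patterns.
    """
    seeds = {t.lower() for t in seed_terms}
    terms = set(seeds)
    tokenized = [s.lower().split() for s in sentences]

    for _ in range(iterations):
        # Step 1: count the contexts in which already-known terms occur.
        pattern_counts = Counter()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in terms:
                    pattern_counts[(tokens[i - 1], tokens[i + 1])] += 1
        patterns = {p for p, c in pattern_counts.items() if c >= min_pattern_count}

        # Step 2: unknown words found in a retained context become new terms.
        new_terms = set()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                context = (tokens[i - 1], tokens[i + 1])
                if context in patterns and tokens[i] not in terms:
                    new_terms.add(tokens[i])

        if not new_terms:
            break  # nothing new was found, so further iterations cannot help
        terms |= new_terms

    # Return only the newly acquired terms, excluding the seeds.
    return terms - seeds


# Hypothetical example data, invented for this sketch.
example_sentences = [
    "the protein hemoglobin binds oxygen",
    "the protein myoglobin binds oxygen",
    "the protein insulin binds receptors",
    "the enzyme trypsin cleaves peptides",
]
found = bootstrap_terms(example_sentences, {"hemoglobin", "myoglobin"})
print(found)
```

In this toy run the two seeds share the context "protein ... binds", so that context is retained as a pattern and the unseen word in the same slot is proposed as a term, while a word appearing only in an unattested context is not. A real system would need multi-word terms and some filtering of noisy patterns, which this sketch does not attempt.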