Title

Information Extraction from Text Corpora: Using Filters on Collocation Sets

Authors

Gerhard Heyer (Leipzig University, Computer Science Institute, NLP Dept., Augustusplatz 10/11, 04109 Leipzig, Germany)

Uwe Quasthoff (Leipzig University, Computer Science Institute, NLP Dept., Augustusplatz 10/11, 04109 Leipzig, Germany)

Christian Wolff (Leipzig University, Computer Science Institute, NLP Dept., Augustusplatz 10/11, 04109 Leipzig, Germany)

Session

WP3: Tools & Components

Abstract

This paper describes the application of filtering techniques to collocation sets calculated for very large text corpora. Additional information associated with the collocations, such as patterns, grammatical information, subject areas, and numerical values, is used to identify collocations with a given semantic structure. We describe various examples and different techniques for applying such filters, and give several examples of practical applications of this type of information extraction.

Keywords

Information extraction

Full Paper

299.pdf