Title |
Information Extraction from Text Corpora: Using Filters on Collocation Sets |
Authors |
Gerhard Heyer (Leipzig University, Computer Science Institute, NLP Dept., Augustusplatz 10/11, 04109 Leipzig, Germany); Uwe Quasthoff (Leipzig University, Computer Science Institute, NLP Dept., Augustusplatz 10/11, 04109 Leipzig, Germany); Christian Wolff (Leipzig University, Computer Science Institute, NLP Dept., Augustusplatz 10/11, 04109 Leipzig, Germany) |
Session |
WP3: Tools & Components |
Abstract |
This paper describes the application of filtering techniques to collocation sets calculated for very large text corpora. Additional information associated with the collocations, such as patterns, grammatical information, subject areas, and numerical values, is used to identify collocations with a given semantic structure. Various examples of such filters and different techniques for applying them are described. We also give several examples of practical applications of this type of information extraction. |
Keywords |
Information extraction |
Full Paper |