Title |
Linguistic Corpus Search |
Author(s) |
Christian Biemann (1), Uwe Quasthoff (1), Christian Wolff (2); (1) Leipzig University, Computer Science Institute, Natural Language Processing Dept., Augustusplatz 10/11, 04109 Leipzig, Germany; (2) Regensburg University, Institute for Media, Information and Cultural Studies, Media Computing Dept., Universitätsstr. 31, 93040 Regensburg, Germany |
Session |
P1-W |
Abstract |
Searching corpora with linguistic questions requires both additional information encoded in the corpus and the efficiency of “traditional” search engines. We describe a search-engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. This approach uses a “minimalist” query language that nevertheless allows powerful searches by optionally ignoring positional and inflectional features of the corpus sentences. Many queries can be formulated without detailed training via a simple web-based front end. Relevant applications of this search tool to knowledge extraction are also discussed. |
Keyword(s) |
Search, indexing, linguistic constructions, large corpora |
Language(s) |
German, English, language-independent |
Full Paper |