Title

Linguistic Corpus Search

Author(s)

Christian Biemann (1), Uwe Quasthoff (1), Christian Wolff (2)

(1) Leipzig University, Computer Science Institute, Natural Language Processing Dept., Augustusplatz 10/11, 04109 Leipzig, Germany. (2) Regensburg University, Institute for Media, Information and Cultural Studies, Media Computing Dept., Universitätsstr. 31, 93040 Regensburg, Germany

Session

P1-W

Abstract

Answering linguistic questions by searching corpora requires both additional information encoded in the corpus and the efficiency of “traditional” search engines. We describe a search-engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. The approach uses a ‘minimalist’ query language that nevertheless allows powerful searches by optionally ignoring positional as well as inflectional features of the corpus sentences. Many queries can be formulated without detailed training via a simple web-based front-end. Relevant applications of this search tool in knowledge extraction are also discussed.

Keyword(s)

Search, indexing, linguistic constructions, large corpora

Language(s)

German, English, language-independent

Full Paper

546.pdf