Title	A Search Tool for Corpora with Positional Tagsets and Ambiguities
Author(s)	Adam Przepiórkowski (1); Zygmunt Krynicki (2); Łukasz Dębowski (1); Marcin Woliński (1); Daniel Janus (3); Piotr Bański (4) (1) Polish Academy of Sciences, Institute of Computer Science, ul. Ordona 21, 01-237 Warsaw, Poland - {adamp, ldebowsk, wolinski}@ipipan.waw.pl; (2) Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warsaw, Poland - zygmunt.krynicki@pjwstk.edu.pl; (3) University of Warsaw, Institute of Computer Science, ul. Banacha 2, 02-097 Warsaw, Poland - nathell@bach.ipipan.waw.pl; (4) University of Warsaw, Institute of English, ul. Nowy Świat 4, 00-497 Warsaw, Poland - bansp@ipipan.waw.pl
Session	P14-W
Abstract	This article describes POLIQARP, a corpus indexing and query tool, which understands positional tagsets and which does not assume that word forms are annotated with unique morphosyntactic tags. POLIQARP is designed to be applicable to a variety of languages and tagsets: it works with XML-encoded texts, uses the UTF-8 character set, and allows for an external specification of the tagset. Currently, POLIQARP is used for indexing and searching a morphosyntactically annotated corpus of Polish.
Keyword(s)	corpus, positional tagset, ambiguity, concordancer, XCES, POS, part-of-speech, CQP
Language(s)	Polish (but the tool is not language-specific)
Full Paper	275.pdf