Title |
Something Borrowed, Something Blue: Rule-based Combination of POS Taggers |
Authors |
Borin Lars (Department of Linguistics, Uppsala University, Box 527, SE–751 20 Uppsala, SWEDEN, Lars.Borin@ling.uu.se) |
Keywords |
Knowledge-Rich NLP, Machine Learning, Multilingual Corpora, Parallel Corpora, POS Tagging |
Session |
Session WO1 - Corpus Tagging |
Full Paper |
158.ps, 158.pdf |
Abstract |
Linguistically annotated text resources are still scarce for many languages and for many text types, mainly because their creation represents a major investment of work and time. For this reason, it is worthwhile to investigate ways of reusing existing resources in novel ways. In this paper, we investigate how off-the-shelf part-of-speech (POS) taggers can be combined to better cope with text material of a type on which they were not trained, and for which there are no readily available training corpora. Using freely available taggers for German (although the method we describe is not language-dependent), we indicate how such taggers can be combined by using linguistically motivated rules so that the tagging accuracy of the combination exceeds that of the best of the individual taggers. |