LREC 2000: 2nd International Conference on Language Resources & Evaluation
Conference Papers
Title | Something Borrowed, Something Blue: Rule-based Combination of POS Taggers |
Authors |
Borin Lars (Department of Linguistics, Uppsala University, Box 527, SE–751 20 Uppsala, SWEDEN, Lars.Borin@ling.uu.se) |
Keywords | Knowledge-Rich NLP, Machine Learning, Multilingual Corpora, Parallel Corpora, POS Tagging |
Session | Session WO1 - Corpus Tagging |
Abstract | Linguistically annotated text resources are still scarce for many languages and for many text types, mainly because their creation represents a major investment of work and time. For this reason, it is worthwhile to investigate novel ways of reusing existing resources. In this paper, we investigate how off-the-shelf part-of-speech (POS) taggers can be combined to better cope with text material of a type on which they were not trained, and for which there are no readily available training corpora. Using freely available taggers for German (although the method we describe is not language-dependent), we indicate how such taggers can be combined by means of linguistically motivated rules so that the tagging accuracy of the combination exceeds that of the best of the individual taggers. |
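
The idea of combining tagger outputs by rule can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the actual rules, so the function names (`make_rule`, `combine`), the choice of a default tagger, and the example rule are all assumptions made for illustration. The tags ART, PDS and NN follow the STTS tagset for German but are used here only as sample values.

```python
from typing import Callable, List, Optional, Sequence

# A rule looks at one token and the tags proposed by each tagger,
# and returns an overriding tag or None if it does not apply.
Rule = Callable[[str, Sequence[str]], Optional[str]]


def make_rule(first_tag: str, second_tag: str, result: str) -> Rule:
    """Hypothetical rule template: if tagger 0 proposes `first_tag` and
    tagger 1 proposes `second_tag`, choose `result`."""
    def rule(token: str, tags: Sequence[str]) -> Optional[str]:
        if tags[0] == first_tag and tags[1] == second_tag:
            return result
        return None
    return rule


def combine(tokens: Sequence[str],
            tagger_outputs: Sequence[Sequence[str]],
            rules: Sequence[Rule],
            default_tagger: int = 0) -> List[str]:
    """Take the default tagger's tag unless a rule overrides it.
    Assumes all taggers share one tokenization and one tagset."""
    combined = []
    for i, token in enumerate(tokens):
        tags = [output[i] for output in tagger_outputs]
        chosen = tags[default_tagger]
        for rule in rules:
            override = rule(token, tags)
            if override is not None:
                chosen = override  # first matching rule wins
                break
        combined.append(chosen)
    return combined


# Illustrative data only; these are not results from the paper.
tokens = ["das", "Haus"]
tagger_a = ["PDS", "NN"]   # hypothetical output of the default tagger
tagger_b = ["ART", "NN"]   # hypothetical output of a second tagger
rules = [make_rule("PDS", "ART", "ART")]
print(combine(tokens, [tagger_a, tagger_b], rules))
```

In this sketch the rules only correct the default tagger where a specific disagreement pattern occurs, which is one simple way a combination could outperform its best component; the paper itself should be consulted for the rules actually used.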