LREC 2000 2nd International Conference on Language Resources & Evaluation  





Title: Part of Speech Tagging and Lemmatisation for the Spoken Dutch Corpus
Authors: Van Eynde Frank (Center for Computational Linguistics, Maria-Theresiastraat 21, 3000 Leuven, Belgium, frank.vaneynde@ccl.kuleuven.ac.be)
Zavrel Jakub (CNTS / Language Technology Group, University of Antwerp, Universiteitsplein 1, 2610 Wilrijk, Belgium, zavrel@uia.ua.ac.be)
Daelemans Walter (CNTS / Language Technology Group, University of Antwerp, Universiteitsplein 1, 2610 Wilrijk, Belgium, daelem@uia.ua.ac.be)
Keywords: Dutch, POS Tagging, Tagger Evaluation, Tagset Design
Session: WO18 - Morphology in Lexical and Textual Resources
Abstract: This paper describes the lemmatisation and tagging guidelines developed for the “Spoken Dutch Corpus” and lays out the philosophy behind the high-granularity tagset that was designed for the project. To bootstrap the annotation of large quantities of material (10 million words) with this new tagset, we tested several existing taggers and tagger generators on initial samples of the corpus. The results show that the most effective method, when trained on the small samples, is a high-quality implementation of a Hidden Markov Model tagger generator.

 
