SUMMARY: Session P6-WT

 

Title KNACK-2002: a Richly Annotated Corpus of Dutch Written Text
Authors V. Hoste, G. De Pauw
Abstract In this paper, we introduce the annotated KNACK-2002 corpus of Dutch written text. The corpus features five annotation layers, ranging from morphological boundaries at the word level, through part-of-speech tags and phrase chunks at the syntactic level, to named entities at the semantic level and coreferential relations at the discourse level. We believe the corpus is unique in the Dutch language area because of the richness of its annotation layers, which provide researchers with a useful gold-standard data set for NLP tasks in morphology, (morpho)syntax, semantics and discourse.
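To make the idea of stacked annotation layers concrete, here is a minimal sketch of how one token stream could carry all five layers at once. It is an illustration under stated assumptions, not the KNACK-2002 file format: the class names (`Token`, `Mention`, `Document`), the IOB-style chunk and entity labels, the POS tags and the morpheme splits are all hypothetical. Coreference is modelled as token spans linked by a chain id, because coreferential relations hold between phrases rather than single tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One token carrying the word-, syntactic- and semantic-level layers."""
    form: str
    morphemes: list[str]  # word level: morphological segmentation (hypothetical splits)
    pos: str              # syntactic level: part-of-speech tag (hypothetical tag set)
    chunk: str            # syntactic level: phrase-chunk tag, IOB style (assumed)
    ne: str = "O"         # semantic level: named-entity tag, IOB style (assumed)


@dataclass
class Mention:
    """Discourse level: a token span [start, end) linked to a coreference chain."""
    start: int
    end: int
    chain: int


@dataclass
class Document:
    tokens: list[Token]
    mentions: list[Mention] = field(default_factory=list)

    def chains(self) -> dict[int, list[str]]:
        """Map each chain id to the surface strings of its mentions, in text order."""
        out: dict[int, list[str]] = {}
        for m in sorted(self.mentions, key=lambda m: m.start):
            text = " ".join(t.form for t in self.tokens[m.start:m.end])
            out.setdefault(m.chain, []).append(text)
        return out


# Hypothetical two-sentence example: "Jan Peeters woont in Gent. Hij werkt daar."
doc = Document(
    tokens=[
        Token("Jan", ["Jan"], "N", "B-NP", "B-PER"),
        Token("Peeters", ["Peeters"], "N", "I-NP", "I-PER"),
        Token("woont", ["woon", "t"], "V", "B-VP"),
        Token("in", ["in"], "Prep", "B-PP"),
        Token("Gent", ["Gent"], "N", "B-NP", "B-LOC"),
        Token(".", ["."], "Punc", "O"),
        Token("Hij", ["Hij"], "Pron", "B-NP"),
        Token("werkt", ["werk", "t"], "V", "B-VP"),
        Token("daar", ["daar"], "Adv", "B-ADVP"),
        Token(".", ["."], "Punc", "O"),
    ],
    # Chain 1: the person "Jan Peeters" / "Hij"; chain 2: the place "Gent" / "daar".
    mentions=[Mention(0, 2, 1), Mention(6, 7, 1), Mention(4, 5, 2), Mention(8, 9, 2)],
)

print(doc.chains())  # {1: ['Jan Peeters', 'Hij'], 2: ['Gent', 'daar']}
```

Keeping the token-level layers as parallel fields and the discourse layer as separate spans lets each layer be queried independently, which is one reason a multi-layer gold standard can serve several NLP tasks from the same text.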
Keywords annotation, semi-automatic annotation, corpus construction, written Dutch