SUMMARY: Session P6-WT
Title | KNACK-2002: a Richly Annotated Corpus of Dutch Written Text |
---|---|
Authors | V. Hoste, G. De Pauw |
Abstract | In this paper, we introduce the annotated KNACK-2002 corpus of Dutch written text. The corpus features five different annotation layers: morphological boundaries at the word level, part-of-speech tags and phrase chunks at the syntactic level, named entities at the semantic level, and coreferential relations at the discourse level. We believe the corpus is unique in the Dutch language area because of the richness of its annotation layers, which provide researchers with a useful gold standard data set for different NLP tasks in the domains of morphology, (morpho)syntax, semantics and discourse. |
Keywords | annotation, semi-automatic annotation, corpus construction, written Dutch |
Full paper | KNACK-2002: a Richly Annotated Corpus of Dutch Written Text |