Title

Syntactic Analysis in the Spoken Dutch Corpus (CGN)

Authors

Ton van der Wouden (Utrecht University, UiL-OTS, Trans 10, 3512 JK Utrecht)

Heleen Hoekstra (Utrecht University, UiL-OTS, Trans 10, 3512 JK Utrecht)

Michael Moortgat (Utrecht University, UiL-OTS, Trans 10, 3512 JK Utrecht)

Bram Renmans (University of Leuven, Center for Computational Linguistics, Maria-Theresiastraat 21, 3000 Leuven, Belgium)

Ineke Schuurman (University of Leuven, Center for Computational Linguistics, Maria-Theresiastraat 21, 3000 Leuven, Belgium)

Session

SO4: Annotation Tools For Speech LRs

Abstract

This paper describes the syntactic annotation of the Spoken Dutch Corpus ("Corpus Gesproken Nederlands", or CGN), a Dutch-Flemish project (1998-2003) that aims to collect, describe and annotate ten million words of spoken Dutch. The first part discusses the background of the parsing strategy and some details of how the parsing process is implemented. The second part presents examples of practical applications of the parsing results.

Keywords

Syntactic analysis

Full Paper

71.pdf