Title

Using large multi-purpose corpora for specific research questions: discourse phenomena related to wh-questions in the Spoken Dutch Corpus

Author(s)

Nelleke Oostdijk, Lou Boves

Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands

Session

O27-ESW

Abstract

In this paper, we investigate whether a dataset derived from a multi-purpose corpus such as the Spoken Dutch Corpus can be considered appropriate for developing a taxonomy of wh-questions and a model of how these questions are integrated into spoken discourse. We compare the results obtained from the Spoken Dutch Corpus with those of a similar analysis of a large random collection of FAQs from the internet. We find substantial differences between the questions in spoken discourse and those in FAQs. It may therefore not be trivial to use a general-purpose corpus as a starting point for developing models for human-computer interaction.

Keyword(s)

large corpora; discourse analysis; question-answering; Spoken Dutch Corpus

Language(s)

Dutch

Full Paper

449.pdf