Title |
Using large multi-purpose corpora for specific research questions: discourse phenomena related to wh-questions in the Spoken Dutch Corpus |
Author(s) |
Nelleke Oostdijk, Lou Boves (Dept. of Language and Speech, University of Nijmegen, P.O. Box 9103, 6500 HD Nijmegen, The Netherlands) |
Session |
O27-ESW |
Abstract |
In this paper, we investigate whether a dataset derived from a multi-purpose corpus such as the Spoken Dutch Corpus is appropriate for developing a taxonomy of wh-questions and a model of how these questions are integrated into spoken discourse. We compare the results obtained from the Spoken Dutch Corpus with those of a similar analysis of a large random collection of FAQs from the internet. We find substantial differences between the questions in spoken discourse and those in FAQs. Using a general-purpose corpus as a starting point for developing models for human-computer interaction may therefore not be trivial. |
Keyword(s) |
large corpora; discourse analysis; question-answering; Spoken Dutch Corpus |
Language(s) |
Dutch |