Title |
MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse |
Authors |
Samira Shaikh, Tomek Strzalkowski, Aaron Broadwell, Jennifer Stromer-Galley, Sarah Taylor and Nick Webb |
Abstract |
In this paper, we describe our experience with collecting and creating an annotated corpus of multi-party online conversations in a chat-room environment. This effort is part of a larger project to develop computational models of social phenomena such as agenda control, influence, and leadership in online interactions. Such models will help capture the dialogue dynamics that are essential for developing, among other things, realistic human-machine dialogue systems, including autonomous virtual chat agents. We describe the data collection method used and the characteristics of the initial dataset of English chat. We have devised a multi-tiered collection process in which the subjects start from simple, free-flowing conversations and progress towards more complex and structured interactions. We report on the first two stages of this process, which were recently completed. The third, large-scale collection effort is currently being conducted. All English dialogue has been annotated at four levels: communication links, dialogue acts, local topics and meso-topics. Some details of these annotations are discussed later in the paper, although a full description is beyond the scope of this article. |
Topics |
Corpus (creation, annotation, etc.), Discourse annotation, representation and processing, Acquisition |
Full paper |
MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse |
Slides |
- |
Bibtex |
@InProceedings{SHAIKH10.85,
  author = {Samira Shaikh and Tomek Strzalkowski and Aaron Broadwell and Jennifer Stromer-Galley and Sarah Taylor and Nick Webb},
  title = {MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |