Title

RevisionBank: A Resource for Revision-based Multi-document Summarization and Evaluation

Author(s)

Jahna Otterbacher, Dragomir Radev

University of Michigan

Session

O14-W

Abstract

Multi-document summaries produced via sentence extraction often suffer from a number of cohesion problems, including dangling anaphora, sudden shifts in topic and incorrect or awkward chronological ordering. Therefore, the development of an automated revision process to correct such problems is a research area of current interest. We present the RevisionBank, a corpus of 240 extractive, multi-document summaries that have been manually revised to promote cohesion. The summaries were revised by six linguistics students using a constrained set of revision operations that we previously developed. In this paper, we describe the process of developing a taxonomy of cohesion problems and corrective revision operators that address such problems, as well as an annotation schema for our corpus. Finally, we discuss how our taxonomy and corpus can be used for the study of revision-based multi-document summarization as well as for summary evaluation.

Keyword(s)

multi-document summarization, text cohesion, manual annotation, discourse

Language(s)

English

Full Paper

409.pdf