Title |
C-3: Coherence and Coreference Corpus |
Authors |
Cristina Nicolae, Gabriel Nicolae and Kirk Roberts |
Abstract |
The phenomenon of coreference, covering entities, their mentions and their properties, is intricately linked to the phenomenon of coherence, covering the structure of rhetorical relations in a discourse. A text corpus that has both phenomena annotated can be used to test hypotheses about their interrelation or to detect other phenomena. We present the process by which C-3, a new corpus, was obtained by annotating the Discourse GraphBank coherence corpus with entity and mention information. The annotation followed a set of ACE guidelines adapted to favor coreference and to include entities of unknown types in the annotation. Together with the corpus we offer a new annotation tool specifically designed to annotate entity and mention information within a simple and functional graphical interface that combines the best of all worlds from available annotation tools. The potential usefulness of C-3 is discussed, as well as an application in which the corpus proved to be a valuable resource. |
Topics |
Corpus (creation, annotation, etc.), Anaphora, Coreference, Discourse annotation, representation and processing |
Full paper |
C-3: Coherence and Coreference Corpus |
Slides |
- |
Bibtex |
@InProceedings{NICOLAE10.622,
  author = {Cristina Nicolae and Gabriel Nicolae and Kirk Roberts},
  title = {C-3: Coherence and Coreference Corpus},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |