Title |
A Typology of Near-Identity Relations for Coreference (NIDENT) |
Authors |
Marta Recasens, Eduard Hovy and M. Antònia Martí |
Abstract |
The task of coreference resolution requires people or systems to decide when two referring expressions refer to the 'same' entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of 'near-identity', a middle ground category between identity and non-identity, to handle such cases systematically. We present a typology of Near-Identity Relations (NIDENT) that includes fifteen types, grouped under four main families, that capture a wide range of ways in which (near-)coreference relations hold between discourse entities. We validate the theoretical model by annotating a small sample of real data and showing that inter-annotator agreement is high enough for stability (K=0.58, and up to K=0.65 and K=0.84 when leaving out one and two outliers, respectively). This work enables subsequent creation of the first internally consistent language resource of this type through larger annotation efforts. |
Topics |
Anaphora, Coreference, Corpus (creation, annotation, etc.), Discourse annotation, representation and processing |
Bibtex |
@InProceedings{RECASENS10.160,
  author = {Marta Recasens and Eduard Hovy and M. Antònia Martí},
  title = {A Typology of Near-Identity Relations for Coreference (NIDENT)},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |