Workshop on Linguistic Coreference

LINGUISTIC COREFERENCE WORKSHOP

26 May 1998, Morning Session

Held in conjunction with
The First International Conference on Language Resources and Evaluation
Granada, Spain (28-30 May 1998)

WORKSHOP AIMS

It is essential for a natural language processing system to instantiate each object, process, attribute, and property correctly, so that all references to the same item are recognized as such and the inventory of all distinct items is accurate at all times. This problem is far from being resolved. There are both linguistic and computational reasons for this deficiency. First, there is no satisfactory microtheory of linguistic coreference. Second, and consequently, there is no satisfactory application of such a microtheory to NLP.

A microtheory of coreference in natural language includes in its scope all the phenomena that satisfy the following condition: an object/entity, an event, an attribute, a property or its value, an attitude, or any combination of the above is referred to more than once in a natural-language text, and the understanding of the text depends on the correct interpretation of the two or more referring expressions as designating the same object, event, etc. A linguistic microtheory of coreference for a language consists of the following elements:

a complete range of covered phenomena in the language;
a taxonomy of the range;
a typology of the range;
a list of rules forming the various types of coreference;
a list of rules interpreting the various types of coreference.

There has been a considerable amount of work on a few selected types of coreference, focusing almost exclusively on object coreference. Thus, significant work has been done in theoretical linguistics on anaphora and cataphora, subsuming, for the most part, earlier work on deixis. A small minority of authors have tried to extend their studies of anaphora beyond mere syntax. In the cognitive-linguistics and philosophy-of-language traditions, interesting work has been done relating anaphora and deixis to ambiguity resolution and discourse structure. At the same time, an effort in comparative-contrastive linguistics has led some writers to examine the data of more than one language at a time, still emphasizing entity or object reference.

In computational linguistics, the problem of coreference early on took the form of pronoun antecedent resolution, and this particular task, somewhat broadened to include a few other types of anaphora, remains at the center of the problem. The most sustained effort in the computational treatment of coreference has been mounted within the Tipster/MUC-6 initiative. While it has been recognized since quite early in the game that coreference resolution is based in large part on world knowledge, most of the computational and theoretical work on the matter ignores and avoids world knowledge. The MUC-6 initiative makes such an orientation quite explicit: the work should be based on such simpler resources as part-of-speech tagging, simple noun phrase recognition, basic semantic category information like gender and number, and (to a limited extent) full parse trees. Such an approach--trying to explore and maximize everything that can be done simply and cheaply towards the resolution of a complex problem--is perfectly legitimate as long as it is realized that a considerable part of the problem remains unsolved, and this is indeed realized full well within the MUC-6 initiative.

One persistent problem throughout the existing computational ventures into coreference has been the lack of a consistent theoretical approach to it. The result is that coreference phenomena are treated as self-evident, and most of them are overlooked, especially if they are not explicit pronoun-antecedent or other equally evident cases of anaphora. What is needed for a full, accurate, and reliable approach to coreference can be summarized, somewhat schematically, as involving the following steps:

understanding fully the range of the phenomenon and of the rules that govern it (theory);
determining the extent of machine-tractable information in the rules;
taking stock of all the rules that can be computed;
developing the appropriate heuristics for the computable rules;
computing the rules.

WORKSHOP AGENDA

The workshop will be held during the morning session of 26 May 1998 and will include a joint address by the Organizing Committee (listed below), followed by 5-8 individual presentations in two 90-120-minute blocks, with a break provided midway through.

CALL FOR PAPERS

The Workshop solicits papers addressing one or more of the points raised above, as well as any other pertinent issues.

Papers based on a diversity of languages are encouraged, both one language at a time and, especially, comparative/contrastive studies. Also strongly encouraged are papers which extend the study of coreference beyond entity/object reference, across document boundaries, and/or into non-text media.

FORMAT FOR SUBMISSION

Paper submissions should consist of an extended abstract of approximately 800 words, along with a brief description of the proposed presentation structure (e.g., paper, paper plus demo, etc.).

Each submission should include a separate title page, providing the following information: the title to be printed in the Conference program; names and affiliations of all authors; the full address of the primary author (or alternate contact person), including phone, fax, email; and required audio-visual equipment.

Papers may be submitted by sending three hard copies or one soft copy (in TeX, ASCII, or PostScript format) to the appropriate address as listed below:

Dr. Victor Raskin
Chair, Interdepartmental Program in Linguistics
Heavilon Hall
Purdue University
West Lafayette, IN 47907 USA

vraskin@purdue.edu

Submissions must be received no later than 1 March 1998 for a 15 March notification of paper acceptance. (Full versions of all accepted papers are requested no later than 15 April 1998 for inclusion in the conference proceedings.)

WORKSHOP ORGANIZING COMMITTEE

Dr. Sara J. Shelton (Contact Person)
US Department of Defense
9800 Savage Road, R525
Ft Meade, MD 20755 USA
sjshelt@afterlife.ncsc.mil
301-688-0301 (voice)
301-688-0338 (fax)

Dr. Eduard Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina Del Rey, CA 90292-669 USA
hovy@isi.edu
310-822-1511, ext. 731 (voice)

Dr. Victor Raskin
Interdepartmental Program in Linguistics
Heavilon Hall
Purdue University
West Lafayette, IN 47907 USA
vraskin@purdue.edu
765-494-3782 (voice)
765-494-3780 (fax)