LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | Coreference in Annotating a Large Corpus |
Authors | Hajičová Eva (Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, 1180 Praha 1, Czechia, hajicova@ufal.mff.cuni.cz) Panevová Jarmila (Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, 1180 Praha 1, Czechia, panevova@ufal.mff.cuni.cz) Sgall Petr (Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, 1180 Praha 1, Czechia, sgall@ufal.mff.cuni.cz) |
Keywords | Coreference, Corpus, Dependency, Syntax |
Session | Session WP2 - Corpus Annotation |
Full Paper | 19.ps, 19.pdf |
Abstract | The Prague Dependency Treebank (PDT) is a part of the Czech National Corpus annotated with disambiguated structural descriptions that represent the meaning of every sentence in its context. To achieve this aim, it is necessary, among other things, to make explicit at least some basic coreferential relations, both within sentence boundaries and beyond them. The PDT annotation scenario includes both automatic and 'manual' procedures. One of the automatic procedures concerns coreference: it records the lemma of the subject in a specific attribute of the label of each node that represents a reflexive pronoun, and it assigns to the deleted nodes in coordinated constructions the lemmas of their counterparts in the given construction. The 'manual' operations restore nodes for deleted items, mostly as pronouns. The annotation reflects the distinction between grammatical and textual coreference. To make it possible to handle textual coreference, specific attributes capture how sentences are linked to each other and to the context of situation, and the changing degrees of activation of items in the 'stock of shared knowledge' will be registered insofar as they can be derived from the use of nouns in subsequent utterances of a discourse. |
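The abstract names two automatic coreference steps but does not give the data format or the procedure itself. The following is a minimal sketch under assumed conventions, not the PDT implementation: the `Node` class, the role labels `Sb` (subject), `Obj`, `Pred` and `Coord`, the `deleted` flag and the `coref` attribute name are all hypothetical. Step one copies the subject lemma of the nearest governing predicate into `coref` of each reflexive pronoun; step two gives a deleted node in a coordination the lemma of a non-deleted counterpart with the same role.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    lemma: str                   # lemma of the node ("" if not yet known)
    role: str                    # hypothetical label: "Sb", "Obj", "Pred", "Coord", ...
    reflexive: bool = False      # True for a reflexive pronoun such as Czech "se"
    deleted: bool = False        # True for a node restored for an ellipsis
    coref: Optional[str] = None  # hypothetical attribute receiving the subject lemma
    children: list[Node] = field(default_factory=list)


def mark_reflexives(node: Node, subject: Optional[str] = None) -> None:
    """Store the lemma of the governing clause's subject on each reflexive.

    Simplification: a predicate without its own "Sb" child keeps the
    subject inherited from the enclosing clause.
    """
    if node.role == "Pred":
        subj = next((c for c in node.children if c.role == "Sb"), None)
        if subj is not None:
            subject = subj.lemma
    if node.reflexive and subject is not None:
        node.coref = subject
    for child in node.children:
        mark_reflexives(child, subject)


def _counterpart(target: Node, candidates: list[Node]) -> Optional[Node]:
    """First non-deleted candidate (other than target) with the same role."""
    return next(
        (c for c in candidates
         if c is not target and not c.deleted and c.role == target.role),
        None,
    )


def fill_coordinated_gaps(node: Node) -> None:
    """Give deleted nodes in a coordination the lemma of their counterpart."""
    if node.role == "Coord":
        conjuncts = node.children
        for conj in conjuncts:
            # A deleted conjunct, e.g. the gapped verb in
            # "Jan čte knihy a Marie [čte] noviny".
            if conj.deleted:
                src = _counterpart(conj, conjuncts)
                if src is not None:
                    conj.lemma = src.lemma
            # A deleted dependent inside one conjunct, matched by role
            # against the dependents of the other conjuncts.
            peers = [c for other in conjuncts if other is not conj
                     for c in other.children]
            for child in conj.children:
                if child.deleted:
                    src = _counterpart(child, peers)
                    if src is not None:
                        child.lemma = src.lemma
    for child in node.children:
        fill_coordinated_gaps(child)


if __name__ == "__main__":
    # "Jan se myje" (Jan washes himself): "se" should point to "Jan".
    refl = Node("se", "Obj", reflexive=True)
    clause = Node("mýt", "Pred", children=[Node("Jan", "Sb"), refl])
    mark_reflexives(clause)
    print(refl.coref)  # Jan

    # "Jan čte knihy a Marie noviny": the gapped verb receives "číst".
    gap = Node("", "Pred", deleted=True,
               children=[Node("Marie", "Sb"), Node("noviny", "Obj")])
    coord = Node("a", "Coord", children=[
        Node("číst", "Pred", children=[Node("Jan", "Sb"), Node("kniha", "Obj")]),
        gap,
    ])
    fill_coordinated_gaps(coord)
    print(gap.lemma)  # číst
```

Matching counterparts by role alone is the simplest choice for a sketch; real coordinated structures (shared modifiers, more than two conjuncts, nested coordination) would need richer matching than this.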