LREC 2000 2nd International Conference on Language Resources & Evaluation | |
Conference Papers
Title | Building a Treebank for Italian: a Data-driven Annotation Schema |
Authors |
Bosco Cristina (Dipartimento di Informatica, Università di Torino, c.so Svizzera 185, 10149, Torino (Italy), bosco@di.unito.it) Lombardo Vincenzo (DISTA – Università del Piemonte Orientale “A. Avogadro”, c.so Borsalino 54, 15100 Alessandria, Italy, Centro di Scienza Cognitiva – Università di Torino, via Lagrange 3, 10123 Torino, Italy, vincenzo@di.unito.it) Vassallo Daniela (Dipartimento di Informatica, Università di Torino, c.so Svizzera 185, 10149, Torino (Italy), vassallo@di.unito.it) Lesmo Leonardo (Dipartimento di Informatica, Università di Torino, c.so Svizzera 185, 10149, Torino (Italy), lesmo@di.unito.it) |
Keywords | Annotation Schema, Corpus, Dependency Format, Italian, Treebank |
Session | Session WO2 - Treebanks |
Abstract | Many natural language researchers are currently turning their attention to treebank development and trying to achieve both accuracy and coverage of corpus data in their representation formats. This paper presents a data-driven annotation schema developed for an Italian treebank, designed to ensure data coverage and consistency in the annotation of linguistic phenomena. The schema is a dependency-based format centered on the notion of predicate-argument structure, augmented with traces to represent discontinuous constituents. The treebank is developed through an annotation process in which a human annotator is assisted by an interactive parsing tool that incrementally builds the syntactic representation of each sentence. To extend this parser's syntactic knowledge, a specific data-driven strategy has been applied. We describe the cyclical development of the annotation schema, highlighting the richness and flexibility of the format, and discuss some representational issues. |
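As a rough illustration only, and not the authors' actual format, the sketch below shows the general idea of a dependency tree in which every word has a single head, extended with empty "trace" nodes co-indexed with an overt word so that a predicate's full argument structure can still be read off the tree. The class `Node`, the helpers `resolve` and `arguments`, the relation labels (`ROOT`, `SUBJ`, `VCOMP`) and the analysis of the example sentence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    index: int                        # 1-based position; traces get indices after the words
    form: Optional[str]               # surface word, or None for a trace (empty node)
    head: int                         # index of the governing node; 0 means the root
    relation: str                     # hypothetical relation label, e.g. "SUBJ"
    antecedent: Optional[int] = None  # for a trace: index of the co-indexed word


def resolve(tree: Dict[int, Node], node: Node) -> str:
    """Return the word a node stands for, following a trace to its antecedent."""
    while node.form is None:
        if node.antecedent is None:
            return "<empty>"          # trace with no overt antecedent in the sentence
        node = tree[node.antecedent]
    return node.form


def arguments(tree: Dict[int, Node], predicate: int) -> Dict[str, str]:
    """Collect the direct dependents of `predicate`, keyed by relation label.

    Assumes (for simplicity) at most one dependent per relation label.
    """
    return {
        n.relation: resolve(tree, n)
        for n in tree.values()
        if n.head == predicate
    }


# "Maria vuole partire" ("Maria wants to leave"): "Maria" is the subject of both
# verbs, but each node has a single head, so the subject of "partire" is encoded
# as a trace co-indexed with "Maria" (hypothetical analysis).
sentence: List[Node] = [
    Node(1, "Maria", 2, "SUBJ"),
    Node(2, "vuole", 0, "ROOT"),
    Node(3, "partire", 2, "VCOMP"),
    Node(4, None, 3, "SUBJ", antecedent=1),
]
tree = {n.index: n for n in sentence}

print(arguments(tree, 2))  # {'SUBJ': 'Maria', 'VCOMP': 'partire'}
print(arguments(tree, 3))  # {'SUBJ': 'Maria'}
```

Keeping the trace as a separate node, rather than giving "Maria" two heads, preserves the single-head tree property while still recording that "partire" has a subject.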