LREC 2000 2nd International Conference on Language Resources & Evaluation
Conference Papers
Title | Building a Treebank for French |
Authors | Abeille Anne (TALaNa, Universite Paris 7, 75251 Paris cedex 05, FRANCE, abeille@linguist.jussieu.fr); Clement Lionel (TALaNa, Universite Paris 7, 75251 Paris cedex 05, FRANCE, clement@linguist.jussieu.fr); Kinyon Alexandra (University of Pennsylvania, Philadelphia, USA, kiyon@linguist.jussieu.fr) |
Keywords | Corpus Annotation, Corpus Linguistics, Parsing, Shallow Parsing, Tagging, Treebank |
Session | Session WO2 - Treebanks |
Abstract | Very few gold-standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French, starting from a tagged newspaper corpus of 1 million words (Abeille et al., 1998; Abeille and Clement, 1999). As in the Penn TreeBank (Marcus et al., 1993), an automatic parsing phase is followed by a second phase of systematic manual validation and correction. As in the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotations, for which we define extensive guidelines. Our goal is to provide a theory-neutral, surface-oriented, error-free treebank for French. As in the Negra project (Brants et al., 1999), we annotate both constituents and functional relations. |