LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | Building a Treebank for French |
Authors | Abeillé Anne (TALaNa, Université Paris 7, 75251 Paris cedex 05, FRANCE, abeille@linguist.jussieu.fr); Clément Lionel (TALaNa, Université Paris 7, 75251 Paris cedex 05, FRANCE, clement@linguist.jussieu.fr); Kinyon Alexandra (University of Pennsylvania, Philadelphia, USA, kiyon@linguist.jussieu.fr) |
Keywords | Corpus Annotation, Corpus Linguistics, Parsing, Shallow Parsing, Tagging, Treebank |
Session | Session WO2 - Treebanks |
Full Paper | 230.ps, 230.pdf |
Abstract | Very few gold-standard annotated corpora are currently available for French. We present an ongoing project to build a reference treebank for French, starting with a tagged newspaper corpus of 1 million words (Abeillé et al., 1998; Abeillé and Clément, 1999). As in the Penn TreeBank project (Marcus et al., 1993), we distinguish an automatic parsing phase from a subsequent phase of systematic manual validation and correction. As in the Prague treebank (Hajicova et al., 1998), we rely on several types of morphosyntactic and syntactic annotations, for which we define extensive guidelines. Our goal is to provide a theory-neutral, surface-oriented, error-free treebank for French. As in the Negra project (Brants et al., 1999), we annotate both constituents and functional relations. |