LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | Language Resources Development at the Spanish Royal Academy |
Authors | Municio Ángel Martín (Real Academia Española Felipe IV 4, 28014 Madrid, Spain, email: amunicio@rae.es) Rojo Guillermo (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, email: fegrojo@usc.es) Sánchez León Fernando (Real Academia Española Felipe IV 4, 28014 Madrid, Spain, email: fsanchez@rae.es) Pinillos Octavio (Real Academia Española Felipe IV 4, 28014 Madrid, Spain, email: pinillos@rae.es) |
Keywords | Corpus, Grammars, Lexicography, Lexicon, Morphological Analysis, NLP Tools, Spanish, Spoken Corpus |
Session | Session WO15 - Language Resources Projects |
Full Paper | 297.ps, 297.pdf |
Abstract | This paper discusses some of the most relevant issues in the development of language resources at the Spanish Royal Academy (RAE). Two 125-million-word corpora of the Spanish language (one synchronic and one diachronic) and three specialized corpora have been developed. Alongside the corpora, the RAE is also developing NLP tools and resources to annotate them morphosyntactically. The most relevant of these are the Computational Lexicon, the morphological analysis tools, the disambiguation grammars and the tokenizer generator. The last section describes the lexicographic use of corpus materials and includes a brief description of the corpus-based lexicographical workbench and its related tools. |