LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Language Resources Development at the Spanish Royal Academy
Authors Martín Municio Ángel (Real Academia Española, Felipe IV 4, 28014 Madrid, Spain, email: amunicio@rae.es)
Rojo Guillermo (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, email: fegrojo@usc.es)
Sánchez León Fernando (Real Academia Española, Felipe IV 4, 28014 Madrid, Spain, email: fsanchez@rae.es)
Pinillos Octavio (Real Academia Española, Felipe IV 4, 28014 Madrid, Spain, email: pinillos@rae.es)
Keywords Corpus, Grammars, Lexicography, Lexicon, Morphological Analysis, NLP Tools, Spanish, Spoken Corpus
Session Session WO15 - Language Resources Projects
Full Paper 297.ps, 297.pdf
Abstract This paper describes some of the most relevant issues in the development of language resources at the Spanish Royal Academy (RAE). Two 125-million-word corpora of Spanish (one synchronic, one diachronic) and three specialized corpora have been developed. Around these corpora, the RAE is also developing NLP tools and resources to annotate them morpho-syntactically. Among the most relevant are the computational lexicon, the morphological analysis tools, the disambiguation grammars and the tokenizer generator. The last section describes the lexicographic use of corpus materials and includes a brief description of the corpus-based lexicographical workbench and its related tools.