LREC 2000 2nd International Conference on Language Resources & Evaluation
Title | An Architecture for Document Routing in Spanish: Two Language Components, Pre-processor and Parser |
Authors | Rojo Guillermo (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, fegrojo@usc.es); Álvarez Maria Concepción (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, femcal@usc.es); Alvariño Pilar (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, fepili@usc.es); Gil Adelaida (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, iagilma@usc.es); Santalla María Paula (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, fempsr@usc.es); Sotelo Susana (Dept. of Spanish Language, University of Santiago de Compostela, Burgo das Nacións, s/n., E-15771 Santiago de Compostela, Spain, fesdocio@usc.es) |
Keywords | Document Routing, Information Retrieval, Parsing, Syntactic Normalization |
Session | Session WO9 - Applications in the Written Area |
Full Paper | 91.ps, 91.pdf |
Abstract | This paper describes the language components of a system for Document Routing in Spanish. The system uses natural language processing techniques to identify the terms in each document that are relevant for classification. These techniques are based on isolating and normalizing the syntactic units considered relevant for classification, especially noun phrases, but also other constituents built around verbs, adverbs, pronouns or adjectives. After a general introduction to the research project, the second section relates our approach to other previous and current approaches to the problem, and the third describes the corpora used to evaluate the system. The architecture for linguistic analysis, which includes pre-processing and two different levels of syntactic analysis, is described in the fourth and fifth sections. The last section presents a comparative analysis of the results obtained by processing the corpora introduced in the third section, and also outlines some future developments of the system. |