LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | A Word-level Morphosyntactic Analyzer for Basque |
Authors | Aduriz I. (Universidad de Barcelona, Gran Vía de las Cortes Catalanas, 585, E-08007 Barcelona) Agirre E. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Aldezabal I. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Arregi X. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Arriola J. M. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Artola X. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Gojenola K. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Maritxalar A. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Sarasola K. (Dept. of Computer Languages and Systems, University of the Basque Country, 649 P. K., E-20080 Donostia, Basque Country) Urkia M. (UZEI, Aldapeta 20, E-20009 Donostia, Basque Country, jipgogak@si.ehu.es) |
Keywords | Agglutinative Languages, Morphology, Morphosyntax |
Session | Session WP2 - Corpus Annotation |
Full Paper | 44.ps, 44.pdf |
Abstract | This work presents the development and implementation of a full morphological analyzer for Basque, an agglutinative language. Several problems (phrase structure inside word-forms, noun ellipsis, multiplicity of values for the same feature, and the use of complex linguistic representations) have forced us to go beyond the morphological segmentation of words and to include an extra module that performs full morphosyntactic parsing of each word-form. A unification-based word-level grammar has been defined for that purpose. The system has been integrated into a general environment for the automatic processing of corpora, using TEI-conformant SGML feature structures. |
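The abstract names two mechanisms: unifying morpheme feature structures within a word, and encoding results as TEI feature structures. The sketch below is a deliberately simplified illustration of both, not the authors' grammar. It assumes flat feature structures (plain dicts of feature to atomic value) and uses hypothetical feature names and segmentations. It shows why flat left-to-right unification is not enough for Basque. For "etxearen" ("of the house", etxe + a + ren) every morpheme contributes compatible features. For "etxekoarekin" ("with the one of the house", etxe + ko + a + rekin, with an elided head noun) the word carries two case values, so flat unification clashes. That clash is the kind of problem (multiplicity of values, phrase structure inside the word) that motivates a word-level grammar building internal structure. The `to_tei_fs` helper only loosely imitates TEI P3 `<fs>`/`<f>`/`<sym>` markup.

```python
from typing import Dict, List, Optional

# Simplification (assumption): a feature structure is a flat dict feature -> atomic value.
# Real unification grammars use nested, typed feature structures.
FeatureStructure = Dict[str, str]


def unify(fs1: FeatureStructure, fs2: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of two flat feature structures, or None on a value clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None  # same feature, incompatible values: unification fails
        result[feature] = value
    return result


def unify_all(morphemes: List[FeatureStructure]) -> Optional[FeatureStructure]:
    """Fold unification left to right over a word's morphemes (no internal structure)."""
    result: FeatureStructure = {}
    for fs in morphemes:
        unified = unify(result, fs)
        if unified is None:
            return None
        result = unified
    return result


def to_tei_fs(fs: FeatureStructure, fs_type: str = "morph") -> str:
    """Render a flat feature structure as TEI-style <fs> markup (illustrative only)."""
    parts = [f'<fs type="{fs_type}">']
    for feature in sorted(fs):
        parts.append(f'<f name="{feature}"><sym value="{fs[feature]}"></f>')
    parts.append("</fs>")
    return "".join(parts)


# Hypothetical segmentations; feature names and values are illustrative, not the paper's.
ETXEAREN = [  # "of the house": etxe + a + ren
    {"cat": "N", "lemma": "etxe"},
    {"det": "def", "num": "sg"},
    {"case": "gen"},
]

ETXEKOAREKIN = [  # "with the one of the house": etxe + ko + a + rekin (noun ellipsis)
    {"cat": "N", "lemma": "etxe"},
    {"case": "loc_gen"},
    {"det": "def", "num": "sg"},
    {"case": "com"},
]

if __name__ == "__main__":
    print(unify_all(ETXEAREN))
    # {'cat': 'N', 'lemma': 'etxe', 'det': 'def', 'num': 'sg', 'case': 'gen'}
    print(unify_all(ETXEKOAREKIN))
    # None  (two case values in one word-form)
    print(to_tei_fs({"case": "gen"}))
    # <fs type="morph"><f name="case"><sym value="gen"></f></fs>
```

In a fuller treatment, "etxeko" would first be analyzed as a word-internal constituent whose case is kept separate from the case of the whole word-form. That structure is what a word-level grammar, rather than flat segmentation, can provide.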