LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | ItalWordNet: a Large Semantic Database for Italian |
Authors | Roventini Adriana (Istituto di Linguistica Computazionale, CNR, Area della Ricerca di Pisa, Via Alfieri 1, Loc. S. Cataldo, Ghezzano 56010 (PI) – ITALY, adriana@ilc.pi.cnr.it); Alonge Antonietta (Istituto di Linguistica Computazionale, CNR, Area della Ricerca di Pisa, Via Alfieri 1, Loc. S. Cataldo, Ghezzano 56010 (PI) – ITALY, antoalonge@libero.it); Calzolari Nicoletta (Istituto di Linguistica Computazionale, CNR, Area della Ricerca di Pisa, Via Alfieri 1, Loc. S. Cataldo, Ghezzano 56010 (PI) – ITALY, glottolo@ilc.pi.cnr.it); Magnini Bernardo (Istituto per la Ricerca Scientifica e Tecnologica, I-38050, Povo, Trento, magnini@irst.itc.it); Bertagna Francesca (Consorzio Pisa Ricerche, Via S. Maria 40, Pisa 56100 - ITALY, F.Bertagna@ilc.pi.cnr.it) |
Keywords | Lexical Resources, Lexical Semantic Networks |
Session | Session WO11 - Mono-Multilingual Lexicon Acquisition and Building |
Full Paper | 129.ps, 129.pdf |
Abstract | This paper focuses on the work we are carrying out to develop a large semantic database within an Italian national project, SI-TAL, which aims to build a set of integrated (compatible) resources and tools for the automatic processing of the Italian language. Within SI-TAL, ItalWordNet is the reference lexical resource; it will contain information on about 130,000 word senses grouped into synsets. This lexical database is not being created ex novo, but by extending and revising the Italian wordnet built in the framework of the EuroWordNet project. In this paper we first describe how the lexical coverage of our wordnet is being extended by adding adjectives, adverbs and proper nouns, plus a terminological subset belonging to the economic and financial domain. We then illustrate the changes that these extensions entail, both in the linguistic model and in the data structure. In particular, we discuss i) the new semantic relations identified to encode information on adjectives and adverbs, and ii) the new architecture, which includes the terminological subset. |