LREC 2006 - Proceedings sorted by papers

Title	Integrating Methods and LRs for Automatic Keyword Extraction from Open Domain Texts
Authors	P. Alessandro, F. Marco, M. Massimo
Abstract	The paper presents a tool for keyword extraction from multilingual resources developed within the AXMEDIS project. In this tool, lexical collocations (Sinclair, 1991) internal to documents are used to enhance the performance obtained through a standard statistical procedure. A first set of mono-term keywords is extracted through the TF.IDF algorithm (Salton, 1989). The internal analysis of the document generates a second set of multi-term keywords based on the first set, rather than on multi-term frequency comparison with a general resource (Witten et al., 1999). Collocations in which a mono-term keyword occurs as the head are treated as multi-term keywords, and are assumed to improve the identification of the content. The evaluation compares the results of the TF.IDF procedure with those obtained with the enhanced procedure in terms of “precision”. Each set of keywords received a value from the point of view of a possible user, regarding: (a) overall efficiency of the whole set of keywords for the identification of the content; (b) adequacy of each extracted keyword. Results show that multi-term keywords improve content identification by a relative factor of 100% and that adequacy is enhanced in 33% of cases.
Keywords	Extraction from Spoken Text. In Proceeding of LREC 2004. Paris, France: ELRA, pp. 2205-2208.
Full paper	Integrating Methods and LRs for Automatic Keyword Extraction from Open Domain Texts
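
The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the AXMEDIS tool: the function names, the smoothed IDF variant, the whitespace-tokenised toy corpus, the restriction to two-word collocations, and the assumption that the head is the rightmost word of the pair are all hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, top_n=2):
    """Rank the terms of `doc` by TF.IDF against `corpus` (a list of token lists).

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, is used here;
    the abstract does not say which TF.IDF variant the tool applies.
    """
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for other in corpus if term in other)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


def multi_term_keywords(doc, mono_keywords, min_count=2):
    """Return repeated adjacent word pairs whose second word is a mono-term keyword.

    Assumption: the head of a collocation is its rightmost word, and a pair
    counts as a collocation if it occurs at least `min_count` times in `doc`.
    """
    pairs = Counter(zip(doc, doc[1:]))
    return sorted(
        " ".join(pair)
        for pair, count in pairs.items()
        if pair[1] in mono_keywords and count >= min_count
    )


# Hypothetical toy data: the first document is the one being indexed.
doc = "automatic keyword extraction improves keyword extraction quality".split()
corpus = [
    doc,
    "automatic translation improves quality".split(),
    "speech recognition quality".split(),
]

mono = tfidf_keywords(doc, corpus, top_n=2)
multi = multi_term_keywords(doc, mono)
print(mono, multi)
```

The second step looks only inside the document itself, which mirrors the abstract's point that multi-term keywords are derived from the mono-term set rather than from frequency comparison with a general resource.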
