SUMMARY: Session P16-T


Title Integrating Methods and LRs for Automatic Keyword Extraction from Open Domain Texts
Authors P. Alessandro, F. Marco, M. Massimo
Abstract The paper presents a tool for keyword extraction from multilingual resources, developed within the AXMEDIS project. The tool uses lexical collocations (Sinclair, 1991) internal to each document to improve on the performance of a standard statistical procedure. A first set of mono-term keywords is extracted with the TF.IDF algorithm (Salton, 1989). An internal analysis of the document then generates a second set of multi-term keywords derived from the first set, rather than from a comparison of multi-term frequencies against a general resource (Witten et al., 1999). Collocations in which a mono-term keyword occurs as the head are treated as multi-term keywords and are assumed to improve identification of the document's content. The evaluation compares, in terms of “precision”, the results of the plain TF.IDF procedure with those of the enhanced procedure. Each set of keywords was rated from the point of view of a potential user on: (a) the overall efficiency of the whole keyword set in identifying the content; (b) the adequacy of each extracted keyword. Results show that multi-term keywords improve content identification by a relative factor of 100% and that adequacy improves in 33% of cases.
Keywords Extraction from Spoken Text. In Proceedings of LREC 2004. Paris, France: ELRA, pp. 2205-2208.
Full paper Integrating Methods and LRs for Automatic Keyword Extraction from Open Domain Texts
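The two-stage procedure in the abstract (TF.IDF mono-term keywords, then document-internal collocations headed by those keywords) can be illustrated with a minimal Python sketch. It is not the authors' AXMEDIS implementation. Several assumptions are ours: the function names tfidf_keywords and multiterm_keywords are hypothetical; TF.IDF is taken as raw term count times log(N / df); tokens are assumed to be lowercased with stopwords already removed; and "head" is approximated by token position (last token for English-style compounds, first token for Romance-style phrases) instead of a real syntactic analysis. The parameters max_len and min_count are illustrative thresholds.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_k=5):
    """Return the top_k single terms of one document ranked by TF.IDF.

    Assumption: weight = raw count in the document * log(N / df), where the
    corpus (N documents) must include the document itself so that df >= 1.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # document frequency: each term counted once per doc
    tf = Counter(doc_tokens)
    scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def multiterm_keywords(doc_tokens, mono_keywords, max_len=3, min_count=2, head="last"):
    """Return n-grams (2..max_len tokens) of the same document whose head is a
    mono-term keyword and that recur at least min_count times.

    Hypothetical head rule: head="last" takes the final token (English-style
    compounds such as "keyword extraction"); head="first" takes the first token
    (Romance-style phrases such as Italian "estrazione automatica").
    Only counts inside this document are used; no general reference corpus.
    """
    if head not in ("first", "last"):
        raise ValueError("head must be 'first' or 'last'")
    keywords = set(mono_keywords)
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(doc_tokens) - n + 1):
            gram = tuple(doc_tokens[i:i + n])
            gram_head = gram[-1] if head == "last" else gram[0]
            if gram_head in keywords:
                counts[gram] += 1
    # most_common() lists equal counts in first-seen order.
    return [" ".join(g) for g, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    corpus = [
        ["automatic", "keyword", "extraction", "keyword", "extraction", "tool"],
        ["tool", "speech", "text"],
        ["text", "corpus", "tool"],
    ]
    mono = tfidf_keywords(corpus[0], corpus, top_k=2)
    print(mono)                                 # ['extraction', 'keyword']
    print(multiterm_keywords(corpus[0], mono))  # ['keyword extraction']
```

In this toy run "tool" gets a TF.IDF weight of zero because it occurs in every document, while "keyword" and "extraction" are selected; the recurring collocation "keyword extraction", headed by "extraction", is then promoted to a multi-term keyword. The second stage deliberately counts n-grams only within the document, mirroring the abstract's contrast with comparing multi-term frequencies against a general resource.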