SUMMARY : Session O39-W Lexicons, Semantics, Clustering Tools

 

Title Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words
Authors M. Islam, D. Inkpen
Abstract This paper presents a new corpus-based method for calculating the semantic similarity of two target words. Our method, called Second Order Co-occurrence PMI (SOC-PMI), uses Pointwise Mutual Information to sort lists of important neighbor words of the two target words. Then we consider the words which are common to both lists and aggregate their PMI values (from the opposite list) to calculate the relative semantic similarity. Our method was empirically evaluated using Miller and Charles’ (1991) 30 noun pair subset, Rubenstein and Goodenough’s (1965) 65 noun pairs, 80 synonym test questions from the Test of English as a Foreign Language (TOEFL), and 50 synonym test questions from a collection of English as a Second Language (ESL) tests. Evaluation results show that our method outperforms several competing corpus-based methods.
Keywords semantic, similarity, words, PMI, corpus, co-occurrence
Full paper Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words
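
Illustrative sketch. The abstract outlines three steps: rank each target word's neighbors by PMI, keep the neighbors common to both lists, and aggregate their PMI values taken from the opposite list. The Python below is a minimal sketch of that outline, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `pmi_neighbors` and `soc_similarity`, sentence-level co-occurrence counting, the `top_k` cutoff, the normalization by list length, and the toy corpus. The paper's exact weighting and parameters may differ.

```python
import math
from collections import Counter


def pmi_neighbors(target, sentences, top_k):
    """Map the top_k neighbors of `target` to their (positive) PMI scores.

    Assumption: two words co-occur when they appear in the same sentence.
    """
    word_count = Counter()   # number of sentences containing each word
    pair_count = Counter()   # number of sentences containing word and target
    for sentence in sentences:
        words = set(sentence)
        word_count.update(words)
        if target in words:
            for w in words:
                if w != target:
                    pair_count[w] += 1

    n = len(sentences)
    scores = {}
    for w, joint in pair_count.items():
        # PMI = log2( P(target, w) / (P(target) * P(w)) )
        pmi = math.log2(joint * n / (word_count[target] * word_count[w]))
        if pmi > 0:
            scores[w] = pmi

    # Sort neighbors by PMI and keep the strongest ones.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_k])


def soc_similarity(w1, w2, sentences, top_k=10):
    """Second-order similarity: aggregate PMI over shared neighbors."""
    n1 = pmi_neighbors(w1, sentences, top_k)
    n2 = pmi_neighbors(w2, sentences, top_k)
    common = n1.keys() & n2.keys()
    if not common:
        return 0.0
    # For each list, sum the PMI values of the shared words taken from the
    # opposite list, then normalize by the list length (an assumption here).
    from_w2 = sum(n2[w] for w in common)
    from_w1 = sum(n1[w] for w in common)
    return from_w2 / len(n1) + from_w1 / len(n2)


# Hypothetical toy corpus, one tokenized sentence per entry.
corpus = [
    ["car", "road", "drive"],
    ["automobile", "road", "drive"],
    ["car", "engine"],
    ["automobile", "engine"],
    ["banana", "fruit"],
    ["apple", "fruit"],
    ["banana", "yellow"],
    ["road", "map"],
]

print(soc_similarity("car", "automobile", corpus))
print(soc_similarity("car", "banana", corpus))
```

In this toy corpus, "car" and "automobile" never appear in the same sentence, yet they share the neighbors "road", "drive", and "engine", so the second-order score is positive. "car" and "banana" share no neighbors and score zero.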