Title |
Word Sense Disambiguation with Information Retrieval Technique |
Authors |
Jong-Hoon Oh (Division of Computer Science, Dept. of EECS, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea); Saim Shin (Division of Computer Science, Dept. of EECS, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea); Yong-Seok Choi (Division of Computer Science, Dept. of EECS, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea); Key-Sun Choi (Division of Computer Science, Dept. of EECS, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Korea) |
Session |
WP3: Tools & Components |
Abstract |
This paper reports on word sense disambiguation of Korean nouns using an information retrieval technique. First, context vectors are constructed from the contextual words in the training data. The words in each context vector are then weighted by local density. Each sense of a target word is represented in word space as a 'Static Sense Vector', the centroid of the context vectors for that sense. Contextual noise is removed using selective sampling, which applies an information retrieval technique to enhance discriminative power: training samples are treated as indexed documents and test samples as queries. For each query (a test sample), the top-N relevant training samples are retrieved and used to construct a 'Dynamic Sense Vector'. The word sense is then estimated using both the 'Static Sense Vector' and the 'Dynamic Sense Vector'. The Korean SENSEVAL test suite is used for the experiment, and our method produces relatively good results. |
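The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes raw term counts in place of the local density weighting, cosine similarity for both retrieval and scoring, and a linear interpolation (the hypothetical `mix` parameter) to combine the static and dynamic scores, none of which the abstract specifies. The function names and the toy data (English glosses for an ambiguous noun) are likewise hypothetical.

```python
import math
from collections import Counter


def context_vector(words):
    """Bag-of-words context vector (raw counts; the paper's local density weighting is omitted)."""
    return Counter(words)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def sense_vectors(samples):
    """Group (sense, words) samples by sense and return one centroid per sense."""
    grouped = {}
    for sense, words in samples:
        grouped.setdefault(sense, []).append(context_vector(words))
    return {sense: centroid(vecs) for sense, vecs in grouped.items()}


def disambiguate(test_words, training, top_n=2, mix=0.5):
    """Score each sense with static and dynamic sense vectors (assumed combination)."""
    query = context_vector(test_words)

    # Static sense vectors: centroids over all training samples of each sense.
    static = sense_vectors(training)

    # Selective sampling: treat training samples as documents, the test sample
    # as a query, and keep only the top-N most similar training samples.
    ranked = sorted(
        training,
        key=lambda sample: cosine(query, context_vector(sample[1])),
        reverse=True,
    )
    dynamic = sense_vectors(ranked[:top_n])

    # Assumption: linear interpolation of the two similarities.
    scores = {
        sense: (1 - mix) * cosine(query, static[sense])
        + mix * cosine(query, dynamic.get(sense, {}))
        for sense in static
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy training data: (sense label, contextual words).
TRAINING = [
    ("ship", ["sea", "sail", "harbor", "captain"]),
    ("ship", ["harbor", "cargo", "sea"]),
    ("pear", ["fruit", "sweet", "eat", "tree"]),
    ("pear", ["fruit", "market", "eat"]),
]

best_sense, sense_scores = disambiguate(["sail", "sea", "cargo"], TRAINING)
print(best_sense, sense_scores)
```

In this sketch a sense with no sample among the retrieved top-N simply receives a dynamic score of zero, which is one possible way to let the retrieved samples sharpen the decision; the actual combination rule is described in the full paper.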
Keywords |
Word sense disambiguation, Information retrieval, SENSEVAL, Selective sampling, Sense vector |
Full Paper |