Title
Similar Term Discovery using Web Search
Authors
Peter Anick, Vijay Murthi and Shaji Sebastian
Abstract
We present an approach to the discovery of semantically similar terms that uses a web search engine both as a source for generating related terms and as a tool for estimating the semantic similarity of terms. The system associates with each document in the search engine's index a weighted term vector comprising the phrases that best describe the document's subject matter. Related terms for a given seed phrase are generated by running the seed as a search query and mining the result vector, which is produced by averaging the weights of the terms associated with the top documents of the query's result set. The degree of similarity between the seed term and each related term is then computed as the cosine of the angle between their respective result vectors. We test the effectiveness of this approach for building a term recommender system designed to help online advertisers discover additional phrases to describe their product offerings. A comparison of its output with that of several alternative methods finds it to be competitive with the best known alternative.
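The pipeline described in the abstract (seed query, averaged result vector over the top documents, cosine similarity between result vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy `INDEX`, the `search` stub that ranks documents by the query phrase's own weight, the cutoff `k`, and the choice to treat every other term in the seed's result vector as a candidate are all hypothetical stand-ins for the real web search engine and whatever filtering the paper applies.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

TermVector = Dict[str, float]

# Hypothetical toy index standing in for a web search engine's index:
# each document id maps to a weighted vector of descriptive phrases.
INDEX: Dict[str, TermVector] = {
    "d1": {"digital camera": 0.9, "lens": 0.5, "zoom": 0.4},
    "d2": {"digital camera": 0.7, "memory card": 0.6, "battery": 0.3},
    "d3": {"camcorder": 0.8, "zoom": 0.6, "battery": 0.4},
    "d4": {"memory card": 0.9, "usb drive": 0.5},
}


def search(query: str, k: int) -> List[str]:
    """Hypothetical search stub: rank documents by the query phrase's weight."""
    hits = [(vec[query], doc) for doc, vec in INDEX.items() if query in vec]
    hits.sort(reverse=True)
    return [doc for _, doc in hits[:k]]


def result_vector(query: str, k: int = 50) -> TermVector:
    """Average the term weights of the top-k documents returned for the query."""
    docs = search(query, k)
    if not docs:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in INDEX[doc].items():
            totals[term] += weight
    return {term: total / len(docs) for term, total in totals.items()}


def cosine(u: TermVector, v: TermVector) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_terms(seed: str, k: int = 50) -> List[Tuple[str, float]]:
    """Mine candidates from the seed's result vector and rank them by cosine."""
    seed_vec = result_vector(seed, k)
    candidates = [t for t in seed_vec if t != seed]
    scored = [(t, cosine(seed_vec, result_vector(t, k))) for t in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for term, score in similar_terms("digital camera"):
    print(f"{term}: {score:.3f}")
```

Because each candidate is scored by issuing it as its own query, the similarity reflects how much the two result sets overlap in subject matter rather than surface string overlap; the cost is one extra search per candidate.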
Language
Multiple languages
Topics
Information Extraction, Information Retrieval, Tools, systems, applications, Text mining
Full paper
Similar Term Discovery using Web Search
Slides
-
Bibtex
@InProceedings{ANICK08.306,
author = {Peter Anick and Vijay Murthi and Shaji Sebastian},
title = {Similar Term Discovery using Web Search},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = may,
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
}