SUMMARY : Session P27-E
Title | Evaluation of Stop Word Lists in Chinese Language |
---|---|
Authors | F. Zou, F. Wang, X. Deng, S. Han |
Abstract | In modern information retrieval systems, removing stop words makes indexing more effective. Many stop word lists have been developed for English, but no standard stop word list has yet been constructed for Chinese. With the rapid development of Chinese information retrieval, evaluating Chinese stop word lists has become critical. In this paper, to save time and relieve the burden of manual comparison, we propose a novel stop word list evaluation method that uses a mutual information-based Chinese segmentation methodology. Experiments were conducted on training texts taken from a recent international Chinese segmentation competition. Results show that effective stop word lists can significantly improve the accuracy of Chinese segmentation. |
Keywords | stop word list, statistical modeling, information theory |
Full paper | Evaluation of Stop Word Lists in Chinese Language |