| Title | Using the Web as a Corpus for the Syntactic-Based Collocation Identification |
| Author(s) | Violeta Seretan, Luka Nerima, Eric Wehrli (Language Technology Laboratory, University of Geneva, Switzerland) |
| Session | P21-W |
| Abstract | This paper presents an experiment that uses a Web search engine and a robust parser for the Web-based identification of collocations, i.e. statistically significant word associations representing “a conventional way of saying things” (Manning and Schütze, 1999). We identify the possible collocates of a given word by parsing the text snippets that the search engine returns when queried with that word. We then rank the retrieved syntactic co-occurrences by the collocational strength of each pair, using different statistical measures. |
| Keyword(s) | Collocation extraction, web search service, Web as a corpus, syntactic analysis, lexical association measures |
| Language(s) | French, English |
| Full Paper | 619.pdf |
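
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical measures used, so pointwise mutual information (PMI) is assumed here purely as one common lexical association measure. The function name `rank_by_pmi` and the toy input pairs are hypothetical, and the pairs stand in for the syntactic co-occurrences that a parser would extract from search-engine snippets.

```python
import math
from collections import Counter


def rank_by_pmi(pairs):
    """Rank (word, collocate) pairs by pointwise mutual information.

    `pairs` is a list of co-occurrence tokens, one tuple per observed
    syntactic co-occurrence. PMI is an assumed measure, not necessarily
    one of those used in the paper.
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(word for word, _ in pairs)
    right_counts = Counter(collocate for _, collocate in pairs)
    total = len(pairs)

    scored = []
    for (word, collocate), n in pair_counts.items():
        # PMI = log2( P(w, c) / (P(w) * P(c)) ), estimated from counts.
        pmi = math.log2(n * total / (left_counts[word] * right_counts[collocate]))
        scored.append(((word, collocate), pmi))

    # Strongest associations first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy co-occurrences (hypothetical, not data from the paper).
    toy_pairs = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y")]
    for pair, score in rank_by_pmi(toy_pairs):
        print(pair, round(score, 3))
```

PMI tends to favour rare pairs, which is one reason several measures are usually compared, as the abstract indicates the authors do.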