Title |
Czech Information Retrieval with Syntax-based Language Models |
Authors |
Jana Straková and Pavel Pecina |
Abstract |
In recent years, considerable attention has been dedicated to language modeling methods in information retrieval. Although these approaches generally allow exploitation of any type of language model, most of the published experiments were conducted with a classical n-gram model, usually limited only to unigrams. A few works exploiting syntax in information retrieval can be cited in this context, but significant contribution of syntax based language modeling for information retrieval is yet to be proved. In this paper, we propose, implement, and evaluate an enrichment of language model employing syntactic dependency information acquired automatically from both documents and queries. Our experiments are conducted on Czech which is a morphologically rich language and has a considerably free word order, therefore a syntactic language model is expected to contribute positively to the unigram and bigram language model based on surface word order. By testing our model on the Czech test collection from Cross Language Evaluation Forum 2007 Ad-Hoc track, we show positive contribution of using dependency syntax in this context. |
Topics |
Information Extraction, Information Retrieval, Language modelling, Grammar and Syntax |
Full paper |
Czech Information Retrieval with Syntax-based Language Models |
Slides |
Czech Information Retrieval with Syntax-based Language Models |
Bibtex |
@InProceedings{STRAKOV10.215,
author = {Jana Straková and Pavel Pecina}, title = {Czech Information Retrieval with Syntax-based Language Models}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} } |