Summary of the paper

Title Analysis and Performance of Morphological Query Expansion and Language-Filtering Words on Basque Web Searching
Authors Igor Leturia, Antton Gurrutxaga, Nerea Areta and Eli Pociello
Abstract Morphological query expansion and language-filtering words have proved to be valid methods when searching the web for content in Basque via APIs of commercial search engines, as the implementation of these methods in recent IR and web-as-corpus tools shows, but no real analysis has been carried out to ascertain the degree of improvement, apart from a comparison of recall and precision using a classical web search engine and measured in terms of hit counts. This paper deals with a more theoretical study that confirms the validity of the combination of both methods. We have measured the increase in recall obtained by morphological query expansion and the increase in precision and loss in recall produced by language-filtering words, but not only by searching the web directly and looking at the hit counts (which are not considered to be very reliable at best), but also using both a Basque web corpus and a classical lemmatised corpus, thus providing more exact quantitative results. Furthermore, we provide various corpora-extracted data to be used in the aforementioned methods, such as lists of the most frequent inflections and declinations (cases, persons, numbers, times, etc.) for each POS (the most interesting word forms for a morphologically expanded query), or a list of the most used Basque words with their frequencies and document-frequencies (the ones that should be used as language-filtering words).
Language Single language
Topics Information Extraction, Information Retrieval, Morphology, Corpus (creation, annotation, etc.)
Full paper Analysis and Performance of Morphological Query Expansion and Language-Filtering Words on Basque Web Searching
Slides Analysis and Performance of Morphological Query Expansion and Language-Filtering Words on Basque Web Searching
Bibtex @InProceedings{LETURIA08.185,
  author = {Igor Leturia and Antton Gurrutxaga and Nerea Areta and Eli Pociello},
  title = {Analysis and Performance of Morphological Query Expansion and Language-Filtering Words on Basque Web Searching},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = {may},
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
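
The two query techniques named in the abstract can be illustrated with a minimal sketch. It assumes a generic search-engine syntax in which `OR` joins alternatives and space-separated quoted terms are all required; the function name `build_query`, the inflected forms of "etxe" (house) and the filter words "eta" and "da" are illustrative choices, not taken from the paper's lists or from any specific search API.

```python
def build_query(word_forms, filter_words):
    """Combine morphological expansion and language filtering in one query.

    word_forms:   inflected forms of the search term, OR-ed together so that
                  a page containing any one of them can match (raises recall).
    filter_words: very frequent words of the target language, all required,
                  so that pages in other languages are excluded (raises
                  precision, at some cost in recall).
    """
    expanded = " OR ".join(f'"{form}"' for form in word_forms)
    required = [f'"{word}"' for word in filter_words]
    return " ".join([f"({expanded})"] + required)


# Illustrative (assumed) inputs: a few forms of "etxe" and two common words.
forms = ["etxe", "etxea", "etxeak", "etxean", "etxeko"]
filters = ["eta", "da"]
query = build_query(forms, filters)
print(query)
```

In practice the lists of forms and filter words would come from corpus frequency data such as the paper describes, and the number of forms may need to be capped to fit a search engine's query-length limits.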
