Summary of the paper

Title Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
Authors Nelly Barbot, Olivier Boeffard and Arnaud Delhay
Abstract Set-covering algorithms are efficient tools for solving the problem of optimal linguistic corpus reduction. The optimality of such a process is directly related to the descriptive features of the sentences in a reference corpus. This article experimentally compares the behaviour of three algorithms: a greedy approach and a Lagrangian-relaxation-based one, both giving importance to rare events, and a third one that considers the Kullback-Leibler divergence between a reference distribution of events and the ongoing one. The analysis of the content of the reduced corpora shows that the first two approaches remain the most effective at compressing a corpus while guaranteeing a minimal linguistic content. As expected, the variant that minimises the Kullback-Leibler divergence yields a distribution of events close to the reference distribution; however, the price of this solution is a much larger corpus. In the proposed experiments, we also evaluated a mixed approach that adds a random complement to the smallest coverings.
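To make the two main ideas of the abstract concrete, here is a minimal Python sketch, assuming each sentence is represented as a list of linguistic units (for example diphones). `greedy_cover` selects sentences until every unit of the corpus is covered at least once, scoring each candidate by the summed rarity weight of the units it newly covers divided by its length; the weight `1 / count` and the length cost are illustrative assumptions, not the authors' exact formulation. `kl_divergence` measures how far the unit distribution of a reduced corpus is from the reference one; the `eps` floor that avoids `log(0)` is also an assumption. All function names are hypothetical, and the Lagrangian-relaxation variant is not shown.

```python
import math
from collections import Counter


def unit_counts(corpus):
    """Count occurrences of each unit over a corpus (a list of lists of units)."""
    counts = Counter()
    for sentence in corpus:
        counts.update(sentence)
    return counts


def greedy_cover(corpus):
    """Greedy set covering (hypothetical scoring): pick sentences until all units are covered.

    Assumption: a unit's weight is 1 / (its count in the reference corpus), so rare
    units weigh more; a sentence's score is its newly covered weight per unit of length.
    """
    ref = unit_counts(corpus)
    weight = {u: 1.0 / c for u, c in ref.items()}
    uncovered = set(ref)
    remaining = set(range(len(corpus)))
    selected = []
    while uncovered:
        best, best_score = None, 0.0
        for i in sorted(remaining):
            gain = sum(weight[u] for u in set(corpus[i]) & uncovered)
            if gain == 0:
                continue  # sentence adds nothing new (also skips empty sentences)
            score = gain / len(corpus[i])
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # cannot happen when all units come from the corpus itself
        selected.append(best)
        uncovered -= set(corpus[best])
        remaining.discard(best)
    return selected


def kl_divergence(reference, reduced, eps=1e-9):
    """D(P_ref || P_red) summed over the units of the reference distribution.

    Assumption: probabilities of units missing from the reduced corpus are floored at eps.
    """
    p_total = sum(reference.values())
    q_total = sum(reduced.values())
    d = 0.0
    for unit, count in reference.items():
        p = count / p_total
        q = max(reduced.get(unit, 0) / q_total, eps) if q_total else eps
        d += p * math.log(p / q)
    return d


corpus = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["d"]]
selected = greedy_cover(corpus)
print(selected)  # [3, 2]
reduced = [corpus[i] for i in selected]
print(round(kl_divergence(unit_counts(corpus), unit_counts(reduced)), 4))  # 0.0654
```

In this toy example the rare unit `d` is picked first, then the sentence covering `a`, `b` and `c` at the lowest cost per unit. The reduced corpus is small but its unit distribution differs from the reference one (nonzero divergence), which illustrates the trade-off the abstract reports between compactness and closeness to the reference distribution.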
Topics Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval, Tools, systems, applications
Full paper Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
Bibtex @InProceedings{BARBOT12.381,
  author = {Nelly Barbot and Olivier Boeffard and Arnaud Delhay},
  title = {Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
 }
Powered by ELDA © 2012 ELDA/ELRA