Summary of the paper

Title Cleaneval: a Competition for Cleaning Web Pages
Authors Marco Baroni, Francis Chantree, Adam Kilgarriff and Serge Sharoff
Abstract Cleaneval is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus for linguistic and language technology research and development. The first exercise took place in 2007. We describe how it was set up, results, and lessons learnt.
Language Multiple languages
Topics Corpus (creation, annotation, etc.), Evaluation methodologies, LR web services
Full paper Cleaneval: a Competition for Cleaning Web Pages
Slides -
Bibtex @InProceedings{BARONI08.162,
  author = {Marco Baroni and Francis Chantree and Adam Kilgarriff and Serge Sharoff},
  title = {Cleaneval: a Competition for Cleaning Web Pages},
  booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
  year = {2008},
  month = may,
  date = {28-30},
  address = {Marrakech, Morocco},
  editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-4-0},
  note = {http://www.lrec-conf.org/proceedings/lrec2008/},
  language = {english}
  }
