Summary of the paper

Title Priberam Compressive Summarization Corpus: A New Multi-Document Summarization Corpus for European Portuguese
Authors Miguel B. Almeida, Mariana S. C. Almeida, André F. T. Martins, Helena Figueira, Pedro Mendes and Cláudia Pinto
Abstract In this paper, we introduce the Priberam Compressive Summarization Corpus, a new multi-document summarization corpus for European Portuguese. The corpus follows the format of the English summarization corpora used in recent DUC and TAC conferences. It contains 80 manually chosen topics referring to events that occurred between 2010 and 2013. Each topic contains 10 news stories from major Portuguese newspapers, radio and TV stations, along with two human-generated summaries of up to 100 words. Apart from the language, one important difference from the DUC/TAC setup is that the human summaries in our corpus are compressive: the annotators performed only sentence and word deletion operations, rather than writing summaries from scratch. We use this corpus to train and evaluate learning-based extractive and compressive summarization systems, providing an empirical comparison between these two approaches. The corpus is freely available to facilitate research on automatic summarization.
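To make the "compressive" constraint concrete, the following is a minimal Python sketch, not the authors' tooling, that checks whether a summary could have been produced by sentence and word deletion alone and stays within the 100-word limit. It assumes (hypothetically) that each summary sentence is derived from a single source sentence, that whitespace splitting is an adequate tokenizer, and that word order inside a sentence is preserved; the function names and example sentences are illustrative and are not part of the corpus release.

```python
from typing import List

WORD_LIMIT = 100  # the 100-word cap on human summaries stated in the abstract


def tokenize(text: str) -> List[str]:
    # Assumption: simple whitespace tokenization; punctuation stays attached to words.
    return text.split()


def is_subsequence(short: List[str], long: List[str]) -> bool:
    """True if `short` can be obtained from `long` by deleting tokens only."""
    it = iter(long)
    # Each `tok in it` advances the shared iterator, so token order is enforced.
    return all(tok in it for tok in short)


def is_compressive_summary(
    summary_sentences: List[str],
    source_sentences: List[str],
    limit: int = WORD_LIMIT,
) -> bool:
    """Hypothetical check: within the word limit, and every summary sentence
    is a word-deleted version of some single source sentence."""
    total_words = sum(len(tokenize(s)) for s in summary_sentences)
    if total_words > limit:
        return False
    source_tokens = [tokenize(s) for s in source_sentences]
    return all(
        any(is_subsequence(tokenize(sent), src) for src in source_tokens)
        for sent in summary_sentences
    )


# Usage with invented English sentences (the real corpus is in European Portuguese).
sources = [
    "The government approved the budget on Friday after a long debate.",
    "Protests continued in Lisbon.",
]
print(is_compressive_summary(["The government approved the budget after a long debate."], sources))  # True
print(is_compressive_summary(["The cabinet approved the budget."], sources))  # False: "cabinet" is new
```

A summary written from scratch, as in DUC/TAC, would typically fail this check because it introduces words or reorderings absent from the sources.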
Topics Corpus (Creation, Annotation, etc.), Information Extraction, Information Retrieval
Bibtex @InProceedings{ALMEIDA14.187,
  author = {Miguel B. Almeida and Mariana S. C. Almeida and André F. T. Martins and Helena Figueira and Pedro Mendes and Cláudia Pinto},
  title = {Priberam Compressive Summarization Corpus: A New Multi-Document Summarization Corpus for European Portuguese},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
 }
Powered by ELDA © 2014 ELDA/ELRA