Summary of the paper

Title Polish Parliamentary Corpus
Authors Maciej Ogrodniczuk
Abstract This paper presents the Polish Parliamentary Corpus (PPC) – a new resource built upon the Polish Sejm Corpus and extended with current Senate proceedings and older (1918–1990) parliamentary transcripts. Corpus texts are automatically annotated with state-of-the-art language tools for Polish, resulting in a multi-layered stand-off sentence- and token-level segmentation, disambiguated morphosyntactic information, syntactic words and groups, named entities and coreference. The corpus is being constantly updated with new data from the current sittings. Currently the PPC is among the largest parliamentary corpora worldwide, amounting to approx. 300M words.
Full paper Polish Parliamentary Corpus
Bibtex @InProceedings{OGRODNICZUK18.11,
  author = {Maciej Ogrodniczuk},
  title = {Polish Parliamentary Corpus},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Darja Fišer and Maria Eskevich and Franciska de Jong},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-02-3},
  language = {english}
}