Title |
KPWr: Towards a Free Corpus of Polish |
Authors |
Bartosz Broda, Michał Marcińczuk, Marek Maziarz, Adam Radziszewski and Adam Wardyński |
Abstract |
This paper presents our efforts to collect and annotate a free corpus of Polish. The corpus will serve us as training and testing material for experiments with Machine Learning algorithms. As others may also benefit from the resource, we are going to release it under a Creative Commons licence, which we hope will remove unnecessary usage restrictions and also facilitate reproduction of our experimental results. The corpus is being annotated with various types of linguistic entities: chunks and named entities, selected syntactic and semantic relations, word senses and anaphora. We report on the current state of the project as well as our ultimate goals. |
Topics |
Corpus (creation, annotation, etc.), Named Entity recognition, Word Sense Disambiguation |
Full paper |
KPWr: Towards a Free Corpus of Polish |
Bibtex |
@InProceedings{BRODA12.965,
  author    = {Bartosz Broda and Michał Marcińczuk and Marek Maziarz and Adam Radziszewski and Adam Wardyński},
  title     = {KPWr: Towards a Free Corpus of Polish},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year      = {2012},
  month     = {may},
  date      = {23-25},
  address   = {Istanbul, Turkey},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-7-7},
  language  = {english}
} |