LREC 2018 Proceedings

Summary of the paper

Title	SzegedKoref: A Hungarian Coreference Corpus
Authors	Veronika Vincze, Klára Hegedűs, Alex Sliz-Nagy and Richárd Farkas
Abstract	In this paper we introduce SzegedKoref, a Hungarian corpus in which coreference relations are manually annotated. For annotation, we selected texts from the Szeged Treebank, the largest treebank of Hungarian, which is manually annotated at several linguistic layers. The corpus contains approximately 55,000 tokens and 4,000 sentences. Given its size, the corpus can be used for training and testing machine-learning-based coreference resolution systems, which we plan to implement in the near future. We present the annotated texts, describe the annotated categories of anaphoric relations, report on the annotation process and offer several examples of each annotated category. Two linguistic phenomena are important characteristics of Hungarian coreference relations: phonologically empty pronouns and pronouns referring to subordinate clauses. We discuss both of them in the paper.
Topics	Anaphora, Coreference, Corpus (Creation, Annotation, Etc.), Semantics
Full paper	SzegedKoref: A Hungarian Coreference Corpus
Bibtex	@InProceedings{VINCZE18.325, author = {Veronika Vincze and Klára Hegedűs and Alex Sliz-Nagy and Richárd Farkas}, title = "{SzegedKoref: A Hungarian Coreference Corpus}", booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, year = {2018}, month = {May 7-12, 2018}, address = {Miyazaki, Japan}, editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga}, publisher = {European Language Resources Association (ELRA)}, isbn = {979-10-95546-00-9}, language = {english} }