Title |
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized |
Authors |
Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel Zeman and Zdeněk Žabokrtský |
Abstract |
We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank). HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has become popular in recent years. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes. We describe both annotation styles, including the adjustments that were necessary, and provide details about the conversion process. We also discuss the differences between the two styles, evaluating their advantages and disadvantages, and note the effects of the differences on the conversion. We regard the stanfordization as generally successful, although we admit several shortcomings, especially in the distinction between direct and indirect objects, that have to be addressed in the future. We release part of HamleDT 2.0 freely; we are not allowed to redistribute the whole dataset, but we do provide the conversion pipeline. |
Topics |
Multilinguality, Parsing |
Full paper |
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized |
Bibtex |
@InProceedings{ROSA14.915,
  author = {Rudolf Rosa and Jan Mašek and David Mareček and Martin Popel and Daniel Zeman and Zdeněk Žabokrtský},
  title = {HamleDT 2.0: Thirty Dependency Treebanks Stanfordized},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
} |