Title |
Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC |
Authors |
Erhard Hinrichs and Thomas Zastrow |
Abstract |
This paper presents the Tübingen Baumbank des Deutschen Diachron (TüBa-D/DC), a linguistically annotated corpus of selected diachronic materials from the German Gutenberg Project. It was automatically annotated by a suite of NLP tools integrated into WebLicht, the linguistic chaining tool used in CLARIN-D. The annotation quality has been evaluated manually for a subcorpus ranging from Middle High German to Modern High German. The integration of the TüBa-D/DC into the CLARIN-D infrastructure includes metadata provision and harvesting as well as sustainable data storage in the Tübingen CLARIN-D center. The paper further provides an overview of the ways to access the TüBa-D/DC data. Methods for full-text search of the metadata and object data and for annotation-based search of the object data are described in detail. The WebLicht Service-Oriented Architecture is used as an integrated environment for annotation-based search of the TüBa-D/DC. WebLicht thus serves not only as the annotation platform for the TüBa-D/DC, but also as a generic user interface for accessing and visualizing it. |
Topics |
Corpus (creation, annotation, etc.), Grammar and Syntax, Part of speech tagging |
Full paper |
Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC |
Bibtex |
@InProceedings{HINRICHS12.166,
  author = {Erhard Hinrichs and Thomas Zastrow},
  title = {Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |