Title |
The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English |
Authors |
Stefanie Dipper, Melanie Seiss and Heike Zinsmeister |
Abstract |
Parallel corpora, original texts aligned with their translations, are a widely used resource in computational linguistics. Translation studies have shown that translated texts often differ systematically from comparable original texts. Translators tend to be faithful to structures of the original texts, resulting in a "shining through" of the original language preferences in the translated text. Translators also tend to make their translations maximally comprehensible, with the effect that translated texts can be more explicit than their source texts. Motivated by the need to use a parallel resource for cross-linguistic feature induction in abstract anaphora resolution, this paper investigates properties of English and German texts in the Europarl corpus, taking into account both general features such as sentence length and task-dependent features such as the distribution of demonstrative noun phrases. The investigation is based on the entire Europarl corpus as well as on a small, manually annotated subset of it. The results indicate that English translated texts are sufficiently "authentic" to be used as training data for anaphora resolution; the results for German texts, however, are less conclusive. |
Topics |
Corpus (creation, annotation, etc.), Validation of LRs, Anaphora, Coreference |
Full paper |
The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English |
Bibtex |
@InProceedings{DIPPER12.172,
  author = {Stefanie Dipper and Melanie Seiss and Heike Zinsmeister},
  title = {The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |