Title
Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus
Authors
Lars Ahrenberg
Abstract
This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method of comparison, which is based on manually reviewed word alignments. We investigate the relative frequencies of different types of correspondence, including null alignments, many-to-one correspondences and crossings. In addition, both halves of the parallel corpus have been annotated with morpho-syntactic information, and the syntactic annotation uses labelled dependency relations. Thus, we can see how different types of correspondence are distributed over parts of speech and compute correspondences at the structural level. Although two of the other subcorpora contain fiction, the Europarl part is found to have the highest proportion of many types of restructuring, including additions, deletions, long-distance reorderings and dependency reversals. We explain this by the fact that the majority of Europarl segments are parallel translations rather than source texts paired with their translations.
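To make the correspondence types named in the abstract concrete, here is a minimal Python sketch. It is not the author's method or tooling. It assumes each sentence pair's word alignment is given as a set of (source index, target index) links, and the function names `profile_alignment` and `relative_frequencies` are hypothetical. Unlinked tokens count as null alignments, a token with two or more links counts as a many-to-one (or one-to-many) correspondence, and two links whose source and target orders disagree count as a crossing. Normalizing by the total token count is likewise an illustrative choice.

```python
from collections import Counter
from itertools import combinations


def profile_alignment(links, src_len, tgt_len):
    """Count correspondence types in one aligned sentence pair.

    links: set of (i, j) pairs meaning source token i is aligned to
    target token j (0-based). This representation is an assumption.
    """
    src_fan = Counter(i for i, _ in links)  # number of links per source token
    tgt_fan = Counter(j for _, j in links)  # number of links per target token
    counts = Counter()

    # Null alignments: tokens with no link at all (deletions / additions).
    counts["src_null"] = sum(1 for i in range(src_len) if src_fan[i] == 0)
    counts["tgt_null"] = sum(1 for j in range(tgt_len) if tgt_fan[j] == 0)

    # Many-to-one: a token on one side linked to two or more tokens on the other.
    counts["many_src_to_one_tgt"] = sum(1 for n in tgt_fan.values() if n > 1)
    counts["one_src_to_many_tgt"] = sum(1 for n in src_fan.values() if n > 1)

    # Crossings: two links whose source order and target order disagree.
    counts["crossings"] = sum(
        1
        for (i1, j1), (i2, j2) in combinations(links, 2)
        if (i1 - i2) * (j1 - j2) < 0
    )
    return counts


def relative_frequencies(corpus):
    """Aggregate counts over (links, src_len, tgt_len) triples and divide
    by the total number of tokens on both sides (an assumed normalization)."""
    total = Counter()
    tokens = 0
    for links, src_len, tgt_len in corpus:
        total.update(profile_alignment(links, src_len, tgt_len))
        tokens += src_len + tgt_len
    if tokens == 0:
        return {}
    return {kind: n / tokens for kind, n in total.items()}


if __name__ == "__main__":
    # "I do not know" / "Jag vet inte": I-Jag, not-inte, know-vet.
    # "do" is unaligned (src_null == 1); not-inte and know-vet cross (crossings == 1).
    pair = ({(0, 0), (2, 2), (3, 1)}, 4, 3)
    print(profile_alignment(*pair))
    print(relative_frequencies([pair]))
```

Counting crossings over all link pairs is quadratic per sentence, which is harmless at sentence length; links that share a token never count as crossing because the product of differences is zero.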
Topics
Corpus (creation, annotation, etc.), Machine Translation, SpeechToSpeech Translation, Profiling
Full paper
Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus
Slides
-
Bibtex
@InProceedings{AHRENBERG10.193,
  author = {Lars Ahrenberg},
  title = {Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}