LREC 2000 2nd International Conference on Language Resources & Evaluation
 

Title A Parallel Corpus of Italian/German Legal Texts
Authors Gamper Johann (European Academy Bolzano, Scientific Area “Language and Law”, Weggensteinstr. 12a, 39100 Bozen, Italy, jgamper@eurac.edu)
Keywords CES, Corpus Encoding, Parallel Corpus
Session Session WP3 - Multilingual Corpora
Full Paper 140.ps, 140.pdf
Abstract This paper presents the creation of a parallel corpus of Italian and German legal documents that are translations of one another. The corpus, which contains approximately 5 million words, is primarily intended as a resource for (semi-)automatic terminology acquisition. The guidelines of the Corpus Encoding Standard (CES) have been applied to encode structural information, segmentation information, and sentence alignment. Since the parallel texts correspond one-to-one at the sentence level, building a perfect sentence alignment is rather straightforward. As a result, the corpus also constitutes a valuable testbed for the evaluation of alignment algorithms. The paper discusses the intended use of the corpus, the various phases of corpus compilation, and basic statistics.