LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | Some Technical Aspects about Aligning Near Languages |
Authors | de Yzaguirre Lluís (Institute for Applied Linguistics, Universitat Pompeu Fabra, La Rambla, 30-32, 08002 Barcelona, Spain, de_yza@upf.es); Ribas Marta (Institute for Applied Linguistics, Universitat Pompeu Fabra, La Rambla, 30-32, 08002 Barcelona, Spain); Vivaldi Jordi (Institute for Applied Linguistics, Universitat Pompeu Fabra, Rambla Santa Mònica, 30, 08002 Barcelona, Spain, jorge.vivaldi@info.upf.es); Cabré M. Teresa (Institute for Applied Linguistics, Universitat Pompeu Fabra, Rambla Santa Mònica, 30, 08002 Barcelona, Spain, teresa.cabre@trad.upf.es) |
Keywords | Lemma and Part-of-Speech Based Alignment, Sentence Alignment |
Session | Session WP3 - Multilingual Corpora |
Full Paper | 186.ps, 186.pdf |
Abstract | IULA at UPF has developed an aligner that benefits from corpus processing results to produce an accurate and robust alignment, even with noisy parallel corpora. It compares the lemmata and part-of-speech tags of the analysed texts, but it has two main limitations. First, it apparently only works for near languages; second, it requires morphological taggers for the languages being compared. These two limitations prevent the technique from being used for arbitrary pairs of languages. Whenever it is applicable, it achieves high-quality results. |