LREC 2000 2nd International Conference on Language Resources & Evaluation
Title | Automatic Extraction of English-Chinese Term Lexicons from Noisy Bilingual Corpora |
Authors | Le Sun (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, lesun@sonata.iscas.ac.cn) Youbing Jin (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, ybjin@sonata.iscas.ac.cn) Lin Du (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, ldu@sonata.iscas.ac.cn) Yufang Sun (Open Systems & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, Beijing 100080, P. R. China, yfsun@sonata.iscas.ac.cn) |
Keywords | Bilingual Corpora Processing, Sentence Alignment, Term Extraction |
Session | Session WO11 - Mono-Multilingual Lexicon Acquisition and Building |
Full Paper | 208.ps, 208.pdf |
Abstract | This paper describes our system, which is designed to extract English-Chinese term lexicons from noisy, complex bilingual corpora and to use them as a translation lexicon for checking sentence alignment results. The noisy bilingual corpora are first aligned with our improved length-based statistical approach, which can partially detect sentence omissions and insertions. A term extraction system is then used to obtain term translation lexicons from the roughly aligned corpora. Next, the statistical approach is applied again to realign the corpora. Finally, we filter the noisy bilingual texts and obtain nearly perfectly aligned corpora. |
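The first step of the pipeline, length-based alignment that tolerates omitted or inserted sentences, can be illustrated with a minimal sketch. This is not the authors' improved model: the bead types, the penalty values in `BEADS`, the `ratio` parameter, and the functions `length_cost` and `align` are all hypothetical, chosen only to show how a dynamic program over sentence lengths can leave a sentence unmatched when no counterpart of similar length exists.

```python
import math

# Hypothetical bead types (English count, Chinese count) with illustrative
# penalties; these values are assumptions, not taken from the paper.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.5, (1, 2): 1.5}


def length_cost(len_en, len_zh, ratio=1.0):
    """Penalty for pairing segments of the given lengths.

    `ratio` is the assumed expected Chinese length per unit of English length.
    """
    if len_en == 0 or len_zh == 0:
        return 0.0  # omissions and insertions are charged by the bead penalty
    expected = len_en * ratio
    return abs(len_zh - expected) / math.sqrt(expected)


def align(en_lens, zh_lens, ratio=1.0):
    """Return the lowest-cost list of beads as (english_indices, chinese_indices)."""
    n, m = len(en_lens), len(zh_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic programming over prefixes of both sentence lists.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty + length_cost(
                    sum(en_lens[i:ni]), sum(zh_lens[j:nj]), ratio
                )
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (di, dj)
    # Trace the best path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

With these assumed penalties, `align([40, 60, 30], [40, 30])` pairs the first sentences, leaves the second English sentence unmatched (an omission on the Chinese side), and pairs the third English sentence with the second Chinese one. In a full system along the lines the abstract describes, the extracted term lexicon would then be used to check such beads before realignment.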