Title |
Automatic Transliteration and Back-transliteration by Decision Tree Learning |
Authors |
Kang Byung-Ju (Department of Computer Science, Advanced Information Technology Research Center (AITrc), Korea Terminology Center for Language and Knowledge Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon, 305-701, Korea, bjkang@world.kaist.ac.kr); Choi Key-Sun (Korea Terminology Research Center for Language and Knowledge Engineering, Department of Computer Science, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon, 305-701, Korea, kschoi@korterm.kaist.ac.kr) |
Keywords |
|
Session |
Session WP6 - Tools in the Written Area |
Full Paper |
227.ps, 227.pdf |
Abstract |
Automatic transliteration and back-transliteration between languages with drastically different alphabets and phoneme inventories, such as English/Korean, English/Japanese, English/Arabic, and English/Chinese, are of practical importance in machine translation, cross-lingual information retrieval, and automatic bilingual dictionary compilation. This paper describes a bi-directional and, to some extent, language-independent methodology for English/Korean transliteration and back-transliteration. Our method consists of character alignment and decision tree learning. We induce transliteration rules for each letter of the English alphabet and back-transliteration rules for each letter of the Korean alphabet. Training the decision trees requires a large set of labeled examples of transliteration and back-transliteration. However, resources of this kind are generally not available. Our character alignment algorithm aligns an English word with its Korean transliteration in the desired way with high accuracy. |
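As a rough illustration of how character alignment could feed per-letter decision tree learning, the sketch below turns one aligned word into labeled training examples grouped by English letter. The abstract does not specify the features used, so the context window of three letters on each side, the padding symbol, the function names, and the toy alignment for "board" are all assumptions made for illustration only.

```python
from collections import defaultdict

PAD = "_"  # padding for positions outside the word (an assumption)


def training_examples(aligned_word, window=3):
    """Turn one aligned word into (letter, context, label) examples.

    aligned_word is a list of (english_letter, korean_chunk) pairs.
    The Korean chunk may be empty when a letter maps to nothing.
    """
    letters = [eng for eng, _ in aligned_word]
    padded = [PAD] * window + letters + [PAD] * window
    examples = []
    for i, (letter, chunk) in enumerate(aligned_word):
        j = i + window  # position of this letter in the padded list
        left = tuple(padded[j - window:j])
        right = tuple(padded[j + 1:j + 1 + window])
        examples.append((letter, left + right, chunk))
    return examples


def group_by_letter(aligned_words, window=3):
    """Collect (context, label) examples separately for each English letter."""
    by_letter = defaultdict(list)
    for word in aligned_words:
        for letter, context, label in training_examples(word, window):
            by_letter[letter].append((context, label))
    return dict(by_letter)


# Hypothetical hand-made alignment, not taken from the paper.
board = [("b", "ㅂ"), ("o", "ㅗ"), ("a", ""), ("r", ""), ("d", "ㄷㅡ")]
data = group_by_letter([board])
print(data["b"])
```

Each per-letter list could then be passed to any decision tree learner to induce that letter's transliteration rules; swapping the roles of the two sides of each pair would give the corresponding back-transliteration examples.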