Title |
Enhanced Japanese Electronic Dictionary Look-up |
Authors |
Timothy Baldwin (CSLI, Ventura Hall, Stanford University Stanford, CA 94305-4115 USA) Slaven Bilac (Department of Computer Science Tokyo Institute of Technology 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552 JAPAN) Ryo Okumura (Department of Computer Science Tokyo Institute of Technology 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552 JAPAN) Takenobu Tokunaga (Department of Computer Science Tokyo Institute of Technology 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552 JAPAN) Hozumi Tanaka (Department of Computer Science Tokyo Institute of Technology 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552 JAPAN) |
Session |
WP2: Lexicons |
Abstract |
This paper describes the process of data preparation and reading generation for an ongoing project aimed at improving the accessibility of unknown words for learners of foreign languages, focusing initially on Japanese. Rather then requiring absolute knowledge of the readings of words in the foreign language, we allow look-up of dictionary entries by readings which learners can predictably be expected to associate with them. We automatically extract an exhaustive set of phonemic readings for each grapheme segment and learn basic morpho-phonological rules governing compound word formation, associating a probability with each. Then we apply the naive Bayes model to generate a set of readings and give each a likeliness score based on previously extracted evidence and corpus frequencies. |
Keywords |
Dictionaries |
Full Paper |