Title

Use of XML and Relational Databases for Consistent Development and Maintenance of Lexicons and Annotated Corpora

Authors

Masayuki Asahara (Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, JAPAN)

Ryuichi Yoneda (Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, JAPAN)

Akiko Yamashita (Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, JAPAN)

Yasuharu Den (Faculty of Letters, Chiba University, 1-33 Yayoicho, Inage-ku, Chiba 263-8522, JAPAN)

Yuji Matsumoto (Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, 630-0101, JAPAN)

Session

WO14: Lexicons

Abstract

In this paper, we present the use of XML and a relational database for developing and maintaining Japanese linguistic resources. In languages whose texts do not mark word boundaries (e.g. Chinese and Japanese), a consistent definition of word delimitation in the lexicon is critical for building POS-tagged corpora. When we change the definition of word delimitation in the lexicon, we need to modify the tagged corpora to keep them consistent with the lexicon. We propose using a relational database to perform these modifications in tandem. Moreover, Japanese has several standards for defining word delimitation. To accommodate more than one definition, we compile a compound word lexicon in the database. The compound word lexicon includes the dependency structures of compound words.
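
The idea of keeping lexicon and corpus consistent through a relational database can be illustrated with a minimal sketch. It is not the authors' schema: the use of SQLite, the table names (lexicon, corpus_token, compound_part), the column names, the function names, and the example sentence are all assumptions made for illustration. Corpus tokens store only a reference to a lexicon entry, and a compound word is described by its ordered constituents plus the constituent each one depends on. Changing the delimitation of a compound then rewrites the affected corpus tokens in a single transaction.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the paper.
SCHEMA = """
CREATE TABLE lexicon (
    id      INTEGER PRIMARY KEY,
    surface TEXT NOT NULL,
    pos     TEXT NOT NULL
);
CREATE TABLE corpus_token (
    sentence_id INTEGER NOT NULL,
    position    INTEGER NOT NULL,
    lexicon_id  INTEGER NOT NULL REFERENCES lexicon(id)
);
-- Compound word lexicon: ordered constituents of a compound.
-- head_order is the part_order of the constituent this part depends on
-- (NULL for the head of the whole compound).
CREATE TABLE compound_part (
    compound_id INTEGER NOT NULL REFERENCES lexicon(id),
    part_order  INTEGER NOT NULL,
    part_id     INTEGER NOT NULL REFERENCES lexicon(id),
    head_order  INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database holding one tagged sentence."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", [
        (1, "自然言語処理", "noun"), (2, "を", "particle"), (3, "学ぶ", "verb"),
        (4, "自然", "noun"), (5, "言語", "noun"), (6, "処理", "noun"),
    ])
    # 自然 depends on 言語, 言語 depends on 処理, 処理 is the head.
    conn.executemany("INSERT INTO compound_part VALUES (?, ?, ?, ?)", [
        (1, 0, 4, 1), (1, 1, 5, 2), (1, 2, 6, None),
    ])
    conn.executemany("INSERT INTO corpus_token VALUES (?, ?, ?)", [
        (1, 0, 1), (1, 1, 2), (1, 2, 3),
    ])
    conn.commit()
    return conn


def split_compound(conn, compound_id):
    """Re-tokenize every corpus occurrence of a compound into its parts."""
    parts = [row[0] for row in conn.execute(
        "SELECT part_id FROM compound_part WHERE compound_id = ? "
        "ORDER BY part_order", (compound_id,))]
    if len(parts) < 2:
        raise ValueError("entry has no recorded constituents")
    extra = len(parts) - 1
    with conn:  # commit on success, roll back on error
        # Process later positions first so earlier hits keep their position.
        hits = conn.execute(
            "SELECT sentence_id, position FROM corpus_token "
            "WHERE lexicon_id = ? ORDER BY sentence_id, position DESC",
            (compound_id,)).fetchall()
        for sentence_id, position in hits:
            # Make room for the additional tokens after the compound.
            conn.execute(
                "UPDATE corpus_token SET position = position + ? "
                "WHERE sentence_id = ? AND position > ?",
                (extra, sentence_id, position))
            conn.execute(
                "DELETE FROM corpus_token "
                "WHERE sentence_id = ? AND position = ?",
                (sentence_id, position))
            conn.executemany(
                "INSERT INTO corpus_token VALUES (?, ?, ?)",
                [(sentence_id, position + k, part_id)
                 for k, part_id in enumerate(parts)])


def sentence(conn, sentence_id):
    """Return the (surface, pos) pairs of a sentence in corpus order."""
    return conn.execute(
        "SELECT l.surface, l.pos FROM corpus_token t "
        "JOIN lexicon l ON l.id = t.lexicon_id "
        "WHERE t.sentence_id = ? ORDER BY t.position",
        (sentence_id,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(sentence(db, 1))   # compound stored as one token
    split_compound(db, 1)
    print(sentence(db, 1))   # compound replaced by its three constituents
```

Because tokens carry only a lexicon reference, an edit to an entry's part of speech or surface form is visible in the corpus through the join without touching the token table. The compound entry is kept in the lexicon after the split, so a coarser delimitation standard could still be reconstructed from compound_part.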

Keywords

Lexicons

Full Paper

191.pdf