Title |
Detecting Japanese Compound Functional Expressions using Canonical/Derivational Relation |
Authors |
Takafumi Suzuki, Yusuke Abe, Itsuki Toyota, Takehito Utsuro, Suguru Matsuyoshi and Masatoshi Tsuchiya |
Abstract |
The Japanese language has various types of functional expressions. To organize Japanese functional expressions with various surface forms, a lexicon of Japanese functional expressions with a hierarchical organization was compiled. This paper proposes a framework for identifying more than 16,000 functional expressions in Japanese texts by utilizing the hierarchical organization of the lexicon. In our framework, these functional expressions are roughly divided into canonical and derived functional expressions. Each derived functional expression is identified by referring to the most similar occurrence of its canonical expression. Contextual occurrence information for a much smaller number of canonical expressions is thus expanded to all the forms of the derived expressions, and is then used when identifying those derived expressions. We also show empirically that the proposed method correctly identifies the functional / content usage in more than 80% of cases, using fewer than 38,000 manually labeled training instances of canonical expressions. |
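The identification idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Everything in it is a hypothetical stand-in: the lexicon fragment `LEXICON`, the `Instance` class, the helpers `expand`, `similarity` and `classify`, the Jaccard context-overlap measure, and the example sentences. The sketch copies each manually labeled canonical occurrence onto every form derived from that canonical expression. It then labels a new occurrence of a derived form with the label of its most similar expanded occurrence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instance:
    left: Tuple[str, ...]   # tokens before the expression
    expr: str               # surface form of the expression
    right: Tuple[str, ...]  # tokens after the expression
    label: str              # "functional" or "content"


# Hypothetical fragment of a hierarchical lexicon: surface form -> canonical form.
LEXICON: Dict[str, str] = {
    "について": "について",
    "につき": "について",
    "についての": "について",
}


def expand(canonical: List[Instance], lexicon: Dict[str, str]) -> List[Instance]:
    """Copy each labeled canonical occurrence onto every form derived from it."""
    forms: Dict[str, List[str]] = {}
    for form, canon in lexicon.items():
        forms.setdefault(canon, []).append(form)
    return [
        Instance(inst.left, form, inst.right, inst.label)
        for inst in canonical
        for form in forms.get(inst.expr, [])
    ]


def similarity(inst: Instance, left: Tuple[str, ...], right: Tuple[str, ...]) -> float:
    """Jaccard overlap of context tokens (an assumed, deliberately simple measure)."""
    x, y = set(inst.left) | set(inst.right), set(left) | set(right)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def classify(left: Tuple[str, ...], expr: str, right: Tuple[str, ...],
             expanded: List[Instance]) -> Optional[str]:
    """Return the label of the most similar expanded occurrence of `expr`, if any."""
    candidates = [inst for inst in expanded if inst.expr == expr]
    if not candidates:
        return None
    return max(candidates, key=lambda inst: similarity(inst, left, right)).label


# Hypothetical labeled occurrences of the canonical expression only.
training = [
    Instance(("日本",), "について", ("話す",), "functional"),
    Instance(("彼", "の", "後"), "について", ("歩く",), "content"),
]
expanded = expand(training, LEXICON)
print(classify(("日本",), "についての", ("本",), expanded))          # functional
print(classify(("友達", "の", "後"), "につき", ("歩く",), expanded))  # content
```

Only the canonical form carries manual labels here. The derived forms `につき` and `についての` inherit training contexts through `expand`, which is the kind of reuse the abstract describes. A real system would use a richer context representation than token overlap.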
Topics |
MultiWord Expressions & Collocations, Word Sense Disambiguation, Lexicon, lexical database |
Bibtex |
@InProceedings{SUZUKI12.902,
  author    = {Takafumi Suzuki and Yusuke Abe and Itsuki Toyota and Takehito Utsuro and Suguru Matsuyoshi and Masatoshi Tsuchiya},
  title     = {Detecting Japanese Compound Functional Expressions using Canonical/Derivational Relation},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year      = {2012},
  month     = {may},
  date      = {23-25},
  address   = {Istanbul, Turkey},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-7-7},
  language  = {english}
}