Title |
Retrieving Annotated Corpora for Corpus Annotation |
Author(s) |
Kyosuke Yoshida; Taiichi Hashimoto; Takenobu Tokunaga; Hozumi Tanaka Department of Computer Science, Tokyo Institute of Technology |
Session |
P19-SW |
Abstract |
This paper introduces Bonsai, a tool that supports humans in annotating corpora with morphosyntactic information and in retrieving syntactic structures stored in a database. Integrating annotation and retrieval lets users annotate a new instance while consulting already annotated sentences that share a similar morphosyntactic structure. We focus on the retrieval part of the system and describe a method that decomposes a large input query into smaller ones to improve retrieval efficiency. The proposed method is evaluated on the Penn Treebank corpus and shows significant improvements. |
Keyword(s) |
Corpus annotation tool, Structure retrieval, XML, RDB |
Language(s) |
N/A |
Full Paper |