Title

Retrieving Annotated Corpora for Corpus Annotation

Author(s)

Kyosuke Yoshida; Taiichi Hashimoto; Takenobu Tokunaga; Hozumi Tanaka

Department of Computer Science, Tokyo Institute of Technology

Session

P19-SW

Abstract

This paper introduces Bonsai, a tool that supports humans in annotating corpora with morphosyntactic information and in retrieving syntactic structures stored in a database. Integrating annotation and retrieval lets users annotate a new instance while looking back at already annotated sentences that share a similar morphosyntactic structure. We focus on the retrieval part of the system and describe a method that decomposes a large input query into smaller ones to improve retrieval efficiency. The proposed method is evaluated on the Penn Treebank corpus and shows significant improvements.

Keyword(s)

Corpus annotation tool, Structure retrieval, XML, RDB

Language(s)

N/A

Full Paper

403.pdf