LREC 2000 - Papers

LREC 2000 2nd International Conference on Language Resources & Evaluation

Conference Papers

Title Semi-automatic Construction of a Tree-annotated Corpus Using an Iterative Learning Statistical Language Model

Authors Shirai Kiyoaki (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, kshirai@cl.cs.titech.ac.jp)
Tanaka Hozumi (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, tanaka@cl.cs.titech.ac.jp)
Tokunaga Takenobu (Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, take@cl.cs.titech.ac.jp)

Keywords Human Intervention, Iterative Learning, Statistical Language Model, Tree-Annotated Corpus

Session Session WP2 - Corpus Annotation

Abstract In this paper, we propose a method for constructing a tree-annotated corpus when a statistical parsing system exists but no tree-annotated corpus is available as training data. The basic idea is to sequentially annotate plain text inputs with syntactic trees using a parser with a statistical language model, and to iteratively retrain the statistical language model on the resulting annotated trees. The major characteristics of our method are as follows: (1) in the first step of the iterative learning process, we manually construct a tree-annotated corpus from which to initialize the statistical language model, and (2) at each step of the parse tree annotation process, we use both syntactic statistics obtained from the iterative learning process and lexical statistics pre-derived from existing language resources to choose the most probable parse tree.
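
The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: a tree is represented as a tuple of rule strings, the parser is replaced by a precomputed list of candidate trees per sentence, syntactic statistics are add-one smoothed rule counts, and the syntactic and lexical scores are combined by summing log probabilities. The names `iterative_annotation`, `score`, and `lexical_log_prob` are hypothetical.

```python
import math
from collections import Counter


def score(tree, counts, lexical_log_prob):
    """Combine syntactic and lexical evidence for one candidate tree.

    Assumption: the two sources are combined by adding log probabilities.
    """
    total = sum(counts.values())
    n_types = len(counts) + 1  # add-one smoothing, one extra slot for unseen rules
    syntactic = sum(
        math.log((counts[rule] + 1) / (total + n_types)) for rule in tree
    )
    return syntactic + lexical_log_prob(tree)


def iterative_annotation(seed_trees, candidate_sets, lexical_log_prob, batch_size=1):
    """Annotate sentences batch by batch, retraining rule counts after each batch."""
    # Step 1: initialize the syntactic statistics from the manually built seed corpus.
    counts = Counter(rule for tree in seed_trees for rule in tree)
    annotated = []
    for start in range(0, len(candidate_sets), batch_size):
        batch = candidate_sets[start:start + batch_size]
        # Step 2: pick the most probable candidate tree for each sentence.
        chosen = [
            max(cands, key=lambda t: score(t, counts, lexical_log_prob))
            for cands in batch
        ]
        annotated.extend(chosen)
        # Step 3: retrain (here: update counts) on the newly annotated trees.
        for tree in chosen:
            counts.update(tree)
    return annotated, counts


# Toy data: one hand-annotated seed tree and one sentence with two candidate parses.
seed = [("S -> NP VP", "NP -> Det N", "VP -> V NP", "NP -> Det N")]
candidates = [[("S -> NP VP", "NP -> Det N"), ("S -> NP VP", "NP -> N N")]]

# With a neutral lexical model, the rule seen in the seed corpus is preferred.
trees, counts = iterative_annotation(seed, candidates, lambda tree: 0.0)
```

In this toy run the first candidate is selected because `NP -> Det N` already occurs in the seed corpus, and its count grows after the update. A lexical model that strongly penalizes that candidate would reverse the choice, which is the role the abstract assigns to lexical statistics derived from existing language resources.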
