LREC 2000 - Papers

LREC 2000 2nd International Conference on Language Resources & Evaluation

Conference Papers

Title Using Machine Learning Methods to Improve Quality of Tagged Corpora and Learning Models

Authors Matsumoto Yuji (Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan, matsu@is.aist-nara.ac.jp)
Yamashita Tatsuo (Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan, tatuo-yg@is.aist-nara.ac.jp)

Keywords

Session Session WO1 - Corpus Tagging

Abstract Corpus-based learning methods for natural language processing now provide a consistent way to build systems with good performance. A number of statistical learning models have been proposed and are used in most of the tasks that used to be handled by rule-based systems. Once learning systems reach a level competitive with manually constructed systems, both large-scale training corpora and good learning models are of great importance. In this paper, we first argue that the main hindrances to improving corpus-based learning systems are inconsistencies and errors in the training corpus and deficiencies in the learning model. We then show that some machine learning methods are useful for effectively identifying the sources of error in the training corpus. Finally, we discuss how the various types of errors should be handled so as to improve the learning environment.
