LREC 2000 - Papers

LREC 2000: 2nd International Conference on Language Resources & Evaluation

Conference Papers


Title Hua Yu: A Word-segmented and Part-Of-Speech Tagged Chinese Corpus

Authors Maosong Sun (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Honglin Sun (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Changning Huang (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Pu Zhang (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Hongbing Xing (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Qiang Zhou (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)

Keywords Annotated Corpus, Chinese Information Processing, Tag Set for Chinese, Word Segmentation and Part-of-Speech Tagging

Session Session WP5 - Corpus Tagging

Abstract As the outcome of a three-year joint effort between the Department of Computer Science, Tsinghua University, and the Language Information Processing Institute, Beijing Language and Culture University, Beijing, China, a word-segmented and part-of-speech tagged Chinese corpus of 2 million Chinese characters, named HuaYu, has been established. This paper first briefly introduces the basics of HuaYu, such as its genre distribution, the fundamental considerations in its design, and its word segmentation and part-of-speech tagging standards. The complete tag set used in HuaYu is then given, along with typical examples for each tag. Finally, several pieces of annotated text from each genre are included for the reader's reference.
