LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Hua Yu: A Word-segmented and Part-Of-Speech Tagged Chinese Corpus
Authors Maosong Sun (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Honglin Sun (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Changning Huang (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Pu Zhang (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Hongbing Xing (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Qiang Zhou (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Keywords Annotated Corpus, Chinese Information Processing, Tag Set for Chinese, Word Segmentation and Part-of-Speech Tagging
Session Session WP5 - Corpus Tagging
Full Paper 372.ps, 372.pdf
Abstract As the outcome of a three-year joint effort between the Department of Computer Science, Tsinghua University, and the Language Information Processing Institute, Beijing Language and Culture University, Beijing, China, a word-segmented and part-of-speech tagged Chinese corpus of 2 million Chinese characters, named HuaYu, has been established. This paper first briefly introduces the basics of HuaYu, such as its genre distribution, the fundamental considerations in its design, and its word segmentation and part-of-speech tagging standards. The complete tag set used in HuaYu is then given, along with typical examples for each tag. Finally, several pieces of annotated text from each genre are included for the reader's reference.