LREC 2000 2nd International Conference on Language Resources & Evaluation
Conference Papers
Title | Hua Yu: A Word-segmented and Part-Of-Speech Tagged Chinese Corpus |
Authors |
Maosong Sun (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Honglin Sun (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Changning Huang (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China)
Pu Zhang (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Hongbing Xing (Language Information Processing Institute, Beijing Language and Culture University, Beijing 100084, P. R. China)
Qiang Zhou (The State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, P. R. China) |
Keywords | Annotated Corpus, Chinese Information Processing, Tag Set for Chinese, Word Segmentation and Part-of-Speech Tagging |
Session | Session WP5 - Corpus Tagging |
Abstract | As the outcome of a three-year joint effort between the Department of Computer Science, Tsinghua University, and the Language Information Processing Institute, Beijing Language and Culture University, Beijing, China, a word-segmented and part-of-speech tagged Chinese corpus of 2 million Chinese characters, named HuaYu, has been established. This paper first briefly introduces the basics of HuaYu: its genre distribution, the fundamental considerations in its design, and its word segmentation and part-of-speech tagging standards. The complete tag set used in HuaYu is then given, along with typical examples for each tag. Finally, several pieces of annotated text from each genre are included for the reader's reference. |