SUMMARY : Session P6-WT
Title | Constructing A Chinese Chat Language Corpus with A Two-Stage Incremental Annotation Approach |
---|---|
Authors | Y. Xia, K. Wong, Wen. Li |
Abstract | Chat language refers to the special human language widely used in the community of digital network chat. Because chat language is anomalous in how it forms words and phrases and in its use of non-alphabetical characters, conventional natural language processing tools handle chat language text poorly. Previous research shows that knowledge-based methods are less effective at processing unseen chat terms. This motivates us to construct a chat language corpus so that corpus-based techniques for chat language text processing can be developed and evaluated. However, creating the corpus purely by hand is difficult. First, the work is labor-intensive. Second, annotation inconsistency is serious. To minimize manpower and annotation inconsistency, this paper proposes a two-stage incremental annotation approach for constructing a chat language corpus. Experiments conducted in this paper show that the performance of corpus annotation can be improved greatly with this approach. |
Keywords | chat language, corpus annotation, natural language processing |
Full paper | Constructing A Chinese Chat Language Corpus with A Two-Stage Incremental Annotation Approach |