Title
Parsing Chinese Synthetic Words with a Character-based Dependency Model
Authors
Fei Cheng, Kevin Duh and Yuji Matsumoto
Abstract
Synthetic word analysis is a potentially important but relatively unexplored problem in Chinese natural language processing. Conventional pipeline methods that rely on word segmentation face two issues: (1) the lack of a common segmentation standard and (2) poor segmentation performance on out-of-vocabulary (OOV) words. These issues may be circumvented by adopting a character-based parsing view, which provides both the internal structure of synthetic words and the global structure of sentences in a seamless fashion. However, because the problem has received little research attention, the accuracy of synthetic word parsing is not yet satisfactory. In view of this, we propose several synthetic word parsers and present experiments on them. Additionally, we demonstrate the usefulness of incorporating large unlabelled corpora and a dictionary for this task. Our parsers significantly outperform the baseline, a pipeline method.
Topics
Corpus (Creation, Annotation, etc.), Information Extraction, Information Retrieval
BibTeX
@InProceedings{CHENG14.96,
  author    = {Fei Cheng and Kevin Duh and Yuji Matsumoto},
  title     = {Parsing Chinese Synthetic Words with a Character-based Dependency Model},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year      = {2014},
  month     = {may},
  date      = {26-31},
  address   = {Reykjavik, Iceland},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-8-4},
  language  = {english}
}