Title |
Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank |
Authors |
Rui Wang and Yi Zhang |
Abstract |
In this paper, we describe our hybrid parsing model for Mandarin Chinese processing. In particular, we work on the Tsinghua Chinese Treebank (TCT), whose annotation contains both constituents and the head information of each constituent. The model we design combines mainstream constituent parsing and dependency parsing. We present in detail 1) how to (partially) encode the head information into constituent parsing, 2) how to encode constituent information into dependency parsing, and 3) how to restore the head information using the dependency structure. For each of these, we adopt different strategies for different cases. In an open shared task evaluation, we achieve an F1-score of 85.23% for constituent parsing, 82.35% with partial head information, and 74.27% with complete head information. The error analysis shows the challenge of restoring multiple-headed constituents, and also some potential for using the dependency structure to guide constituent parsing, which we will explore in future work. |
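Step 3 above, restoring head information from a dependency structure, can be illustrated with a minimal sketch. This is not the authors' procedure; it assumes a common heuristic: inside a constituent, any word whose dependency head lies outside the constituent's span is a lexical head of that constituent, and the child constituent containing that word is marked as a head child. When more than one word in the span points outside it, the constituent comes out multiple-headed, which is the case the error analysis identifies as hard. The class `Constituent`, the functions `span_heads` and `restore_heads`, the labels, and the example parse are all hypothetical and do not follow the TCT tag set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Constituent:
    """Hypothetical constituent node covering words [start, end)."""
    label: str
    start: int
    end: int
    children: List["Constituent"] = field(default_factory=list)
    # Indices into `children` marking head children; filled by restore_heads.
    head_children: List[int] = field(default_factory=list)


def leaf(label: str, i: int) -> Constituent:
    """A single-word constituent over word i (illustrative helper)."""
    return Constituent(label, i, i + 1)


def span_heads(start: int, end: int, heads: List[int]) -> List[int]:
    """Words in [start, end) whose dependency head lies outside the span.

    heads[i] is the index of word i's head, or -1 for the root word.
    Because the dependency structure is a tree, every span has at least one.
    """
    return [i for i in range(start, end) if not (start <= heads[i] < end)]


def restore_heads(node: Constituent, heads: List[int]) -> List[int]:
    """Assumed heuristic: mark as head children those children that contain
    a word whose head is outside `node`'s span. Recurses bottom-up and
    returns the lexical head word(s) of `node`."""
    for child in node.children:
        restore_heads(child, heads)
    exposed = span_heads(node.start, node.end, heads)
    node.head_children = [
        k for k, child in enumerate(node.children)
        if any(child.start <= w < child.end for w in exposed)
    ]
    return exposed


# Hypothetical example: 我 喜欢 猫 和 狗 ("I like cats and dogs").
words = ["我", "喜欢", "猫", "和", "狗"]
# Assumed dependency parse: both conjuncts attach to the verb 喜欢,
# and the conjunction 和 attaches to 狗.
dep_heads = [1, -1, 1, 4, 1]

np_coord = Constituent("NP", 2, 5, [leaf("N", 2), leaf("C", 3), leaf("N", 4)])
vp = Constituent("VP", 1, 5, [leaf("V", 1), np_coord])
s = Constituent("S", 0, 5, [leaf("NP", 0), vp])

root_heads = restore_heads(s, dep_heads)
print(root_heads)              # [1]
print(vp.head_children)        # [0]
print(np_coord.head_children)  # [0, 2]
```

In this example the coordinated NP receives two head children (猫 and 狗), because both conjuncts attach outside the NP in the dependency parse. The same mechanism also produces spurious multiple heads whenever the dependency parse disagrees with the constituent bracketing, so under this heuristic, errors in either parser propagate into the restored heads.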
Topics |
Parsing, Evaluation methodologies |
Full paper |
Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank |
Bibtex |
@InProceedings{WANG10.844,
  author = {Rui Wang and Yi Zhang},
  title = {Hybrid Constituent and Dependency Parsing with Tsinghua Chinese Treebank},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
}