Title |
Joint Grammar and Treebank Development for Mandarin Chinese with HPSG |
Authors |
Yi Zhang, Rui Wang and Yu Chen |
Abstract |
We present the ongoing development of MCG, a linguistically deep and precise grammar for Mandarin Chinese together with its accompanying treebank, both based on the linguistic framework of HPSG, and using MRS as the semantic representation. We highlight some key features of our grammar design, and review a number of challenging phenomena, with comparisons to alternative linguistic treatments and implementations. One of the distinguishing characteristics of our approach is the tight integration of grammar and treebank development. The two-step treebank annotation procedure benefits from the efficiency of the discriminant-based annotation approach, while giving the annotators full freedom of producing extra-grammatical structures. This not only allows the creation of a precise and full-coverage treebank with an imperfect grammar, but also provides prompt feedback for grammarians to identify the errors in the grammar design and implementation. Preliminary evaluation and error analysis shows that the grammar already covers most of the core phenomena for Mandarin Chinese, and the treebank annotation procedure reaches a stable speed of 35 sentences per hour with satisfying quality. |
Topics |
Grammar and Syntax, Corpus (creation, annotation, etc.), Profiling |
Full paper |
Joint Grammar and Treebank Development for Mandarin Chinese with HPSG |
Bibtex |
@InProceedings{ZHANG12.345,
author = {Yi Zhang and Rui Wang and Yu Chen}, title = {Joint Grammar and Treebank Development for Mandarin Chinese with HPSG}, booktitle = {Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} } |