Title |
From Grammar Rule Extraction to Treebanking: A Bootstrapping Approach |
Authors |
Masood Ghayoomi |
Abstract |
Most reliable language resources are developed under human supervision. Producing annotated data is hard and tedious, and very time consuming when done entirely by hand; as a result, various types of annotated data, including treebanks, are not available for many languages. Since a portion of a language is regular, we can define regular expressions as grammar rules to recognize the strings that match them, reducing the human effort needed to annotate further unseen data. In this paper, we propose an incremental bootstrapping approach that begins by extracting grammar rules when no treebank is yet available. Since Persian suffers from a lack of available data sources, we have applied our method to develop a treebank for this language. Our experiment shows that this approach significantly decreases the amount of manual effort in the annotation process while enlarging the treebank. |
Topics |
Grammar and Syntax, Corpus (creation, annotation, etc.), Parsing |
Full paper |
From Grammar Rule Extraction to Treebanking: A Bootstrapping Approach |
Bibtex |
@InProceedings{GHAYOOMI12.900,
  author = {Masood Ghayoomi},
  title = {From Grammar Rule Extraction to Treebanking: A Bootstrapping Approach},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |