LREC 2014 Proceedings

Summary of the paper

Title	Constituency Parsing of Bulgarian: Word- vs Class-based Parsing
Authors	Masood Ghayoomi, Kiril Simov and Petya Osenova
Abstract	In this paper, we report the results of two constituency parsers trained on BulTreeBank, an HPSG-based treebank for Bulgarian. To reduce data sparsity, we propose using Brown word clustering to cluster the words off-line and then map the words in the treebank to their classes, creating a class-based treebank. The observations show that the results are better when the classes outnumber the POS tags. Since this approach adds another dimension of abstraction (in comparison to the lemma), its coarse-grained representation can be used further for training statistical parsers.
Topics	Grammar and Syntax, Parsing
Full paper	Constituency Parsing of Bulgarian: Word- vs Class-based Parsing
Bibtex	@InProceedings{GHAYOOMI14.696,
  author = {Masood Ghayoomi and Kiril Simov and Petya Osenova},
  title = {Constituency Parsing of Bulgarian: Word- vs Class-based Parsing},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}
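
The mapping step mentioned in the abstract (replacing treebank words with their cluster classes) can be illustrated with a minimal sketch. It is not the authors' implementation: the bracketed tree format, the function name `to_class_based`, the lowercasing of tokens, the `C_UNK` fallback class, and the English toy cluster assignments are all assumptions made for illustration, and the clustering itself is presumed to have been run beforehand.

```python
import re

# Matches a preterminal node such as "(N dog)": a tag followed by one token.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def to_class_based(tree: str, word2class: dict[str, str], unknown: str = "C_UNK") -> str:
    """Replace every leaf word in a bracketed tree with its cluster class.

    Words missing from the mapping get a fallback class (an assumption here).
    """
    def swap(match: re.Match) -> str:
        tag, word = match.group(1), match.group(2)
        # Lowercasing before lookup is an illustrative choice, not from the paper.
        return f"({tag} {word2class.get(word.lower(), unknown)})"

    return LEAF.sub(swap, tree)


# Hypothetical cluster assignments, standing in for the output of an
# off-line word clustering run.
word2class = {"the": "C0110", "dog": "C1011", "cat": "C1011", "barks": "C0011"}

tree = "(S (NP (D The) (N dog)) (VP (V barks)))"
class_tree = to_class_based(tree, word2class)
print(class_tree)
```

In this toy mapping "dog" and "cat" share a class, so a parser trained on the class-based trees would see both as the same terminal symbol, which is how the mapping is meant to reduce sparsity.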