Title |
Querying Diverse Treebanks in a Uniform Way |
Authors |
Jan Štěpánek and Petr Pajas |
Abstract |
This paper presents a system for querying treebanks in a uniform way. The system is able to work with both dependency and constituency based treebanks in any language. We demonstrate its abilities on 11 different treebanks. The query language used by the system provides many features not available in other existing systems while remaining efficient. The paper also describes the conversion of ten treebanks into a common XML-based format used by the system, touching on the question of standards and formats. The paper then shows several examples of linguistically interesting questions that the system is able to answer, for example browsing verbal clauses without subjects or extraposed relative clauses, generating the underlying grammar in a constituency treebank, searching for non-projective edges in a dependency treebank, or determining the word-order typology of a language based on the treebank. The performance of several implementations of the system is also discussed, based on the measured running times of some of the queries. |
Topics |
Tools, systems, applications, Corpus (creation, annotation, etc.), LR Infrastructures and Architectures |
Full paper |
Querying Diverse Treebanks in a Uniform Way |
Slides |
- |
Bibtex |
@InProceedings{TPNEK10.381,
  author = {Jan Štěpánek and Petr Pajas},
  title = {Querying Diverse Treebanks in a Uniform Way},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |