Title |
A generic formalism to represent linguistic corpora in RDF and OWL/DL |
Authors |
Christian Chiarcos |
Abstract |
This paper describes POWLA, a generic formalism to represent linguistic corpora by means of RDF and OWL/DL. Unlike earlier approaches in this direction, POWLA is not tied to a specific selection of annotation layers, but rather, it is designed to support any kind of text-oriented annotation. POWLA inherits its generic character from the underlying data model PAULA (Dipper, 2005; Chiarcos et al., 2009) that is based on early sketches of the ISO TC37/SC4 Linguistic Annotation Framework (Ide and Romary, 2004). As opposed to existing standoff XML linearizations for such generic data models, it uses RDF as representation formalism and OWL/DL for validation. The paper discusses advantages of this approach, in particular with respect to interoperability and queriability, which are illustrated for the MASC corpus, an open multi-layer corpus of American English (Ide et al., 2008). |
Topics |
Corpus (creation, annotation, etc.), LR Infrastructures and Architectures, Ontologies |
Full paper |
A generic formalism to represent linguistic corpora in RDF and OWL/DL |
Bibtex |
@InProceedings{CHIARCOS12.915,
author = {Christian Chiarcos}, title = {A generic formalism to represent linguistic corpora in RDF and OWL/DL}, booktitle = {Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} } |