Title |
Semantically Annotated Snapshot of the English Wikipedia |
Authors |
Jordi Atserias, Hugo Zaragoza, Massimiliano Ciaramita and Giuseppe Attardi |
Abstract |
This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikipedia. In recent years Wikipedia has become a valuable resource for both the Natural Language Processing (NLP) community and the Information Retrieval (IR) community. Although NLP technology for processing Wikipedia already exists, not all researchers and developers have the computational resources to process such a volume of information. Moreover, results are difficult to compare when they rely on different versions of Wikipedia that were processed in different ways. The aim of this work is to give researchers in both the NLP and IR communities easy access to syntactic and semantic annotations by building a reference corpus that homogenizes experiments and makes results comparable. These resources, a semantically annotated corpus and an entity containment derived graph, are licensed under the GNU Free Documentation License and available from http://www.yr-bcn.es/semanticWikipedia |
Language |
Single language |
Topics |
Corpus (creation, annotation, etc.), Information Extraction, Information Retrieval, Acquisition, Machine Learning |
Full paper |
Semantically Annotated Snapshot of the English Wikipedia |
Slides |
Semantically Annotated Snapshot of the English Wikipedia |
Bibtex |
@InProceedings{ATSERIAS08.581,
author = {Jordi Atserias and Hugo Zaragoza and Massimiliano Ciaramita and Giuseppe Attardi},
title = {Semantically Annotated Snapshot of the English Wikipedia},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = {may},
date = {28-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |