Title |
Fine-Grained Geographical Relation Extraction from Wikipedia |
Authors |
Andre Blessing and Hinrich Schütze |
Abstract |
In this paper, we present work on enhancing the basic data resource of a context-aware system. Electronic text offers a wealth of information about geospatial data and can be used to improve the completeness and accuracy of geospatial resources (e.g., gazetteers). First, we introduce a supervised approach to extracting geographical relations on a fine-grained level. Second, we present a novel way of using Wikipedia as a corpus based on self-annotation. A self-annotation is an automatically created high-quality annotation that can be used for training and evaluation. Wikipedia contains two different types of content: (i) unstructured text and (ii) structured data, such as templates (e.g., infoboxes about cities), lists and tables. We use the structured data to annotate the unstructured text. Finally, the extracted fine-grained relations are used to complete gazetteer data. The precision and recall scores of more than 97 percent confirm that a statistical IE pipeline can be used to improve the data quality of community-based resources. |
Topics |
Information Extraction, Information Retrieval, Acquisition, Corpus (creation, annotation, etc.) |
Bibtex |
@InProceedings{BLESSING10.519,
  author    = {Andre Blessing and Hinrich Schütze},
  title     = {Fine-Grained Geographical Relation Extraction from Wikipedia},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
} |