Title |
Inforex -- a web-based tool for text corpus management and semantic annotation |
Authors |
Michał Marcińczuk, Jan Kocoń and Bartosz Broda |
Abstract |
This paper presents Inforex, a system for semantic text annotation. Inforex is a web-based system for managing text corpora and annotating them on the semantic level, including annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing, including text segmentation, morphosyntactic analysis and word selection for word sense annotation. Inforex can be accessed from any standards-compliant web browser that supports JavaScript. The user interface takes the form of dynamic HTML pages that use AJAX. The server side of the system is written in PHP and the data is stored in a MySQL database. The system makes use of external tools that are either installed on the server or accessed via web services. Documents are stored in the database in their original format: plain text, XML or HTML. Tokenization and sentence segmentation are optional and are stored in a separate table. Tokens are stored as pairs of values giving the indexes of the first and last character of each token, together with sets of features representing the morphosyntactic information. |
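The token storage described in the abstract (character-index pairs kept apart from the unmodified document text) can be illustrated with a minimal Python sketch. It is not Inforex's implementation, which the abstract says is written in PHP on top of MySQL. The names `Token` and `token_text`, the inclusive treatment of the last index, and the feature labels are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A standoff token: it points into the document instead of copying text."""
    first: int  # index of the token's first character
    last: int   # index of the token's last character (assumed inclusive)
    features: frozenset = frozenset()  # hypothetical morphosyntactic labels


def token_text(document: str, token: Token) -> str:
    """Recover the token's surface form from the untouched document text."""
    return document[token.first:token.last + 1]


# The document is kept in its original form; tokens live in a separate list,
# loosely mirroring the separate table mentioned in the abstract.
document = "Inforex stores tokens separately."
tokens = [
    Token(0, 6, frozenset({"pos=noun"})),
    Token(8, 13, frozenset({"pos=verb"})),
    Token(15, 20, frozenset({"pos=noun"})),
]

surface_forms = [token_text(document, t) for t in tokens]
```

Because the tokens only reference character positions, the segmentation can be dropped or recomputed without touching the stored document, which fits the abstract's statement that tokenization is optional.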
Topics |
Tools, systems, applications, Corpus (creation, annotation, etc.), Usability, user satisfaction |
Full paper |
Inforex -- a web-based tool for text corpus management and semantic annotation |
Bibtex |
@InProceedings{MARCICZUK12.446,
  author = {Michał Marcińczuk and Jan Kocoń and Bartosz Broda},
  title = {Inforex -- a web-based tool for text corpus management and semantic annotation},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |