Title |
DELOS: An Automatically Tagged Economic Corpus for Modern Greek |
Authors |
Katia Lida Kermanidis (Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500 Rio, Greece), Nikos Fakotakis (Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500 Rio, Greece), George Kokkinakis (Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500 Rio, Greece) |
Session |
WO5: Syntactic Annotation |
Abstract |
Text corpora have become an essential resource for Natural Language Processing tasks in recent years. A wide range of applications, such as information retrieval and ontology and terminology extraction, require a corpus that is sufficiently large yet restricted to a specific domain. Manual tagging of such a corpus is very costly, which makes automatic annotation by a set of linguistic tools an appealing, if challenging, alternative. DELOS, described in this paper, is a Modern Greek corpus from the economic domain consisting of 5 million word tokens, automatically tagged for morphology and shallow syntactic relations. The annotation tools are combined into an integrated system and applied to the corpus using the GATE text engineering platform. The system output is a textual database marked up with the annotation tagset, available in plain text as well as in XML format. |
Keywords |
Economic corpus, Modern Greek, Automatic annotation, Morphological analysis, Phrase chunker |
Full Paper |