Title | Towards Intelligent Written Cultural Heritage Processing - Lexical Processing |
Author(s) |
Kiril Ribarov
Research assistant at the Center for Computational Linguistics, Charles University, Prague |
Session | P20-W |
Abstract | This work presents ACT (Annotated Corpora of Text), a software package for lexical and corpus processing of European written cultural sources, currently used for processing mediaeval Slavonic manuscripts, as another step towards a contextual and intelligent heritage Information Technology framework. ACT is suited to capturing the characteristics of old written sources, including rich language variability at the word and sentence level. The central processing units are not word-forms but their "understandings", which can be assigned morphological distinctions, head-words (including recensional ones), translation equivalents, multi-word units, and correlations to other sources. The whole annotation process is automated, and individual sorting orders and morphological tag structures can be defined. ACT incorporates modules for complex searches over one or more sources, creation of various ready-to-use documents, web access to texts and images, incorporation of lexical card-files into a corpus, and reconstruction of text from card-files. |
Keyword(s) | Old-Church Slavonic, language resources, annotation, card-files |
Language(s) | Old-Church Slavonic |
Full Paper | 712.pdf |