LREC 2000: 2nd International Conference on Language Resources & Evaluation |
Title | Automatic Style Categorisation of Corpora in the Greek Language |
Authors | Tambouratzis George (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, giorg_t@ilsp.gr) Markantonatou Stella (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, marks@ilsp.gr) Hairetakis Nikolaos (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, nhaire@ilsp.gr) Carayannis George (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, gcara@ilsp.gr) |
Keywords | Automated Style Categorisation, Grammatical Rules, Greek Language, Masking-and-Matching Technique, Morphological Processing |
Session | Session WO3 - Corpus Categorisation |
Full Paper | 301.ps, 301.pdf |
Abstract | In this article, a system is proposed for the automatic style categorisation of text corpora in the Greek language. This categorisation is based largely on the type of language used in the text, for example whether or not it is representative of formal Greek. To arrive at this categorisation, the highly inflectional nature of the Greek language is exploited. For each text, a vector of both structural and morphological characteristics is assembled. Categorisation is achieved by comparing this vector to given archetypes using a statistics-based method. Experimental resu
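The abstract does not specify which statistical comparison is used. The following is a minimal sketch of one plausible reading: a nearest-archetype classifier. It assumes each archetype is summarised by a per-feature mean and standard deviation, and that a text is assigned to the archetype with the smallest standardised Euclidean distance. The function names, the two features, and all numbers are hypothetical and purely illustrative. They are not taken from the paper.

```python
import math
from typing import Dict, Sequence

# Hypothetical archetype representation: per-feature mean and standard
# deviation, estimated from texts already known to belong to that style.
Archetype = Dict[str, Sequence[float]]


def standardised_distance(vector: Sequence[float], archetype: Archetype) -> float:
    """Euclidean distance after scaling each feature by the archetype's std."""
    total = 0.0
    for x, mu, sigma in zip(vector, archetype["mean"], archetype["std"]):
        total += ((x - mu) / sigma) ** 2
    return math.sqrt(total)


def categorise(vector: Sequence[float], archetypes: Dict[str, Archetype]) -> str:
    """Return the name of the archetype closest to the text's feature vector."""
    return min(archetypes, key=lambda name: standardised_distance(vector, archetypes[name]))


# Illustrative (made-up) archetypes over two hypothetical features:
# [mean sentence length in words, fraction of words with a formal-register ending]
archetypes = {
    "formal": {"mean": [28.0, 0.30], "std": [6.0, 0.08]},
    "informal": {"mean": [14.0, 0.05], "std": [4.0, 0.03]},
}

if __name__ == "__main__":
    print(categorise([25.0, 0.22], archetypes))
    print(categorise([13.0, 0.04], archetypes))
```

Scaling by each archetype's standard deviation keeps features measured on very different scales (a word count versus a proportion) from dominating the distance purely because of their units.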