LREC 2000 2nd International Conference on Language Resources & Evaluation |
Title | A Unified POS Tagging Architecture and its Application to Greek |
Authors | Papageorgiou Harris (Institute for Language and Speech Processing, Epidavrou & Artemidos 6, 151 25 Maroussi, Greece, xaris@ilsp.gr) Prokopidis Prokopis (Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 151 25 Maroussi, Greece, prokopis@ilsp.gr) Giouli Voula (Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 151 25, Athens, Greece, tel: +301 6875300, fax: +301 6854270, voula@ilsp.gr) Piperidis Stelios (Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 151 25, Athens, Greece, tel: +301 6875300, fax: +301 6854270, spip@ilsp.gr) |
Keywords | Greek, POS Tagging, Transformation-Based Learning, XML |
Session | Session WO18 - Morphology in Lexical and Textual Resources |
Full Paper | 181.ps, 181.pdf |
Abstract | This paper proposes a flexible, unified tagging architecture that can be incorporated into a range of applications, such as information extraction, cross-language information retrieval, term extraction, and summarization, while also providing an essential component for subsequent syntactic processing or lexicographical work. A feature-based, multi-tiered approach to part-of-speech tagging (the FBT tagger) is introduced. FBT is a variant of the well-known transformation-based learning paradigm, aimed at improving tagging quality for highly inflected languages such as Greek. In addition, a large-scale experiment on Greek is reported, with results for a variety of text genres, including financial reports, newswires, press releases, and technical manuals. Finally, the evaluation methodology adopted is discussed. |
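The abstract names transformation-based learning as the paradigm that FBT builds on. As background only, here is a minimal sketch of a generic Brill-style transformation-based learner. It assumes a single hypothetical rule template ("retag A as B when the previous tag is C"), toy tag names, and greedy error-driven rule selection. It is not the FBT tagger itself, whose feature-based, multi-tiered rules are not described in this abstract.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical rule shape (from_tag, to_tag, prev_tag): retag from_tag as
# to_tag when the preceding token's tag is prev_tag. Real TBL taggers use
# many rule templates; this single template is a simplification.
Rule = Tuple[str, str, str]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one rule, conditioning on the tags as they were before this pass."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def errors(tags: List[str], gold: List[str]) -> int:
    """Count positions where the current tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(initial: List[str], gold: List[str], max_rules: int = 10):
    """Greedily learn the rule that most reduces errors, then repeat."""
    current = list(initial)
    learned: List[Rule] = []
    for _ in range(max_rules):
        # Instantiate candidate rules from the errors that remain.
        candidates = Counter()
        for i in range(1, len(current)):
            if current[i] != gold[i]:
                candidates[(current[i], gold[i], current[i - 1])] += 1
        best, best_gain = None, 0
        base = errors(current, gold)
        for rule in candidates:
            # Net gain: errors fixed minus errors the rule introduces.
            gain = base - errors(apply_rule(current, rule), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule reduces the error count any further
            break
        learned.append(best)
        current = apply_rule(current, best)
    return learned, current


if __name__ == "__main__":
    # Toy data: an initial (e.g. most-frequent-tag) tagging and gold tags.
    initial = ["DET", "VB", "DET", "VB"]
    gold = ["DET", "NN", "DET", "NN"]
    rules, final = learn_rules(initial, gold)
    print(rules)  # [('VB', 'NN', 'DET')]
    print(final)  # ['DET', 'NN', 'DET', 'NN']
```

At tagging time, the learned rules are applied in the order they were learned to the output of the initial tagger. Scoring each candidate by net error reduction, rather than by the number of errors it fixes, keeps the learner from accepting rules that fix one token while breaking others.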