Title |
A Positional Tagset for Russian |
Authors |
Jirka Hana and Anna Feldman |
Abstract |
Fusional languages have rich inflection. As a consequence, tagsets capturing their morphological features are necessarily large. A natural way to make a tagset manageable is to use a structured system. In this paper, we present a positional tagset for describing morphological properties of Russian. The tagset was inspired by the Czech positional system (Hajič, 2004). We have used preliminary versions of this tagset in our previous work (e.g., Hana et al. (2004, 2006); Feldman (2006); Feldman and Hana (2010)). Here, we systematize and extend these preliminary versions (by adding information about animacy, aspect and reflexivity), give a more detailed description of the tagset, and provide a comparison with the Czech system. Each tag of the tagset consists of 16 positions, each encoding one morphological feature (part-of-speech, detailed part-of-speech, gender, animacy, number, case, possessor's gender and number, person, reflexivity, tense, aspect, degree of comparison, negation, voice, variant). The tagset contains approximately 2,000 tags. |
Topics |
Part of speech tagging, Morphology, Corpus (creation, annotation, etc.) |
Full paper |
A Positional Tagset for Russian |
Slides |
- |
Bibtex |
@InProceedings{HANA10.807,
  author = {Jirka Hana and Anna Feldman},
  title = {A Positional Tagset for Russian},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |
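
The 16-position layout described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the field names, the idea that the positions occur in the order the abstract lists them, the use of `-` for a position that does not apply, and the example tag itself, whose value codes are purely hypothetical.

```python
# Feature names in the order the abstract lists them (16 positions).
# Assumption: this listing order matches the position order in the tag.
POSITIONS = [
    "pos", "subpos", "gender", "animacy", "number", "case",
    "possessor_gender", "possessor_number", "person", "reflexivity",
    "tense", "aspect", "degree", "negation", "voice", "variant",
]


def decode_tag(tag: str, unset: str = "-") -> dict[str, str]:
    """Map each character of a positional tag to its feature name.

    Positions holding the `unset` character are omitted from the result.
    The `unset` marker is an assumption, not taken from the paper.
    """
    if len(tag) != len(POSITIONS):
        raise ValueError(
            f"expected {len(POSITIONS)} positions, got {len(tag)}"
        )
    return {name: ch for name, ch in zip(POSITIONS, tag) if ch != unset}


# Hypothetical tag: the value codes are invented for illustration only.
example = "NNMAS1" + "-" * 10
features = decode_tag(example)
```

Because every feature has a fixed position, a tag can be decoded with a single pass and no parsing logic, which is what keeps a tagset of roughly 2,000 tags manageable.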