Title | Building Part-of-speech Corpora through Histogram Hopping |
Author(s) |
Marc Vilain
The MITRE Corporation |
Session | P14-W |
Abstract | This paper is concerned with lowering the cost of producing training resources for part-of-speech taggers. We focus primarily on the resource needs of unsupervised taggers, as these can be trained with simpler resources than their supervised counterparts. We introduce histogram hopping, a new approach for developing the central training resources of unsupervised taggers, and describe a simple annotation prototype that implements the approach. We then discuss the applicability of histogram hopping to the development of resources for supervised taggers. Finally, we report on a preliminary pilot study for French that validates this work. |
Keyword(s) | Part-of-speech, unsupervised learning, lexicon |
Language(s) | English, French |
Full Paper | 763.pdf |