This paper demonstrates a POS (Part-of-Speech) annotation tool created for Bhojpuri, a less-resourced language. Bhojpuri is a popular Indian language spoken by more than 33 million speakers in India (Census 2001). On digital platforms, a good POS tagger is an important requirement for creating language resources. The POS tagger discussed here is one of the initial experiments aimed at creating language resources for Bhojpuri. The tagger was created as part of dissertation work and is based on the BIS (Bureau of Indian Standards) annotation scheme. Because the corpus data were collected from a variety of sources, the tagger also performs reasonably well on other varieties of Bhojpuri. So far, the tool achieves an average accuracy of 88.6% on general-domain text.
@InProceedings{SINGH18.21,
  author = {Srishti Singh and Girish Nath Jha},
  title = {Demo: Parts-of-Speech Tagger for Bhojpuri},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Girish Nath Jha and Kalika Bali and Sobha L and Atul Kr. Ojha},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-09-2},
  language = {english}
}