W11 2018 Proceedings

Summary of the paper

Title	Towards a Part-of-Speech Tagger for Awadhi: Corpus and Experiments
Authors	Abdul Basit and Ritesh Kumar
Abstract	Awadhi is an Indo-Aryan language, spoken in the eastern region of Uttar Pradesh by approximately 38 million native speakers. However, despite this large number of speakers, it is highly lacking in language resources like corpus, language technology tools, guidelines etc till date. This paper presents the first attempt towards developing an annotated corpora and a POS tagger of the language, The corpus is currently annotated with part-of-speech tags. Since there is no earlier tagset available for Awadhi, the POS tagset for the language was developed as part of this research. The tagset is a subset of the BIS scheme, which is the national standard for the development of POS tagsets for Indian languages.
Topics	Pos Annotation, Corpus Development, Bis
Full paper	Towards a Part-of-Speech Tagger for Awadhi: Corpus and Experiments
Bibtex	@InProceedings{BASIT18.18, author = {Abdul Basit and Ritesh Kumar}, title = {Towards a Part-of-Speech Tagger for Awadhi: Corpus and Experiments}, booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, year = {2018}, month = {may}, date = {7-12}, location = {Miyazaki, Japan}, editor = {Girish Nath Jha and Kalika Bali and Sobha L and Atul Kr. Ojha}, publisher = {European Language Resources Association (ELRA)}, address = {Paris, France}, isbn = {979-10-95546-09-2}, language = {english} }