Summary of the paper

Title Automatic Identification of Closely-related Indian Languages: Resources and Experiments
Authors Ritesh Kumar, Bornini Lahiri, Deepak Alok, Atul Kr. Ojha, Mayank Jain, Abdul Basit and Yogesh Dawar
Abstract In this paper, we discuss an attempt to develop an automatic language identification system for five closely-related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi. We have compiled comparable corpora of varying lengths for these languages from various sources, and we describe in detail how these corpora were created. Using these corpora, we developed a language identification system that currently achieves a state-of-the-art accuracy of 96.48%. We also used these corpora to study the lexical similarity among the five languages, which is the first data-based study of the extent of ‘closeness’ of these languages.
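The abstract does not say which classifier or similarity measure the authors used. As a purely illustrative sketch, assuming a common baseline for closely-related languages, the block below trains a multinomial Naive Bayes classifier over character trigrams (with add-one smoothing), scores it with plain accuracy, and measures lexical overlap between two vocabularies with the Jaccard coefficient. All names here (`char_ngrams`, `NgramNB`, `accuracy`, `jaccard`) are hypothetical and this is not the paper's method.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of a space-padded string."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNB:
    """Hypothetical multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n: int = 3):
        self.n = n
        self.counts: dict[str, Counter] = {}  # per-language n-gram counts
        self.docs: Counter = Counter()        # training sentences per language
        self.vocab: set[str] = set()          # all n-grams seen in training
        self.totals: dict[str, int] = {}

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, text: str) -> str:
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # log prior + sum of smoothed log likelihoods of each n-gram
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + v
            for g in grams:
                score += math.log((counts[g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(gold, pred) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def jaccard(a_tokens, b_tokens) -> float:
    """Jaccard overlap of two vocabularies (shared types / all types)."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0
```

Character n-grams are a natural fit here because closely-related languages often share most whole words but differ in affixes and spelling, which short character sequences capture; Jaccard overlap is only one of several possible lexical similarity measures.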
Topics Closely-Related Language, Indo-Aryan, Language Identification
Bibtex @InProceedings{KUMAR18.26,
  author = {Ritesh Kumar and Bornini Lahiri and Deepak Alok and Atul Kr. Ojha and Mayank Jain and Abdul Basit and Yogesh Dawar},
  title = {Automatic Identification of Closely-related Indian Languages: Resources and Experiments},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Girish Nath Jha and Kalika Bali and Sobha L and Atul Kr. Ojha},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-09-2},
  language = {english}
  }