Summary of the paper

Title Towards Indonesian Part-of-Speech Tagging: Corpus and Models
Authors Sihui Fu and Nankai Lin
Abstract As a member of the Malayo-Polynesian languages, Indonesian is spoken by a large population. However, language resources and processing tools for Indonesian are quite limited. Part-of-speech (POS) tagging assigns a POS to each word according to its distribution and function in context, which provides valuable information for most natural language processing tasks. This paper presents our work on designing an Indonesian POS tagset of 29 tags and constructing a large Indonesian POS corpus of over 355,000 tokens. During the design and annotation processes, we make judgments mainly from a typological perspective, following the specifications of Universal Dependencies, while still accounting for language-specific phenomena. In addition, we train several state-of-the-art sequence labeling models on the proposed corpus for automatic POS tagging, and the experimental results are favorable, with accuracies higher than 94%.
Topics Part-Of-Speech Tagging
Bibtex @InProceedings{FU18.3,
  author = {Sihui Fu and Nankai Lin},
  title = {Towards Indonesian Part-of-Speech Tagging: Corpus and Models},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Erhong Yang and Le Sun},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-29-0},
  language = {english}
}