Summary of the paper

Title If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
Authors Zhiwei Yu, David Mareček, Zdeněk Žabokrtský and Daniel Zeman
Abstract Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Various unsupervised and semi-supervised methods have been proposed to tag an unseen language. However, many of them require some partial understanding of the target language because they rely on dictionaries or parallel corpora such as the Bible. In this paper, we propose a different method named delexicalized tagging, for which we only need a raw corpus of the target language. We transfer tagging models trained on annotated corpora of one or more resource-rich languages. We employ language-independent features such as word length, frequency, neighborhood entropy, character classes (alphabetic vs. numeric vs. punctuation), etc. We demonstrate that such features can, to a certain extent, serve as predictors of the part of speech, represented by the universal POS tag.
Topics Part-of-Speech Tagging, Endangered Languages, Corpus (Creation, Annotation, etc.)
Full paper If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers
Bibtex @InProceedings{YU16.709,
  author = {Zhiwei Yu and David Mareček and Zdeněk Žabokrtský and Daniel Zeman},
  title = {If You Even Don't Have a Bit of Bible: Learning Delexicalized POS Taggers},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}
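
The language-independent features named in the abstract (word length, frequency, neighborhood entropy, character classes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of relative frequency, the split into left and right neighbor entropy, and the "mixed" character class are all assumptions made for illustration, since the abstract does not define the features precisely.

```python
import math
from collections import Counter, defaultdict


def char_class(word):
    """Coarse character class (the 'mixed' fallback is an assumption)."""
    if word.isalpha():
        return "alpha"
    if word.isdigit():
        return "numeric"
    if all(not ch.isalnum() for ch in word):
        return "punct"
    return "mixed"


def entropy(counter):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def delexicalized_features(tokens):
    """Map each word type to features that do not encode the word form itself.

    Hypothetical feature set, loosely following the abstract's list.
    """
    freq = Counter(tokens)
    left = defaultdict(Counter)   # word -> counts of preceding words
    right = defaultdict(Counter)  # word -> counts of following words
    for prev, nxt in zip(tokens, tokens[1:]):
        right[prev][nxt] += 1
        left[nxt][prev] += 1

    n = len(tokens)
    features = {}
    for word, count in freq.items():
        features[word] = {
            "length": len(word),
            "rel_freq": count / n,
            # Entropy of the neighboring-word distribution; 0.0 if no neighbor seen.
            "left_entropy": entropy(left[word]) if left[word] else 0.0,
            "right_entropy": entropy(right[word]) if right[word] else 0.0,
            "char_class": char_class(word),
        }
    return features


if __name__ == "__main__":
    corpus = "the cat sat on the mat . the dog sat .".split()
    for word, feats in delexicalized_features(corpus).items():
        print(word, feats)
```

Because none of these features records the word form, a tagger trained on them over an annotated corpus of one language could in principle be applied to feature vectors computed from a raw corpus of another language, which is the transfer setting the abstract describes.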