Summary of the paper

Title Grouping conversational markers across languages by exploiting large comparable corpora and unsupervised segmentation
Authors Laurent Prévot, Matthieu Stali and Shu-Chuan Tseng
Abstract This work approaches Conversational and Discourse Markers (hereafter DM) from a radical data-driven perspective grounded in large comparable corpora of French, English and Taiwan Mandarin conversations. The key features of our approach are (i) to account for lexicalization as a by-product of unsupervised segmentation applied to our corpora, (ii) to exploit simple metrics for clustering DM (both within a language and within multilingual clusters). We explore the benefits and the drawbacks of such a radical approach to DM. In particular we compare the DM clusters obtained from traditional segmentation into tokens (as given by manual transcription of the corpora) vs. unsupervised segmentation. The metrics on which we ground the clustering experiments are based on contrast between (i) short vs. longer utterances distribution, (ii) position within longer utterances.
Full paper Grouping conversational markers across languages by exploiting large comparable corpora and unsupervised segmentation
Bibtex @InProceedings{PRÉVOT18.4,
  author = {Laurent Prévot and Matthieu Stali and Shu-Chuan Tseng},
  title = {Grouping conversational markers across languages by exploiting large comparable corpora and unsupervised segmentation},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Reinhard Rapp and Pierre Zweigenbaum and Serge Sharoff},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-07-8},
  language = {english}
  }