Summary of the paper

Title Design and Preliminary Analysis of the Corpus of Everyday Japanese Conversation
Authors Hanae Koiso, Yasuyuki Usuda, Haruka Amatani, Yoshiko Kawabata and Yasuharu Den
Abstract Conversations emerge in various ways in everyday life. To capture the diversity of real-life conversations, we started the compilation of a large-scale corpus of everyday Japanese conversation, the Corpus of Everyday Japanese Conversation (CEJC). The CEJC is designed to contain various kinds of everyday conversations in a balanced manner so as to capture the diversity of everyday conversations and to observe natural conversational behavior. The CEJC targets conversations embedded in naturally occurring activities in daily life, without exogenous intervention by researchers imposing topics or displacing the context of action. Since the start of the project in 2016, we have compiled 94 hours of conversations in the CEJC, corresponding to about half of the target size of the entire corpus, and have morphologically annotated 38 hours of data. In this paper, we first outline the design of the CEJC, including corpus size, recording methods, and annotations to be included in the corpus. Then, we conduct a preliminary analysis of some linguistic aspects of the corpus, based on the morphologically annotated data, showing that the CEJC captures the diversity of real-life conversations.
Topics Corpus Of Everyday Japanese Conversation, Corpus Analysis, Morphological Annotation, Corpus Design, Linguistic Aspects
Bibtex @InProceedings{KOISO18.5,
  author = {Hanae Koiso and Yasuyuki Usuda and Haruka Amatani and Yoshiko Kawabata and Yasuharu Den},
  title = {Design and Preliminary Analysis of the Corpus of Everyday Japanese Conversation},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Hanae Koiso and Patrizia Paggio},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-16-0},
  language = {english}
  }