Title |
A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English |
Authors |
Emina Kurtic, Bill Wells, Guy J. Brown, Timothy Kempton and Ahmet Aker |
Abstract |
In this paper we present a corpus of audio and video recordings of spontaneous, face-to-face multi-party conversation in two languages. Freely available high-quality recordings of mundane, non-institutional, multi-party talk are still scarce, and this corpus aims to contribute valuable data suitable for the study of multiple aspects of spoken interaction. In particular, it constitutes a unique resource for spoken Bosnian Serbo-Croatian (BSC), an under-resourced language with no spoken resources available at present. The corpus consists of just over 3 hours of free conversation in each of the target languages, BSC and British English (BE). The audio recordings have been made on separate channels using headset microphones, as well as using a microphone array containing 8 omni-directional microphones. The data has been segmented and transcribed using segmentation notions and transcription conventions developed from those of the conversation analysis research tradition. Furthermore, the transcriptions have been automatically aligned with the audio at the word and phone level using forced alignment. In this paper we describe the procedures behind the corpus creation and present the main features of the corpus for the study of conversation. |
Topics |
Corpus (creation, annotation, etc.), Dialogue, Other |
Full paper |
A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English |
Bibtex |
@InProceedings{KURTIC12.513,
  author = {Emina Kurtic and Bill Wells and Guy J. Brown and Timothy Kempton and Ahmet Aker},
  title = {A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |