Summary of the paper

Title The Cologne Corpus of German Sign Language as L2 (C/CSL2): Current Development Stand
Authors Alejandro Oviedo, Thomas Kaul, Leonid Klinner and Reiner Griebel
Abstract Since 2016 (Kaul et al., 2016), a German Sign Language (DGS) learner corpus (Granger et al., 2015) has been under construction at the University of Cologne. The primary data consist of around 60 hours of signed discourse in more than 1,250 individual files, produced by 350 hearing learners of DGS (312 female, 38 male) whose mother tongue is German. Data have been collected at CEFR (Council of Europe, 2001) proficiency levels A1 to C1. The corpus includes similar numbers of monologues and dialogues. Monologues (average duration 2.5 minutes) are mostly elicited with an illustration or a video. Dialogues have an average duration of 8 minutes. At levels A1 to B2, dialogues take place between the informant and a Deaf teacher; at the advanced level (C1), they are interactions between two students. Metadata for the videos include the age and gender of the informants as well as the proficiency level and the semester of data collection. Part of the data constitutes a longitudinal learner corpus (Granger et al., 2015): a group of students who attended DGS courses at different proficiency levels between mid-2015 and the end of 2017 were filmed at several points during that period. The corpus is a work in progress. The primary data are continually being extended, since new videos are added each semester (the tests taken by students in the DGS courses as well as a number of videos produced and analyzed by students in linguistics courses). Only around 6% of the videos have been transcribed so far: German glosses, translations into German and some linguistic tags have been entered in ELAN (Crasborn & Sloetjes, 2008) files. Lemmatisation (Johnston, 2010) has been guided by a lexical database of around 8,000 signs previously produced by our university as teaching material. Current transcriptions also include a series of annotation lines with controlled vocabulary for word classes, disfluencies (Oviedo et al., in press) and deviations from the DGS standard at the phonetic-phonological, morphological and syntactic levels. The biggest challenge faced so far in the development of our corpus is the reluctance of students to authorize the use of the corpus outside our research group. We are authorized only to transcribe the videos and use the transcriptions as a data source. However, a small group of students has so far authorized us to show their videos and/or video stills to external audiences. One strategy that has proved useful for obtaining shareable data is to involve students in the tasks of transcription and linguistic analysis. During the 2016/2017 winter semester we held a seminar with master's students to train them in transcribing their own signed recordings. At the end of the course, the majority of the participants gave us permission to use their videos in public demonstrations.
References
Kaul, Th.; Oviedo, A.; Griebel, R.; Klinner, L.; Prüfer, T. & Krumpen, M. (2016). C/CSL2, The Cologne Corpus of Sign Language as a Second Language. Poster presented at TaLC 12, University of Giessen, 21 July 2016.
Granger, S.; Gilquin, G. & Meunier, F. (Eds.) (2015). The Cambridge Handbook of Learner Corpus Research. Cambridge: Cambridge University Press.
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Press Syndicate of the University of Cambridge.
Crasborn, O. & Sloetjes, H. (2008). Enhanced ELAN functionality for sign language corpora. In Onno Crasborn, Thomas Hanke, Eleni Efthimiou, Inge Zwitserlood & Ernst Thoutenhoofd (Eds.), Proceedings of LREC 2008, Sixth International Conference on Language Resources and Evaluation. Paris: ELDA, pp. 39-43.
Johnston, T. (2010). From archive to corpus: Transcription and annotation in the creation of signed language corpora. International Journal of Corpus Linguistics, 15(1), pp. 106-131.
Oviedo, A.; Kaul, Th.; Urbann, K.; Griebel, R. & Klinner, L. (in press). Eine Annäherung zu den Pausen als Flüssigkeitsfaktoren in Deutscher Gebärdensprache als L1. Das Zeichen 32(108). To appear in March 2018.
Full paper The Cologne Corpus of German Sign Language as L2 (C/CSL2): Current Development Stand
Bibtex @InProceedings{OVIEDO18.18021,
  author = {Alejandro Oviedo and Thomas Kaul and Leonid Klinner and Reiner Griebel},
  title = {The Cologne Corpus of German Sign Language as L2 (C/CSL2): Current Development Stand},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Mayumi Bono and Eleni Efthimiou and Stavroula-Evita Fotinea and Thomas Hanke and Julie Hochgesang and Jette Kristoffersen and Johanna Mesch and Yutaka Osugi},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-01-6},
  language = {english}
  }