SUMMARY: Session O23-SG Speech Corpora & Annotation

 

Title Linguistic Resources for Speech Parsing
Authors A. Bies, S. Strassel, H. Lee, K. Maeda, S. Kulick, Y. Liu, M. Harper, M. Lease
Abstract We report on the success of a two-pass approach to annotating metadata, speech effects, and syntactic structure in English conversational speech. Transcribed speech was annotated separately for structural metadata, also called structural events (fillers; speech repairs, or edit disfluencies; and SUs, or syntactic/semantic units), and for syntactic structure (treebanking of constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Conversely, constraints imposed by text processing systems such as parsers can help improve the identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.
Keywords Treebank, MDE, spoken metadata, annotation, speech parsing, corpus development, linguistic resources, conversational speech, disfluencies, structural event detection