SUMMARY: Session O23-SG Speech Corpora & Annotation
Title | Linguistic Resources for Speech Parsing |
---|---|
Authors | A. Bies, S. Strassel, H. Lee, K. Maeda, S. Kulick, Y. Liu, M. Harper, M. Lease |
Abstract | We report on the success of a two-pass approach to annotating metadata, speech effects and syntactic structure in English conversational speech. Transcribed speech was annotated separately for structural metadata, or structural events (fillers, speech repairs or edit disfluencies, and SUs, i.e., syntactic/semantic units), and for syntactic structure (treebanking of constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Conversely, constraints imposed by text processing systems such as parsers can be used to help improve the identification of disfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech. |
Keywords | Treebank, MDE, spoken metadata, annotation, speech parsing, corpus development, linguistic resources, conversational speech, disfluencies, structural event detection |
Full paper | Linguistic Resources for Speech Parsing |
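The abstract notes that combining the metadata and treebank layers exposed alignment issues that pointed to annotation errors. One plausible kind of mismatch, assumed here purely for illustration, is a boundary in one layer that cuts across a unit in the other. The sketch below assumes both layers have been reduced to half-open token spans over the same transcript; the span representation, the function names `crosses` and `find_alignment_conflicts`, and the toy data are hypothetical and are not the authors' tooling. It flags every SU whose span partially overlaps a treebank constituent.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open token span [start, end).
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans partially overlap, i.e. neither nests inside the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def find_alignment_conflicts(
    su_spans: List[Span], constituents: List[Span]
) -> List[Tuple[Span, Span]]:
    """Return every (SU, constituent) pair whose boundaries cross."""
    return [(su, c) for su in su_spans for c in constituents if crosses(su, c)]


if __name__ == "__main__":
    # Hypothetical toy data for a 6-token utterance: two SUs, three constituents.
    su_spans = [(0, 3), (3, 6)]
    constituents = [(0, 6), (1, 4), (4, 6)]
    for su, c in find_alignment_conflicts(su_spans, constituents):
        print(f"SU {su} crosses constituent {c}")
```

Here the constituent spanning tokens 1 to 3 straddles the SU boundary at token 3, so it is reported against both SUs. In a real merge, each such conflict would be a candidate for review in either the metadata layer or the treebank layer, since the check alone cannot say which annotation is wrong.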