LREC 2000 2nd International Conference on Language Resources & Evaluation  

Title Enhancing Speech Corpus Resources with Multiple Lexical Tag Layers
Authors Witt Andreas (Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, witt@lili.uni-bielefeld.de, Postfach 10 01 31, 33501 Bielefeld, Germany)
Lüngen Harald (Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, luengen@spectrum.uni-bielefeld.de, Postfach 10 01 31, 33501 Bielefeld, Germany)
Gibbon Dafydd (Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, gibbon@spectrum.uni-bielefeld.de, Postfach 10 01 31, 33501 Bielefeld, Germany)
Keywords DSSSL, Morphology, Speech Corpora, Speech Lexica, Text Technology, XML
Session Session SP2 - Spoken Language Resources Issues from Construction to Validation
Abstract We describe a general two-stage procedure for re-using a custom corpus for spoken language system development. The procedure involves (1) a transformation from character-based markup to XML and (2) DSSSL stylesheet-driven enhancement of the XML markup with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, and with greater economy of computing resources, it can be employed as a parametrised ‘tagging on demand’ filter. The implementation will shortly be released as a public resource together with the corpus (German spoken dialogue, about 500k word form tokens) and the lexicon (about 75k word form types).
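The two-stage procedure can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the character-based transcript format (one turn per line, `SPEAKER: words`), the lexicon entries, and the layer names `pos`, `lemma` and `morph` are all hypothetical, and the second stage uses Python's standard `xml.etree.ElementTree` in place of the DSSSL stylesheets described in the abstract.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical character-based markup (an assumption, not the corpus's real
# format): one dialogue turn per line, written as "SPEAKER: word word ...".
TURN_RE = re.compile(r"^(?P<speaker>\w+):\s*(?P<text>.*)$")

# Hypothetical lexicon: each word form type maps to several independent
# lexical tag layers. Layer names and values are illustrative only.
LEXICON = {
    "guten": {"pos": "ADJA", "lemma": "gut", "morph": "gut+en"},
    "tag": {"pos": "NN", "lemma": "Tag", "morph": "Tag"},
}


def to_xml(transcript: str) -> ET.Element:
    """Stage 1: convert character-based markup into an XML tree."""
    dialogue = ET.Element("dialogue")
    for line in transcript.splitlines():
        m = TURN_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        turn = ET.SubElement(dialogue, "turn", speaker=m.group("speaker"))
        for form in m.group("text").split():
            ET.SubElement(turn, "w", form=form)
    return dialogue


def enhance(dialogue: ET.Element, layers: list[str]) -> ET.Element:
    """Stage 2: add the requested lexical tag layers to every word.

    Requesting all layers yields a fully tagged corpus; requesting a subset
    acts as a parametrised 'tagging on demand' filter.
    """
    # Materialise the word list first so the tree is not modified mid-iteration.
    for w in list(dialogue.iter("w")):
        entry = LEXICON.get(w.get("form", "").lower(), {})
        for layer in layers:
            if layer in entry:
                ET.SubElement(w, layer).text = entry[layer]
    return dialogue


if __name__ == "__main__":
    tree = enhance(to_xml("A: Guten Tag\nB: Tag"), layers=["pos", "lemma"])
    print(ET.tostring(tree, encoding="unicode"))
```

Passing only `["pos"]` adds a single layer to each word, whereas passing every layer corresponds to generating the fully tagged corpus; words missing from the lexicon are left untagged.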

 

="Verdana">