LREC 2000: 2nd International Conference on Language Resources & Evaluation |
Title | Corpora of Slovene Spoken Language for Multi-lingual Applications |
Authors | Gros Jerneja (Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, 1001 Ljubljana, Slovenia, nejka@fe.uni-lj.si) Mihelič France (Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, 1001 Ljubljana, Slovenia, mihelicf@fe.uni-lj.si) Dobrišek Simon (Faculty of Electrical Engineering, University of Ljubljana, Laboratory of Artificial Perception, Tržaška 25, 1000 Ljubljana, Slovenia, simond@fe.uni-lj.si) Erjavec Tomaž (Dept. for Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia, tomaz.erjavec@ijs.si) Žganec Mario (Masterpoint R&D, Baznikova 40, 1000 Ljubljana, Slovenia, Mario@masterpoint.si) |
Keywords | Annotation Tools, Continuous Speech, Diphone Inventory, Speech Corpus, Spoken Commands |
Session | Session SP3 - Spoken Language Resources' Projects |
Full Paper | 288.ps, 288.pdf |
Abstract | The domain of spoken language technologies ranges from speech input and output systems to complex understanding and generation systems, including multimodal systems of widely differing complexity (such as automatic dictation machines) and multilingual systems (for example, automatic dialogue and translation systems). Defining standards and evaluation methodologies for such systems requires the specification and development of highly specific spoken language corpus and lexicon resources, as well as measurement and evaluation tools (EAGLES Handbook 1997). This paper presents the MobiLuz spoken language resources for Slovene, which will be made freely available for research purposes in speech technology and linguistics. |