
ELRA



First International Conference on Language Resources and Evaluation (LREC)
Granada, May 28-30 1998


SPEECH DATABASE DEVELOPMENT FOR CENTRAL AND EASTERN EUROPEAN LANGUAGES
Organised by the BABEL Project, Copernicus No. 1304

Wednesday, May 27th, 14.30 - 19.00

See Instructions for authors.

This workshop, which is held in conjunction with the First International Conference on Language Resources and Evaluation in Granada, Spain, will be concerned with the design, production and transcription standards required for the construction of speech databases for languages of Central and Eastern Europe.

Speech databases have been produced for a number of the world's major languages, but until recently most languages of Central and Eastern Europe have received little international attention, although they are of major importance for the future of European speech science. Producing representative samples of these languages raises special issues, and this workshop will attempt to address them. The BABEL project (funded by the European Union under the COPERNICUS programme, project #1304) has been working on these issues since 1995, and will soon complete a database of Bulgarian, Estonian, Hungarian, Polish and Romanian. The work of the project will be reported at the workshop, and aspects of the project will be the subject of practical demonstrations. It is also hoped that papers will be contributed by other interested researchers who are not associated with the project.

Information about BABEL can be read on its WWW pages.
Information about the main conference can be read on its WWW pages.

PROGRAMME

Programme | Time | Author(s) | Title
Welcome | 14:30 | Peter Roach | Introduction
Paper 1 | 14:40 | Arvo Eek, Einar Meister | Estonian speech in the BABEL multilanguage database: phonetic-phonological problems revealed in the text corpus
Paper 2 | 15:00 | Slawomir Kula | Telephone bandwidth speech database: creation, applications and experiences for Polish language
Paper 3 | 15:20 | Henk van den Heuvel, Valery Galounov, Herbert S. Tropf | The SPEECHDAT(E) project: Creating speech databases for eastern European languages
Open Forum 1 | 15:40 | | The nature of our data
Paper 4 | 16:00 | Klara Vicsi, A. Vig, G. Gordos | Experience on the development of a language independent automatic segmentation and labeling system on the frame of the BABEL project
Paper 5 | 16:20 | Simon Dobrisek, Jerneja Gros, France Mihelic, Nikola Pavesic | GOPOLIS: A Multi Speaker Slovenian Speech Database
Coffee Break | 16:40 | |
Paper 6 | 17:00 | Toomas Altosaar, Matti Karjalainen, Martti Vainio, Einar Meister | Finnish and Estonian Speech Applications developed on an Object-Oriented Speech Processing and Database System
Open Forum 2 | 17:20 | | Labelling and annotation
Paper 7 | 17:45 | Marian Boldea, Cosmin Munteanu, Alin Doroga | Design, Collection, and Annotation of a Romanian Speech Database
Paper 8 | 18:05 | Tamas Varadi | On the Spoken Corpus of the Budapest Sociolinguistic Interview
Paper 9 | 18:25 | Zdravko Kacic, Janez Kaiser | Development of Slovenian SpeechDat database
Open Forum 3 | 18:45 | | The Future
CLOSE | 19:30 | |

INSTRUCTIONS FOR AUTHORS

  1. Details of the required format are available from the LREC web site.
  2. The deadline for submission of the completed paper is now April 14th.
  3. Submission should be via email or on floppy disk to the contact address below.
  4. Papers should be submitted as a Microsoft Word for Windows file (or other formats by arrangement with S.C.Arnfield@rdg.ac.uk).

ORGANISING COMMITTEE

CONTACT PERSON

Peter Roach, Department of Linguistic Science, University of Reading,
Reading RG6 6AA, UK.
Tel: (+44) 118 931 8138    Fax: (+44) 118 975 3365
email: p.j.roach@reading.ac.uk

WORKSHOP TOPICS

We hope that the following topics can be considered in the workshop; the list is not exhaustive, however.

  1. Recording techniques and standards
  2. Available software tools
  3. Annotation, transcription and labelling
  4. Automated time-alignment of labels
  5. Phonetic problems of specific languages of Central and Eastern Europe
  6. Quality control
  7. Requirements for larger-scale databases
  8. Dissemination of data; recording further languages; possibilities for future collaboration.

THE WORKSHOP WILL CONCLUDE WITH A DISCUSSION OF THE POSSIBILITY OF FORMING AN INFORMAL ASSOCIATION OF RESEARCHERS SPECIALISING IN THE SPOKEN FORMS OF CENTRAL AND EASTERN EUROPEAN LANGUAGES.


Project Co-ordinator: Professor Peter Roach (P.J.Roach@reading.ac.uk)

Dr. Simon Arnfield (S.C.Arnfield@reading.ac.uk)