BABEL WORKSHOP Granada May 1998

First International Conference on Language Resources and Evaluation (LREC)
Granada, May 28-30 1998

SPEECH DATABASE DEVELOPMENT FOR CENTRAL AND EASTERN EUROPEAN LANGUAGES
Organised by the BABEL Project, Copernicus No. 1304
Wednesday, May 27th, 14.30 - 19.00

See Instructions for authors.

This workshop, which is held in conjunction with the First International Conference on Language Resources and Evaluation in Granada, Spain, will be concerned with the design, production and transcription standards required for the construction of speech databases for languages of Central and Eastern Europe.

Speech databases have been produced for a number of the world's major languages, but most languages of Central and Eastern Europe have received little attention in international terms until recently, though they are of major importance for the future of European speech science. There are special issues which arise in the production of representative samples of these languages, and this workshop will attempt to address these issues. The BABEL project (funded by the European Union under the COPERNICUS programme, project #1304) has been working on these issues since 1995, and will soon complete a database of Bulgarian, Estonian, Hungarian, Polish and Romanian. The work of the project will be reported at the workshop, and aspects of the project will be the subject of practical demonstrations, but it is hoped that papers will be contributed by other interested researchers who are not associated with the project.

Information about BABEL can be read on its WWW pages
Information about the main conference can be read on its WWW pages

PROGRAMME

Session	Time	Author(s)	Title
Welcome	14:30	Peter Roach	Introduction
Paper 1	14:40	Arvo Eek, Einar Meister	Estonian speech in the BABEL multilanguage database: phonetic-phonological problems revealed in the text corpus
Paper 2	15:00	Slawomir Kula	Telephone bandwidth speech database: creation, applications and experiences for the Polish language
Paper 3	15:20	Henk van den Heuvel, Valery Galounov, Herbert S. Tropf	The SPEECHDAT(E) project: Creating speech databases for Eastern European languages
Open Forum 1	15:40		The nature of our data
Paper 4	16:00	Klara Vicsi, A. Vig, G. Gordos	Experience on the development of a language independent automatic segmentation and labeling system on the frame of the BABEL project
Paper 5	16:20	Simon Dobrisek, Jerneja Gros, France Mihelic, Nikola Pavesic	GOPOLIS: A Multi Speaker Slovenian Speech Database
Coffee Break	16:40
Paper 6	17:00	Toomas Altosaar, Matti Karjalainen, Martti Vainio, Einar Meister	Finnish and Estonian Speech Applications developed on an Object-Oriented Speech Processing and Database System
Open Forum 2	17:20		Labelling and annotation
Paper 7	17:45	Marian Boldea, Cosmin Munteanu, Alin Doroga	Design, Collection, and Annotation of a Romanian Speech Database
Paper 8	18:05	Tamas Varadi	On the Spoken Corpus of the Budapest Sociolinguistic Interview
Paper 9	18:25	Zdravko Kacic, Janez Kaiser	Development of Slovenian SpeechDat database
Open Forum 3	18:45		The Future
CLOSE	19:30

INSTRUCTIONS FOR AUTHORS

Details of the required format are available from the LREC web site.
The deadline for submission of the completed paper is now April 14th.
Submission should be via email or on floppy disk to the contact address below.
Papers should be submitted as a Microsoft Word for Windows file (or other formats by arrangement with S.C.Arnfield@rdg.ac.uk).

ORGANISING COMMITTEE

Peter Roach, University of Reading, UK (BABEL Project Coordinator)
Klara Vicsi, Technical University, Budapest
Lori Lamel, LIMSI, Paris

CONTACT PERSON

Peter Roach, Department of Linguistic Science, University of Reading,
Reading RG6 6AA, UK.
Tel: (+44) 118 931 8138  Fax: (+44) 118 975 3365
email: p.j.roach@reading.ac.uk

WORKSHOP TOPICS

We hope that the workshop will consider the following topics, although this list is not exhaustive.

Recording techniques and standards
Available software tools
Annotation, transcription and labelling
Automated time-alignment of labels
Phonetic problems of specific languages of Central and Eastern Europe
Quality control
Requirements for larger-scale databases
Dissemination of data
Recording further languages
Possibilities for future collaboration

THE WORKSHOP WILL CONCLUDE WITH A DISCUSSION OF THE POSSIBILITY OF FORMING AN INFORMAL ASSOCIATION OF RESEARCHERS SPECIALISING IN THE SPOKEN FORMS OF CENTRAL AND EASTERN EUROPEAN LANGUAGES.

Project Co-ordinator: Professor Peter Roach (P.J.Roach@reading.ac.uk)

Dr. Simon Arnfield (S.C.Arnfield@reading.ac.uk)
