This workshop will be held in conjunction with the International Conference on Language Resources and Evaluation (LREC), Granada, Spain: May 28-30, 1998. The workshop will provide a forum for researchers working on the development of speech and language resources for the indigenous minority languages of Europe.
The minority or "lesser used" languages of Europe (e.g. Basque, Welsh, Breton) are under increasing pressure from the major languages. Some of them (e.g. Gaelic) are becoming endangered, but others (e.g. Catalan) are in a stronger position, with a certain amount of official recognition and funding. However, the situation with regard to language resources is fragmented and disorganised. Some minority languages have been adequately researched linguistically, but most have not, and the vast majority do not yet possess basic speech and language resources (such as text and speech corpora) which are sufficient to permit commercial development of products.
If this situation were to continue, the minority languages of Europe would fall a long way behind the major languages as regards the availability of commercial speech and language products. This in turn would accelerate the decline of those languages that are already struggling to survive, as speakers would be forced to use the majority language for interaction with these products. To break this vicious circle, it is important to encourage the development of basic language resources.
The workshop is a very small first step towards encouraging the development of such resources. The aim is to share information, so that isolated researchers will not need to start from nothing. An important aspect will be the forming of personal contacts, which at present do not exist. This should make it easier for researchers with little funding and no existing corpora to begin developing a usable speech or text database. There will be a balance between presentations of existing language resources and more general presentations designed to give background information.
Technical areas covered will include:
Papers are invited that describe existing speech and language resources for minority languages (speech databases, text databases, and lexicons), as well as papers based on the analysis of these resources. Presentations will last 20 minutes each. All presentations will be given in English, since it cannot be assumed that each listener will speak all the minority languages discussed.
Briony Williams | University of Edinburgh, Scotland, UK |
Climent Nadeu | Universitat Politecnica de Catalunya, Catalunya, Spain |
Alex Monaghan | Dublin City University, Ireland |
Papers should not exceed 4000 words or 10 pages. They can be submitted in one of two ways: hard copy or electronic submission. They should be in A4 size and in English.
#NAME | Name of first author |
#TITLE | Title of the paper |
#PAGES | Number of pages |
#NOTE | Any relevant instructions about the format etc. |
#ABSTR | Abstract of the paper |
Email address of the first author | |
#ADDR | Postal address of the first author |
#TEL | Telephone number of the first author |
#FAX | Fax number of the first author |
Paper submission deadline | February 27 |
Paper notification | March 27 |
Camera-ready papers due | April 15 |
Workshop | May 27 |
General information about the main conference is at the web page http://www.icp.inpg.fr/ELRA/conflre.html
PROGRAMME

"Language Resources for European Minority Languages"
----------------------------------------------------

Wednesday May 27 1998 (morning), Granada, Spain

In association with the First International Conference on Language Resources and Evaluation, May 28-30 1998, Granada, Spain

8:00 Registration
8:30 Welcome and Introduction
8:40 "Overview of minority languages in Europe". Marc Alemany (Catalan Sociolinguistic Institute).
9:00 "VOCATEL and VOGATEL: Two Telephone Speech Databases of Spanish Minority Languages (Catalan and Galician)". Luis Villarrubia, Paloma Leon, Luis Hernandez (Speech Technology Group, Telefonica I&D, Madrid, Spain); Climent Nadeu, Ignasi Esquerra, Javier Hernando (Dept. TSC, Universitat Politècnica de Catalunya, Barcelona, Spain); Carmen Garcia-Mateo, Laura Docio (ETSIT de Telecomunicación, Universidad de Vigo, Vigo, Spain).
9:20 "Written Linguistic Resources in Catalan: the DCC Project". Joan Soler Bou (Institut d'Estudis Catalans, Barcelona, Spain).
9:40 "The MELIN project". Donncha Ó Cróinín (Institiúid Teangeolaíochta Éireann/Linguistics Institute of Ireland, Dublin, Ireland).
10:00 COFFEE
10:30 "A framework for the automatic processing of Basque". I. Aldezabal, O. Ansa, J.M. Arriola, A. Díaz de Ilarraza, N. Ezeiza, A. Maritxalar, M. Oronoz, K. Sarasola (Euskal Herriko Unibertsitatea, Spain); I. Aduriz, M. Urkia (UZEI, Donostia, Spain).
10:50 "Towards the creation of new Galician language resources: From a printed dictionary to the Galician WordNet". Fernando Magan (Ramón Piñeiro Research Center for Humanities, Santiago de Compostela, Spain).
11:10 Poster Session 1 (odd-numbered authors at posters)
11:50 Poster Session 2 (even-numbered authors at posters)
12:30 Plenary
13:30 End

==========================================================================

Poster papers
-------------

1 "A tagger environment for Galician". M. Vilares, J. Graña (Universidad de Corunna, Spain); T. Araujo, D. Cabrero, I. Diz (Ramón Piñeiro Research Center for Humanities, Santiago de Compostela, Spain).
2 "A bilingual Spanish-Catalan database of units for concatenative synthesis". I. Esquerra, A. Bonafonte, F. Vallverdú, A. Febrer (Universitat Politècnica de Catalunya, Barcelona, Spain).
3 "Methods and tools for building the Catalan WordNet". L. Benítez, S. Cervell, G. Escudero, M. López, G. Rigau, M. Taulé (Universitat Politècnica de Catalunya, Barcelona, Spain; Universitat de Barcelona).
4 "Lemmatisation of the corpus of Cornish". J. Mills (University of Luton, England, UK).
5 "SpeechDat Cymru: A large-scale telephony Welsh database". R.J. Jones, J.S. Mason (Univ. of Wales, Swansea, Wales, UK); L. Helliker, M. Pawlewski (BT Labs, Ipswich, England, UK).
6 "KGB Project: Tools and resources for Breton language learning". J. Siroux, H. Gourmelon, G. Mercier, J-P. Messager (ENSSAT, Lannion, France).
7 "A speech database in Basque language". K. López de Ipiña, I. Torres, L. Oñederra (Euskal Herriko Unibertsitatea, Spain).
8 "An overview of the existing language resources for 'Gallego'". C. García-Mateo (Universidade de Vigo, Spain); M. González-González (Universidade de Santiago, Spain).
9 "Language standardisation and linguistic resources: The case of Central Ladin (Dolomites)". F. Ciochetti (Istitut Ladin, Vigo di Fassa, Italy); F. Pianesi (IRST, Trento, Italy).
10 "The LE-PAROLE project and the National Corpus of Irish". D. Ó Cróinín, E. Uí Dhonnchadha (Institiúid Teangeolaíochta Éireann/Linguistics Institute of Ireland, Dublin, Ireland).
11 "Design of a phonetic corpus for speech recognition in Catalan". I. Esquerra, C. Nadeu (Universitat Politécnica de Catalunya, Barcelona, Spain); L. Villarrubia (Telefónica Investigación y Desarrollo, Madrid, Spain).
12 "Levels of annotation for a Welsh speech database for phonetic research". B. Williams (University of Edinburgh, Scotland, UK).

------------------------------------------------------------------------------
Specific queries about the conference should be directed to:
LREC Secretariat
Facultad de Traduccion e Interpretacion
Dpto. de Traduccion e Interpretacion
C/ Puentezuelas, 55
18002 Granada, SPAIN
Tel: +34 58 24 41 00 - Fax: +34 58 24 41 04
e-mail: reli98@goliat.ugr.es