Title |
Creating Slovenian Language Resources for Development of Speech-to-Speech Translation Components |
Author(s) |
Darinka Verdonik, Matej Rojc, Zdravko Kačič (University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, Maribor, Slovenia) |
Session |
P18-S |
Abstract |
This article describes the procedures used to build the Slovenian lexica within the LC-STAR project and gives detailed figures on the size of these lexica. The University of Maribor joined the LC-STAR project in order to provide appropriate language resources for developing speech-to-speech translation technology for the Slovenian language. The lexica consist of three parts: 65,000 common words, 45,000 proper names and 6,000 words from special application domains. All lexica will be morpho-syntactically tagged and phonetically transcribed. The quality of the resulting language resources is ensured by independent validation. |
Keyword(s) |
speech-to-speech translation, Slovenian, LC-STAR, POS, lexica, word list, proper names, common words |
Language(s) | Slovenian |
Full Paper |