Finnish and Estonian Speech Applications developed on an Object-Oriented Speech Processing and Database System

Toomas Altosaar, Matti Karjalainen, Martti Vainio, Einar Meister

ABSTRACT
Full utilisation of the information available in speech databases has not always been feasible because of the differing standards and formats they employ. The additional diversity introduced by multilingual material has made it even harder to analyse such databases within a single computing environment.

In this paper we briefly present the QuickSig object-oriented signal processing system [1], a modern tool for DSP-related studies. It lets speech scientists work in a flexible and motivating environment where signals, filters, spectrograms, etc., are all modelled as objects. Seamlessly integrated into QuickSig is an object-oriented database [2] that allows signals, along with their features and relations, to be stored persistently between sessions in a manner that is transparent to the user. A multilingual phonetic representation system [3] exists within the same environment and allows speech from different databases (e.g., in different languages and using different phonetic alphabets) to be modelled generically. Relations between speech units such as sentences, words, and phones are defined explicitly, forming a phonetic object structure for each utterance. The user can easily formulate complex pattern-matching searches that traverse these phonetic structures and return the desired contexts. The speech events found in this way can then be used in actual applications.
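The explicit relations between speech units, and the context searches run over them, can be pictured as a small object graph. The following is a minimal Python sketch for illustration only, not QuickSig's actual interface or implementation language: the class `Unit`, the helpers `add_children`, `descendants`, `search`, and `word`, and the choice to link left/right neighbours only among children of the same parent are all hypothetical. Each phone is linked to its parent word and to its neighbours, and a search is simply a predicate evaluated over one level of the structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass(eq=False)  # identity equality: units are graph nodes, not values
class Unit:
    """Hypothetical node in a phonetic object structure (sentence, word, phone)."""
    level: str
    label: str
    parent: Optional["Unit"] = field(default=None, repr=False)
    children: List["Unit"] = field(default_factory=list, repr=False)
    prev: Optional["Unit"] = field(default=None, repr=False)  # left neighbour
    next: Optional["Unit"] = field(default=None, repr=False)  # right neighbour

    def add_children(self, units: List["Unit"]) -> "Unit":
        """Attach child units and link consecutive ones as neighbours.

        Assumption: neighbours are linked only within the same parent,
        so phones are not linked across word boundaries.
        """
        for left, right in zip(units, units[1:]):
            left.next, right.prev = right, left
        for u in units:
            u.parent = self
        self.children.extend(units)
        return self

    def descendants(self, level: str) -> Iterator["Unit"]:
        """Depth-first traversal yielding all descendants on the given level."""
        for child in self.children:
            if child.level == level:
                yield child
            yield from child.descendants(level)


def search(root: Unit, level: str, pred: Callable[[Unit], bool]) -> List[Unit]:
    """Return every unit on `level` below `root` for which `pred` holds."""
    return [u for u in root.descendants(level) if pred(u)]


def word(label: str) -> Unit:
    """Build a word whose phones are its letters (a simplification for Finnish)."""
    return Unit("word", label).add_children([Unit("phone", ch) for ch in label])


sentence = Unit("sentence", "kala tuli").add_children([word("kala"), word("tuli")])

# Context query: phone /a/ whose left neighbour is /l/.
hits = search(sentence, "phone",
              lambda p: p.label == "a" and p.prev is not None and p.prev.label == "l")
print([(h.parent.label, h.label) for h in hits])  # [('kala', 'a')]

# Context query: word-initial phones.
initials = search(sentence, "phone", lambda p: p.prev is None)
print([h.label for h in initials])  # ['k', 't']
```

Because a query here is just an ordinary function over objects, richer contexts would combine the `parent`, `prev`, and `next` links in the same predicate.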

The remainder of the paper presents some of the applications developed on this platform using Finnish and Estonian databases as the source speech material. These include speech synthesis [4,5], speech recognition [6], and speaker verification/identification [7].

References:

  1. Karjalainen M., Altosaar T., Alku P., QuickSig - An Object-Oriented Signal Processing Environment. Proc. of IEEE ICASSP-88, New York 1988.
  2. Karjalainen M., Altosaar T., An Object-Oriented Database for Speech Processing. Proc. of EUROSPEECH-93, Berlin 1993.
  3. Altosaar T., Karjalainen M., Vainio M., A Multilingual Phonetic Representation and Analysis System for Different Speech Databases. Proc. of ICSLP-96, Philadelphia 1996.
  4. Vainio M., Aulanko R., Altosaar T., Karjalainen M., Modeling Finnish Microprosody for Speech Synthesis. ESCA Workshop on Intonation Theory and Applications, Athens 1997, pp. 309-312.
  5. Karjalainen M., Altosaar T., Vainio M., Speech Synthesis using Warped Linear Prediction and Neural Networks. Proc. of IEEE ICASSP-98, Seattle 1998.
  6. Altosaar T., Karjalainen M., Diphone-Based Speech Recognition using Time-Event Neural Networks. Proc. of ICSLP-92, Banff, Canada 1992.
  7. Altosaar T., Meister E., Speaker Recognition Experiments in Estonian using Multi-Layer Feedforward Neural Nets. Proc. of EUROSPEECH-95, Madrid 1995.
