Title

The Greedy Algorithm and its Application to the Construction of a Continuous Speech Database

Authors

Hélène François (IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires
Université de Rennes 1, Enssat, Lannion, France)

Olivier Boëffard (IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires
Université de Rennes 1, Enssat, Lannion, France)

Session

SO7: Tools For Spoken LRs

Abstract

Databases containing varied linguistic features can be build by condensing large corpora; in this work we need to cover a set of phonetic units with a minimal set of natural phonetic sentences. With this aim in view we compare three set covering methods: the greedy method, its inverse which we call the spitting method, and the pair exchange method. Each method is defined with several criteria guiding the selection of sentences; they relate to the number of units of the sentences, to their length, and to the rareness of their units. A first experiment shows that pair exchange method doesn't guarantee a total covering. Greedy and spitting methods performances are comparable; nevertheless greedy is a bit better and above all less time-consuming. Applying spitting method to a greedy cover increases performance by removing about 10% redundancy. So does pair exchange method, but it is more time-consuming. Most of the criteria guiding selections are sensitive to the sentences length. Criteria performances obtained for a total covering are not necessarily transposable to a partial covering.

Keywords

Greedy method, Set covering problem, Construction of speech database, Condensation

Full Paper

265.pdf