Hybrid speech recognition for voice search

Evandro Gouvea
Seminar abstract:
In this talk I will compare different systems for retrieving items by voice. These systems differ only in the unit they use: words, subwords, a hybrid combining words and subwords, or phones. A subword is a word fragment, which can be as small as a single phone or as large as a whole word. The subword set is derived by splitting words using a Minimum Description Length (MDL) criterion. In general, the knowledge sources used by the speech recognition engine are tailored to the data indexed by the information retrieval component. By contrast, a speech recognition engine that uses a language model and pronunciation dictionary built from such an inventory of units, whether subword or hybrid, is completely independent of the information retrieval task and can therefore remain fixed, making this approach ideal for resource-constrained systems. I will describe a voice search system and present results on a music lyrics task showing that a hybrid system is far more robust to speech recognition errors than the alternatives.