Sahar Ghannay has been a postdoctoral researcher at the LIUM laboratory (under Yannick Estève) since October 2017, working on end-to-end neural approaches for speech understanding, translation and named entity detection.
She received a PhD in Computer Science, entitled « A study of continuous word representations applied to ASR error detection », from Le Mans University in September 2017.
Sahar did an internship for several months at Apple within the Siri Speech team under the direction of Xiaochuan Niu and Ilya Oparin.
Her main research interests are continuous word representations and their applications to natural language and spoken language processing.

PhD Thesis

Title: A study of continuous word representations applied to the automatic detection of speech recognition errors
Period: 3 years, from October 2014 to September 2017
  • Thesis director: Yannick Estève (Professor and Head of LIUM)
  • Supervisor: Nathalie Camelin (Associate Professor)

  • Abstract: My thesis concerns a study of continuous word representations applied to the automatic detection of speech recognition errors. Recent advances in the field of speech processing have led to significant improvements in speech recognition performance. However, recognition errors are still unavoidable. This reflects the sensitivity of ASR systems to variability, e.g. in acoustic conditions, speaker, language style, etc. Our study focuses on the use of a neural approach to improve ASR error detection, using word embeddings. These representations have proven to be a great asset in various natural language processing (NLP) tasks. The exploitation of continuous word representations is motivated by the fact that ASR error detection consists in locating possible linguistic or acoustic incongruities in automatic transcriptions. The aim is therefore to find the appropriate word representation, one that captures the pertinent information needed to detect these anomalies. Our contribution in this thesis spans several directions. First, we start with a preliminary study in which we propose a neural architecture able to integrate different types of features, including word embeddings. Second, we propose an in-depth study of continuous word representations. On the one hand, this study focuses on the evaluation of different types of linguistic word embeddings and their combination in order to take advantage of their complementarities. On the other hand, it focuses on acoustic embeddings. The proposed approach relies on the use of a convolutional neural network to build acoustic signal embeddings, and a deep neural network to build acoustic word embeddings. In addition, we propose two approaches to evaluate the performance of acoustic word embeddings. We also propose to enrich the word representation at the input of the ASR error detection system with prosodic features, in addition to linguistic and acoustic embeddings. Integrating this information into our neural architecture provides a significant improvement in terms of classification error rate reduction in comparison to a conditional random field (CRF) based state-of-the-art approach. Then, we present a study on the analysis of classification errors, with the aim of identifying the errors that are difficult to detect. Perspectives for improving the performance of our system are also proposed, by modelling the errors at the sentence level. Finally, we exploit the linguistic and acoustic embeddings as well as the information provided by our ASR error detection system in several downstream applications.

    → 12 publications: most cited publications
    → Submission in progress: at the Computer Speech and Language journal

    Master's thesis

    Title: Combination of machine translation systems outputs
    Period: from 02/2013 to 07/2013 (6 months)
    Supervisor: Loïc Barrault (Associate Professor)

    Abstract: Machine translation (MT) system combination has gained great importance in the past few years. This is mainly because single systems achieve good performance, and the possibility of making the most of their complementarity in a system combination framework is very attractive. This work focused on system combination, especially on the integration of new knowledge into MANY's decoder. MANY is an open-source system combination software based on confusion network decoding, developed at LIUM by Loïc Barrault. This system has been updated in order to estimate the word confidence score and to boost n-grams present in the input hypotheses, either by using an adapted language model or by adding an additional feature to the decoder.


    Research Projects

    I am working or have worked on the following projects:
    • VERA, ANR project on advanced error analysis of ASR systems, 2013-2016
    • M2CR, European project on Multilingual Multimodal Continuous Representation for Human Language Understanding, ERA-Net (CHIST-ERA), 2016-2019
    • News.bridge


    During the last period of my thesis, I did an internship for several months at Apple within the Siri Speech team, in Cupertino, CA, USA. I worked on neural language modeling for speech recognition, under the direction of Xiaochuan Niu and Ilya Oparin.

Laboratoire d'Informatique de l'Université du Maine (LIUM)
Institut d'Informatique Claude Chappe
Université du Maine, Avenue Laënnec
Tel: +33/02 43 83 38 52
Fax: +33/02 43 83 38 68