Recognizing articulatory gestures from speech for robust speech recognition

Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)


Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures into a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable time functions improved gesture recognition performance. In the second stage, the gesture-recognition models were applied to natural speech waveforms, and word recognition experiments revealed that the recognized gestures can improve the noise robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained, and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.
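The pipeline described in the abstract first estimates tract-variable time functions (continuous constriction trajectories, e.g., lip aperture or tongue-body constriction degree) from context-windowed acoustic features, and then recognizes gestures from those trajectories. A minimal sketch of the first step is shown below; the layer sizes, the use of a single tanh hidden layer, and the randomly initialized weights are all illustrative assumptions, not the authors' actual trained architecture:

```python
import numpy as np

# Hypothetical dimensions: 13 cepstral coefficients per frame with
# a +/-4 frame context window, mapped to 8 tract variables. These
# sizes are assumptions for illustration, not the paper's settings.
N_CONTEXT = 9
N_CEPSTRA = 13
N_INPUT = N_CONTEXT * N_CEPSTRA   # 117 acoustic inputs per frame
N_HIDDEN = 100
N_TRACT_VARS = 8

rng = np.random.default_rng(0)

# Randomly initialized weights stand in for a trained model.
W1 = rng.standard_normal((N_INPUT, N_HIDDEN)) * 0.01
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_HIDDEN, N_TRACT_VARS)) * 0.01
b2 = np.zeros(N_TRACT_VARS)

def estimate_tract_variables(frames):
    """Map a (T, N_INPUT) matrix of context-windowed acoustic
    frames to a (T, N_TRACT_VARS) matrix of tract-variable
    estimates: one value per tract variable per frame."""
    hidden = np.tanh(frames @ W1 + b1)   # nonlinear hidden layer
    return hidden @ W2 + b2              # linear output layer

# 50 frames of synthetic acoustic input stand in for real speech.
frames = rng.standard_normal((50, N_INPUT))
tv_trajectories = estimate_tract_variables(frames)
print(tv_trajectories.shape)  # (50, 8)
```

The resulting per-frame trajectories would then feed the downstream gesture recognizer; in a real system the weights would be trained on parallel acoustic and articulatory data, which is what motivated the paper's use of synthetic speech in the first evaluation stage.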

Original language: English
Pages (from-to): 2270-2287
Number of pages: 18
Journal: Journal of the Acoustical Society of America
Issue number: 3
Publication status: Published - 2012 Mar
Externally published: Yes

Bibliographical note

Funding Information:
This work was supported by NSF Grants No. IIS-0703859, No. IIS-0703048, and No. IIS-0703782. The first two authors contributed equally to this study.

ASJC Scopus subject areas

  • Arts and Humanities (miscellaneous)
  • Acoustics and Ultrasonics

