TY - GEN
T1 - Gesture-based dynamic Bayesian network for noise robust speech recognition
AU - Mitra, Vikramjit
AU - Nam, Hosung
AU - Espy-Wilson, Carol Y.
AU - Saltzman, Elliot
AU - Goldstein, Louis
PY - 2011
Y1 - 2011
AB - Previously we have proposed different models for estimating articulatory gestures and vocal tract variable (TV) trajectories from synthetic speech. We have shown that when deployed on natural speech, such models can help to improve the noise robustness of a hidden Markov model (HMM) based speech recognition system. In this paper we propose a model for estimating TVs trained on natural speech and present a Dynamic Bayesian Network (DBN) based speech recognition architecture that treats vocal tract constriction gestures as hidden variables, eliminating the necessity for explicit gesture recognition. Using the proposed architecture we performed a word recognition task for the noisy data of Aurora-2. Significant improvement was observed in using the gestural information as hidden variables in a DBN architecture over using only the mel-frequency cepstral coefficient based HMM or DBN backend. We also compare our results with other noise-robust front ends.
KW - Articulatory Phonology
KW - Articulatory Speech Recognition
KW - Dynamic Bayesian Network
KW - Noise-robust Speech Recognition
KW - Task Dynamic model
KW - Vocal-Tract variables
UR - http://www.scopus.com/inward/record.url?scp=80051649631&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947522
DO - 10.1109/ICASSP.2011.5947522
M3 - Conference contribution
AN - SCOPUS:80051649631
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5172
EP - 5175
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -