TY - GEN
T1 - Design of audio-visual interface for aiding driver's voice commands in automotive environment
AU - Kim, Kihyeon
AU - Jeon, Changwon
AU - Park, Junho
AU - Jeong, Seokyeong
AU - Han, David K.
AU - Ko, Hanseok
N1 - Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
N2 - This chapter describes the information modeling and integration of an embedded audio-visual speech recognition system aimed at improving speech recognition in the adverse noisy environment of an automobile. In particular, we employ lip-reading as an added feature for enhanced speech recognition. Lip motion features are extracted by active shape models, and the corresponding hidden Markov models are constructed for lip-reading. To realize efficient hidden Markov models, a tied-mixture technique is introduced for both the visual and acoustic information; it keeps the model structure simple and small while maintaining suitable recognition performance. In the decoding process, the audio-visual information is integrated into the state output probabilities of the hidden Markov model as multistream features. Each stream is weighted according to the signal-to-noise ratio so that the visual information becomes more dominant under the adverse noisy conditions of an automobile. Representative experimental results demonstrate that the audio-visual speech recognition system achieves promising performance in adverse noisy conditions, making it suitable for embedded devices.
AB - This chapter describes the information modeling and integration of an embedded audio-visual speech recognition system aimed at improving speech recognition in the adverse noisy environment of an automobile. In particular, we employ lip-reading as an added feature for enhanced speech recognition. Lip motion features are extracted by active shape models, and the corresponding hidden Markov models are constructed for lip-reading. To realize efficient hidden Markov models, a tied-mixture technique is introduced for both the visual and acoustic information; it keeps the model structure simple and small while maintaining suitable recognition performance. In the decoding process, the audio-visual information is integrated into the state output probabilities of the hidden Markov model as multistream features. Each stream is weighted according to the signal-to-noise ratio so that the visual information becomes more dominant under the adverse noisy conditions of an automobile. Representative experimental results demonstrate that the audio-visual speech recognition system achieves promising performance in adverse noisy conditions, making it suitable for embedded devices.
KW - Active shape model
KW - Audio-visual speech interface
KW - Automatic speech recognition
KW - Hybrid integration
KW - Lip-reading
KW - Mel-frequency cepstrum coefficients
KW - Mouth model
KW - Multistream features
KW - SNR-dependent audio-visual information combination
KW - Tied-mixture hidden Markov model
UR - http://www.scopus.com/inward/record.url?scp=84892306514&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84892306514&partnerID=8YFLogxK
U2 - 10.1007/978-0-387-79582-9_17
DO - 10.1007/978-0-387-79582-9_17
M3 - Conference contribution
AN - SCOPUS:84892306514
SN - 9780387795812
T3 - In-Vehicle Corpus and Signal Processing for Driver Behavior
SP - 211
EP - 219
BT - In-Vehicle Corpus and Signal Processing for Driver Behavior
PB - Springer Science and Business Media, LLC
T2 - 3rd Biennial Workshop on Digital Signal Processing for Mobile and Vehicular Systems, DSP 2007
Y2 - 1 June 2007 through 1 June 2007
ER -