TY - GEN
T1 - Viseme recognition experiment using context dependent hidden Markov models
AU - Lee, Soonkyu
AU - Yook, Dongsuk
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2002.
PY - 2002
Y1 - 2002
AB - Visual images synchronized with audio signals can provide a user-friendly interface for man-machine interaction. Visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches to using HMMs. In the first approach, an HMM is trained for each triviseme, which is a viseme with its left and right context, and the audio signals are directly recognized as a sequence of trivisemes. In the second approach, each triphone is modeled with an HMM, and a general triphone recognizer is used to produce a triphone sequence from the audio signals. The triviseme or triphone sequence is then converted to a viseme sequence. The performance of the two viseme recognition systems is evaluated on the TIMIT speech corpus.
UR - http://www.scopus.com/inward/record.url?scp=84947935184&partnerID=8YFLogxK
U2 - 10.1007/3-540-45675-9_84
DO - 10.1007/3-540-45675-9_84
M3 - Conference contribution
AN - SCOPUS:84947935184
SN - 9783540440253
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 557
EP - 561
BT - Intelligent Data Engineering and Automated Learning - IDEAL 2002 - 3rd International Conference, Proceedings
A2 - Yin, Hujun
A2 - Allinson, Nigel
A2 - Freeman, Richard
A2 - Keane, John
A2 - Hubbard, Simon
PB - Springer Verlag
T2 - 3rd International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2002
Y2 - 12 August 2002 through 14 August 2002
ER -