Abstract
This work describes a real-time lip-sync method in which an avatar's lip shape is synchronized with the corresponding speech signal. Phoneme recognition is generally regarded as an important task in a real-time lip-sync system. In this work, the Head-Body-Tail (HBT) model is proposed for more efficient recognition of phonemes that are uttered in varying ways because of co-articulation effects. The HBT model effectively deals with the transition parts of context-dependent models for small-vocabulary tasks. These models provide better recognition performance than general context-dependent or context-independent models for digit or vowel recognition. Moreover, each phoneme is assigned to one of four classes, and a class-dependent codebook is generated to further improve performance. Additionally, to represent the context-dependency information in the transient parts clearly, some Gaussians are excluded from the class-dependent codebook. The proposed method leads to a lip-sync system that performs at a level similar to previous designs based on HBT and continuous hidden Markov models (CHMMs). However, our method reduces the number of model parameters by one-third and enables real-time operation.
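The head-body-tail idea can be illustrated with a minimal sketch. It assumes the usual HBT layout from the small-vocabulary recognition literature, in which each unit has one context-independent body plus heads and tails conditioned on the neighboring unit. The vowel inventory, the `sil` boundary symbol, and the `hbt_units` helper are hypothetical and are not taken from the paper.

```python
from itertools import product


def hbt_units(vocab, boundary="sil"):
    """Enumerate head, body, and tail unit names for a small vocabulary.

    Assumed layout: the body is context-independent, the head depends on
    the preceding unit, and the tail depends on the following unit.
    """
    # Neighboring contexts: every vocabulary unit plus a boundary symbol.
    contexts = list(vocab) + [boundary]
    bodies = [f"{u}_body" for u in vocab]
    heads = [f"{left}-{u}_head" for u, left in product(vocab, contexts)]
    tails = [f"{u}_tail+{right}" for u, right in product(vocab, contexts)]
    return heads, bodies, tails


if __name__ == "__main__":
    vowels = ["a", "e", "i", "o", "u"]  # hypothetical inventory
    heads, bodies, tails = hbt_units(vowels)
    print(len(heads), len(bodies), len(tails))  # prints "30 5 30"
```

With N units, this layout has N bodies and N × (N + 1) heads and tails each, which is why the approach is practical only for small vocabularies such as digits or vowels.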
Original language | English |
---|---|
Article number | 4668497 |
Pages (from-to) | 1299-1306 |
Number of pages | 8 |
Journal | IEEE Transactions on Multimedia |
Volume | 10 |
Issue number | 7 |
Publication status | Published - 2008 Nov |
Bibliographical note
Funding Information: Manuscript received June 25, 2007; revised July 07, 2008. Current version published November 17, 2008. This work was supported by Grant R01-2006-000-11162-0(2008) from the Basic Research Program of the Korea Science and Engineering Foundation of the Ministry of Science and Technology. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Hanseok Ko.
Keywords
- Head-body-tail HMM
- Phoneme recognition
- Real-time lip-sync
ASJC Scopus subject areas
- Signal Processing
- Media Technology
- Computer Science Applications
- Electrical and Electronic Engineering