Abstract
In this paper, we develop a real-time lip-synch system that drives a 2D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, we bound the processing time with a merge-and-split procedure that performs coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to limit the computational load while attaining the desired accuracy. Coarse-to-fine classification is accomplished via two stages of feature extraction: each speech frame is first classified into one of three lip-opening classes using mel-frequency cepstral coefficients (MFCC) as the feature, and then further classified into a detailed lip shape using formant information. We implemented the system with 2D lip animation, which demonstrates the effectiveness of the proposed two-stage procedure for the real-time lip-synch task.
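The two-stage control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class labels (`closed`, `half`, `open`, `a`, `o`, `e`, `u`, `m`), the toy feature values, and the nearest-centroid stand-in used in place of the SVMs are all assumptions made only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], str]


def nearest_centroid(centroids: Dict[str, Vector]) -> Classifier:
    """Stand-in classifier (NOT an SVM): returns the label of the closest centroid."""
    def classify(x: Vector) -> str:
        def sq_dist(label: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(centroids, key=sq_dist)
    return classify


@dataclass
class TwoStageLipClassifier:
    coarse: Classifier            # stage 1: MFCC features -> lip-opening class
    fine: Dict[str, Classifier]   # stage 2: one formant classifier per coarse class

    def classify(self, mfcc: Vector, formants: Vector) -> Tuple[str, str]:
        opening = self.coarse(mfcc)           # coarse decision among 3 classes
        shape = self.fine[opening](formants)  # refinement within that class only
        return opening, shape


# Hypothetical toy centroids; real MFCC vectors and formant values would differ.
clf = TwoStageLipClassifier(
    coarse=nearest_centroid({
        "closed": [0.0, 0.0],
        "half": [1.0, 1.0],
        "open": [2.0, 2.0],
    }),
    fine={
        "closed": nearest_centroid({"m": [250.0, 1000.0]}),
        "half": nearest_centroid({"e": [500.0, 1800.0], "u": [350.0, 900.0]}),
        "open": nearest_centroid({"a": [700.0, 1200.0], "o": [500.0, 900.0]}),
    },
)

print(clf.classify([1.9, 2.1], [680.0, 1150.0]))
```

Because the second stage only compares lip shapes within the lip-opening class chosen by the first stage, each frame is scored against a small subset of classes, which is one plausible reading of how the coarse-to-fine split keeps per-frame cost low.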
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 299-304 |
| Number of pages | 6 |
| ISBN (Print) | 0769518346, 9780769518343 |
| Publication status | Published - 2002 |
| Event | 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States; Duration: 2002 Oct 14 → 2002 Oct 16 |
Other
| Other | 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 |
|---|---|
| Country/Territory | United States |
| City | Pittsburgh |
| Period | 2002 Oct 14 → 2002 Oct 16 |
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Graphics and Computer-Aided Design
- Computer Vision and Pattern Recognition
- Hardware and Architecture