In this paper, we develop a real-time lip-synch system that animates a 2D avatar's lip motion in sync with an incoming speech utterance. To achieve real-time operation, we bound the processing time by invoking a merge-and-split procedure that performs coarse-to-fine phoneme classification. At each classification stage, we apply a support vector machine (SVM) to constrain the computational load while attaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished in two stages of feature extraction: each speech frame is first acoustically analyzed into three classes of lip opening using MFCCs as features, and then further classified into a refined, detailed lip shape using formant information. We implemented the system with 2D lip animation, demonstrating that the proposed two-stage procedure effectively accomplishes the real-time lip-synch task.
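The two-stage pipeline described in the abstract (a coarse MFCC-based SVM for lip opening, followed by a per-class formant-based SVM for detailed lip shape) can be sketched as below. This is a minimal illustration, not the authors' implementation: the synthetic feature vectors, the class counts, and the `classify_frame` helper are all assumptions for demonstration, standing in for real MFCC and formant analysis of speech frames.

```python
# Hedged sketch of coarse-to-fine SVM phoneme classification.
# Synthetic data stands in for real MFCC / formant features (assumption).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stage 1: coarse 3-class lip-opening SVM on 13-dim MFCC-like features.
# Each class cluster is shifted so the toy problem is separable.
X_coarse = rng.normal(size=(300, 13)) + np.repeat(np.arange(3), 100)[:, None] * 3.0
y_coarse = np.repeat(np.arange(3), 100)  # 0, 1, 2: illustrative lip-opening classes
coarse_svm = SVC(kernel="rbf").fit(X_coarse, y_coarse)

# Stage 2: one refinement SVM per coarse class, on 2-dim formant-like
# features (e.g. F1, F2), producing a detailed lip-shape label.
fine_svms = {}
for c in range(3):
    X_fine = rng.normal(size=(200, 2)) + np.repeat(np.arange(2), 100)[:, None] * 4.0
    y_fine = np.repeat(np.arange(2), 100)
    fine_svms[c] = SVC(kernel="rbf").fit(X_fine, y_fine)

def classify_frame(mfcc_vec, formant_vec):
    """Coarse lip-opening class first, then the refined lip shape within it."""
    c = int(coarse_svm.predict(mfcc_vec[None, :])[0])
    f = int(fine_svms[c].predict(formant_vec[None, :])[0])
    return c, f

coarse_label, fine_label = classify_frame(X_coarse[0], np.zeros(2))
```

Because only the small per-class refinement SVM runs after the coarse decision, each frame incurs two cheap classifications rather than one large multi-class one, which is the load-limiting idea behind the paper's real-time constraint.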
Title of host publication: Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages
Published: 2002
Conference: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States
Duration: 2002 Oct 14 → 2002 Oct 16
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Graphics and Computer-Aided Design
- Computer Vision and Pattern Recognition
- Hardware and Architecture