SVM-based phoneme classification and lip shape refinement in real-time lip-synch system

Hanseok Ko, David K. Han

Research output: Contribution to journal, Article, peer-reviewed

2 Citations (Scopus)


In this paper, we present a real-time lip-synch system that drives a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, processing time was minimized by "merge and split" procedures that result in coarse-to-fine phoneme classification. At each stage of phoneme classification, the support vector machine (SVM) method was applied to reduce the computational load while maintaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: in the first stage, each speech frame is acoustically analyzed for three classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as features; in the second stage, each frame is further refined for detailed lip shape using formant information. The method was implemented in 2-D lip animation, and the system was shown to be effective in accomplishing real-time lip-synch. The approach was tested on a PC with an Intel Pentium IV 1.4 GHz CPU and 384 MB of RAM, using Microsoft Visual Studio. Phoneme merging combined with SVM achieved about twice the recognition speed of a method employing the Hidden Markov Model (HMM): the typical latency per frame observed with the proposed method was on the order of 18.22 milliseconds, while an HMM method under identical conditions took about 30.67 milliseconds.
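
The two-stage, coarse-to-fine flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class names (`closed`, `half`, `open`, and the rounded/unrounded shapes), the toy two-dimensional feature vectors, and all weights are hypothetical, and a plain linear decision function stands in for a trained SVM. Real MFCC and formant extraction is assumed to happen upstream.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
LinearModel = Tuple[Vector, float]  # (weights, bias); hypothetical stand-in for a trained linear SVM


def linear_score(w: Vector, b: float, x: Vector) -> float:
    """Decision value w.x + b of one linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


class OneVsRestLinear:
    """Multi-class decision: pick the class whose linear score is largest."""

    def __init__(self, models: Dict[str, LinearModel]):
        self.models = models

    def predict(self, x: Vector) -> str:
        return max(self.models, key=lambda c: linear_score(*self.models[c], x))


# Stage 1 (assumed weights): three lip-opening classes from MFCC-like features.
COARSE = OneVsRestLinear({
    "closed": ([-1.0, 0.0], 0.0),
    "half": ([0.0, 1.0], 0.0),
    "open": ([1.0, 0.0], 0.0),
})

# Stage 2 (assumed weights): detailed lip shape from formants [F1, F2] in kHz.
# Only classes listed here are refined; others keep their coarse label.
FINE: Dict[str, OneVsRestLinear] = {
    "open": OneVsRestLinear({
        "open_unrounded": ([0.0, 1.0], -1.0),
        "open_rounded": ([0.0, -1.0], 1.0),
    }),
    "half": OneVsRestLinear({
        "half_unrounded": ([0.0, 1.0], -1.0),
        "half_rounded": ([0.0, -1.0], 1.0),
    }),
}


def classify_frame(mfcc: Vector, formants: Vector) -> Tuple[str, str]:
    """Return (coarse lip-opening class, refined lip shape) for one frame."""
    coarse = COARSE.predict(mfcc)
    refiner = FINE.get(coarse)
    shape = refiner.predict(formants) if refiner is not None else coarse
    return coarse, shape


if __name__ == "__main__":
    # Toy frames: (MFCC-like features, [F1, F2] in kHz); values are made up.
    for mfcc, formants in [([2.0, 0.5], [0.73, 1.09]),
                           ([2.0, 0.5], [0.57, 0.84]),
                           ([-1.5, 0.2], [0.30, 2.20])]:
        print(classify_frame(mfcc, formants))
```

The point of the structure is that the second stage only has to separate the few shapes inside one coarse class, so each per-frame decision evaluates a small number of classifiers. That is consistent with the abstract's stated goal of reducing computational load, though the sketch makes no claim about the actual speedup.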

Original language: English
Pages (from-to): 1029-1051
Number of pages: 23
Journal: International Journal of Pattern Recognition and Artificial Intelligence
Issue number: 7
Publication status: Published - 2006 Nov


  • Lip-synch
  • Real-time
  • Speech
  • Support vector machine
  • Viseme

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

