Achieving real-time lip-synch via SVM-based phoneme classification and lip shape refinement

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    2 Citations (Scopus)

    Abstract

    In this paper, we develop a real-time lip-synch system that drives a 2D avatar's lip motion in synch with incoming speech. To achieve real-time operation, we bound the processing time with a merge-and-split procedure that performs coarse-to-fine phoneme classification. At each classification stage, we apply a support vector machine (SVM) to limit the computational load while maintaining the desired accuracy. The coarse-to-fine classification uses two stages of feature extraction: each speech frame is first classified into one of three lip-opening classes using mel-frequency cepstral coefficients (MFCC) as features, and the result is then refined into a detailed lip shape using formant information. We implemented the system with 2D lip animation, which demonstrates the effectiveness of the proposed two-stage procedure for the real-time lip-synch task.
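
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the abstract does not state the SVM kernel or training method, so a linear SVM trained by sub-gradient descent on the hinge loss is swapped in; the feature vectors are 2-D toy numbers standing in for MFCC vectors and (F1, F2) formants in kHz; and the class labels (`closed`, `half`, `open`, `spread`, `rounded`, `wide`, `mid`) and function names (`train_svm`, `train_one_vs_rest`, `predict`, `lip_sync_frame`) are hypothetical.

```python
def train_svm(X, y, epochs=500, lr=0.1, lam=1e-3):
    """Binary linear SVM (labels +1/-1): sub-gradient descent on the
    L2-regularized hinge loss. An illustrative stand-in, not the paper's SVM."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            w = [wj * (1.0 - lr * lam) for wj in w]   # shrink from the L2 penalty
            if yi * score < 1.0:                       # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per label; returns {label: (w, b)}."""
    return {c: train_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}


def predict(model, x):
    """Return the label whose SVM gives the largest decision value."""
    def score(c):
        w, b = model[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(model, key=score)


# Stage 1 (coarse): toy 2-D stand-ins for MFCC vectors, labeled by lip opening.
mfcc_X = [(0.0, 0.0), (0.3, 0.2), (4.0, 0.0), (4.2, 0.3), (0.0, 4.0), (0.2, 4.3)]
mfcc_y = ["closed", "closed", "half", "half", "open", "open"]
coarse_model = train_one_vs_rest(mfcc_X, mfcc_y)

# Stage 2 (fine): toy (F1, F2) formants in kHz, one model per coarse class.
# "closed" has no entry, so it is not refined further in this sketch.
formant_data = {
    "half": ([(0.30, 2.30), (0.35, 2.20), (0.30, 0.90), (0.35, 1.00)],
             ["spread", "spread", "rounded", "rounded"]),
    "open": ([(0.80, 1.10), (0.75, 1.20), (0.55, 2.10), (0.60, 2.20)],
             ["wide", "wide", "mid", "mid"]),
}
fine_models = {c: train_one_vs_rest(X, y) for c, (X, y) in formant_data.items()}


def lip_sync_frame(mfcc, formants):
    """Classify one speech frame: coarse lip opening first, then lip shape."""
    coarse = predict(coarse_model, mfcc)
    if coarse not in fine_models:          # nothing to refine for this class
        return coarse, coarse
    return coarse, predict(fine_models[coarse], formants)


print(lip_sync_frame((4.0, 0.0), (0.30, 0.90)))
```

In this sketch the second stage only evaluates the small model attached to the coarse class chosen in the first stage, which is one plausible reading of how a coarse-to-fine split keeps the per-frame cost low.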

    Original language: English
    Title of host publication: Proceedings - 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 299-304
    Number of pages: 6
    ISBN (Print): 0769518346, 9780769518343
    Publication status: Published - 2002
    Event: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002 - Pittsburgh, United States
    Duration: 2002 Oct 14 to 2002 Oct 16

    Other

    Other: 4th IEEE International Conference on Multimodal Interfaces, ICMI 2002
    Country/Territory: United States
    City: Pittsburgh
    Period: 02/10/14 to 02/10/16

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Graphics and Computer-Aided Design
    • Computer Vision and Pattern Recognition
    • Hardware and Architecture
