Audio-to-visual conversion using hidden Markov models

Soonkyu Lee, Dongsuk Yook

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    30 Citations (Scopus)

    Abstract

    We describe audio-to-visual conversion techniques for efficient multimedia communications. The audio signals are automatically converted to visual images of mouth shapes. Visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. Visual images synchronized with audio signals can provide a user-friendly interface for man-machine interaction. They can also be used to help people with hearing impairments. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches to using HMMs. In the first approach, an HMM is trained for each viseme, and the audio signals are directly recognized as a sequence of visemes. In the second approach, each phoneme is modeled with an HMM, and a general phoneme recognizer is used to produce a phoneme sequence from the audio signals. The phoneme sequence is then converted to a viseme sequence. We implemented the two approaches and tested them on the TIMIT speech corpus. The viseme recognizer shows a 33.9% error rate, and the phoneme-based approach exhibits a 29.7% viseme recognition error rate. When similar viseme classes are merged, we found that the error rates can be reduced to 20.5% and 13.9%, respectively.
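
The last step of the second (phoneme-based) approach, converting a recognized phoneme sequence to visemes and scoring it, can be sketched as below. This is only an illustration under stated assumptions: the `PHONEME_TO_VISEME` table and its class names are hypothetical (the abstract does not list the viseme classes used), and the error rate is computed as a plain edit distance divided by the reference length, which may differ from the scoring used in the paper.

```python
# Hypothetical many-to-one phoneme-to-viseme table, for illustration only.
# The viseme classes actually used in the paper are not given in the abstract.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "aa": "V_open", "ae": "V_open",
    "uw": "V_round", "ow": "V_round",
    "sil": "V_sil",
}


def phonemes_to_visemes(phonemes):
    """Map each phoneme label to its (assumed) viseme class."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed metric)."""
    return edit_distance(ref, hyp) / len(ref)


# A recognizer that confuses /b/ with /p/ makes a phoneme error,
# but both phonemes fall in the same assumed viseme class.
ref_phonemes = ["b", "aa", "m"]
hyp_phonemes = ["p", "aa", "m"]

phoneme_err = error_rate(ref_phonemes, hyp_phonemes)
viseme_err = error_rate(phonemes_to_visemes(ref_phonemes),
                        phonemes_to_visemes(hyp_phonemes))
print(phoneme_err, viseme_err)
```

In this toy case the phoneme sequence contains one substitution, while the mapped viseme sequences are identical, because confusions inside a single viseme class do not count as viseme errors. The same effect is one plausible reason why merging similar viseme classes lowers the reported error rates.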

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Publisher: Springer Verlag
    Pages: 563-570
    Number of pages: 8
    Volume: 2417
    ISBN (Print): 3540440380, 9783540440383
    Publication status: Published - 2002
    Event: 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002 - Tokyo, Japan
    Duration: 2002 Aug 18 → 2002 Aug 22

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 2417
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002
    Country/Territory: Japan
    City: Tokyo
    Period: 02/8/18 → 02/8/22

    ASJC Scopus subject areas

    • General Computer Science
    • Theoretical Computer Science
