A voice trigger system using keyword and speaker recognition for mobile devices

Hyeopwoo Lee, Sukmoon Chang, Dongsuk Yook, Yongserk Kim

    Research output: Contribution to journal › Article › peer-review

    20 Citations (Scopus)

    Abstract

    Voice activity detection plays an important role in providing an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word coming from a legitimate user, it can be utilized as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. To trigger a mobile device properly with low computational power consumption, the voice trigger must perform keyword recognition as well as speaker recognition without using computationally demanding speech recognizers. To solve this problem, we propose a template-based method and a hidden Markov model (HMM) based method for the voice trigger. Experiments using a Korean word corpus show that the template-based method ran 4.1 times faster than the HMM-based method. However, the HMM-based method reduced the recognition error by a relative 27.8% compared to the template-based method. The proposed methods are complementary and can be used selectively depending on the device of interest.
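
The abstract does not say how the template-based method is built, but the keyword list names dynamic time warping (DTW). As a rough illustration only, and not the authors' implementation, the sketch below shows how a DTW comparison against templates enrolled by the legitimate user could serve as a trigger decision. The function names (`dtw_distance`, `should_trigger`), the Euclidean frame distance, the length normalization, and the threshold are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors.

    Assumption: frames are equal-length tuples of floats and the local cost
    is the Euclidean distance between frames.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] against b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def should_trigger(utterance, templates, threshold):
    """Return True if the utterance is close enough to any enrolled template.

    Hypothetical decision rule: the templates are recordings of the keyword
    spoken by the legitimate user, so a small distance suggests both the
    right word and the right speaker.
    """
    best = min(dtw_distance(utterance, t) for t in templates)
    return best <= threshold


# Toy 2-D "features": a template and a time-stretched copy of it.
template = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
stretched = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (2.0, 0.0)]
other = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]

print(should_trigger(stretched, [template], threshold=0.1))
print(should_trigger(other, [template], threshold=0.1))
```

In this toy setup the time-stretched copy aligns perfectly with the template, while the unrelated sequence does not. A real system would operate on acoustic features such as cepstral coefficients and would tune the threshold on held-out data.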

    Original language: English
    Article number: 5373813
    Pages (from-to): 2377-2384
    Number of pages: 8
    Journal: IEEE Transactions on Consumer Electronics
    Volume: 55
    Issue number: 4
    Publication status: Published - 2009 Nov

    Bibliographical note

    Funding Information:
    This work was supported by the Korea Research Foundation (KRF) grant funded by the Korea government (MEST) (No. 2009-0077392). It was also supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2009-C1090-0902-0007).

    Keywords

    • Dynamic time warping
    • Gaussian mixture model
    • Hidden Markov model
    • Keyword recognition
    • Speaker recognition
    • Vector quantization
    • Voice trigger

    ASJC Scopus subject areas

    • Media Technology
    • Electrical and Electronic Engineering
