Abstract
Voice activity detection plays an important role in an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word spoken by a legitimate user, it can be used as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. To trigger a mobile device with low computational power consumption, the voice trigger must perform both keyword recognition and speaker recognition without relying on computationally demanding speech recognizers. We propose a template-based method and a hidden Markov model (HMM) based method for the voice trigger to solve this problem. Experiments on a Korean word corpus show that the template-based method ran 4.1 times faster than the HMM-based method, whereas the HMM-based method reduced the recognition error by 27.8% in relative terms compared with the template-based method. The proposed methods are complementary and can be used selectively depending on the device of interest.
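The abstract does not spell out how the template-based method compares an utterance with the enrolled keyword. Since dynamic time warping appears among the keywords, the following is a minimal sketch of DTW template matching, offered as an illustration and not as the authors' implementation. The function names, the length normalization, and the threshold are hypothetical; feature extraction is assumed to have already produced one feature vector per frame.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Each sequence is a list of equal-length lists of floats (one per frame).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the best accumulated cost aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a frame of b
                cost[i][j - 1],      # repeat a frame of a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def should_trigger(utterance, template, threshold):
    """Hypothetical decision rule: accept if the normalized DTW score is small."""
    # Dividing by the combined length is an assumption made here so that the
    # threshold depends less on utterance duration.
    score = dtw_distance(utterance, template) / (len(utterance) + len(template))
    return score <= threshold


# Toy one-dimensional "features": the same contour spoken more slowly
# should still match, while a different contour should not.
template = [[0.0], [1.0], [2.0]]
slow_match = [[0.0], [0.0], [1.0], [2.0]]
other = [[5.0], [5.0]]
print(should_trigger(slow_match, template, threshold=0.1))
print(should_trigger(other, template, threshold=0.1))
```

A single enrolled template per user keeps both storage and computation small, which is consistent with the speed advantage reported for the template-based method, although the actual system may differ in its features and scoring.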
Original language | English |
---|---|
Article number | 5373813 |
Pages (from-to) | 2377-2384 |
Number of pages | 8 |
Journal | IEEE Transactions on Consumer Electronics |
Volume | 55 |
Issue number | 4 |
Publication status | Published - 2009 Nov |
Keywords
- Dynamic time warping
- Gaussian mixture model
- Hidden Markov model
- Keyword recognition
- Speaker recognition
- Vector quantization
- Voice trigger
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering