Multitask learning of deep neural network-based keyword spotting for IoT devices

Seong Gyun Leem, In Chul Yoo, Dongsuk Yook

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)


Speech-based interfaces are convenient and intuitive and are therefore strongly preferred for human-computer interaction on Internet of Things (IoT) devices. Pre-defined keywords are typically used as triggers that tell a device to start accepting subsequent voice commands. Keyword spotting techniques used as voice trigger mechanisms typically model the target keyword with triphone models and non-keywords with single-state filler models. Recently, deep neural networks (DNNs) have outperformed hidden Markov models with Gaussian mixture models in various tasks, including speech recognition. However, conventional DNN-based keyword spotting methods cannot easily change the target keywords, which is an essential feature for speech-based IoT device interfaces. Additionally, the increased computational requirements hinder the use of complex filler models in DNN-based keyword spotting systems, which diminishes their accuracy. In this paper, we propose a novel DNN-based keyword spotting system that changes the keyword on the fly and uses triphone and monophone acoustic models to reduce computational complexity and improve generalization performance. Experimental results on the FFMTIMIT corpus show that the proposed method reduced the error rate by 36.6%.
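The abstract does not describe the network architecture, so the following is only a minimal sketch of one common way to set up multitask learning for acoustic modeling: a shared hidden layer feeds two softmax output heads, one over triphone states and one over monophones, and the two cross-entropy losses are summed. The class name, layer sizes, activation, and the `mono_weight` parameter are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskAcousticModel:
    """Hypothetical two-head model: shared hidden layer, two softmax heads."""

    def __init__(self, n_features, n_hidden, n_triphone_states,
                 n_monophones, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_hidden, n_features)
        self.triphone_head = init_layer(rng, n_triphone_states, n_hidden)
        self.monophone_head = init_layer(rng, n_monophones, n_hidden)

    def forward(self, frame):
        # Both heads read the same hidden representation.
        hidden = relu(linear(*self.shared, frame))
        tri_post = softmax(linear(*self.triphone_head, hidden))
        mono_post = softmax(linear(*self.monophone_head, hidden))
        return tri_post, mono_post


def multitask_loss(tri_post, mono_post, tri_label, mono_label,
                   mono_weight=0.5):
    """Weighted sum of the two per-frame cross-entropy losses.

    mono_weight is an assumed hyperparameter, not a value from the paper.
    """
    tri_loss = -math.log(tri_post[tri_label])
    mono_loss = -math.log(mono_post[mono_label])
    return tri_loss + mono_weight * mono_loss


# Usage with made-up sizes: 13 features, 30 triphone states, 10 monophones.
model = MultitaskAcousticModel(n_features=13, n_hidden=16,
                               n_triphone_states=30, n_monophones=10)
frame = [0.1] * 13
tri_post, mono_post = model.forward(frame)
loss = multitask_loss(tri_post, mono_post, tri_label=3, mono_label=1)
```

In a setup like this, the shared layer is evaluated once per frame regardless of how many heads read from it, which is the usual motivation for sharing parameters between related tasks. How the paper actually combines its triphone and monophone models is not stated in the abstract.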

Original language: English
Article number: 8641328
Pages (from-to): 188-194
Number of pages: 7
Journal: IEEE Transactions on Consumer Electronics
Issue number: 2
Publication status: Published - 2019 May

Bibliographical note

Funding Information:
Manuscript received September 16, 2018; revised November 24, 2018, January 7, 2019, and February 7, 2019; accepted February 8, 2019. Date of publication February 13, 2019; date of current version April 23, 2019. This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning under Grant NRF-2017R1E1A1A01078157, in part by the Ministry of Science and ICT (MSIT) through the Information Technology Research Center (ITRC) support program supervised by the Institute for Information and Communications Technology Promotion (IITP) under Grant IITP-2018-0-01405, and in part by IITP grant funded by the Korean Government (MSIT, A research on safe and convenient big data processing methods) under Grant 2018-0-00269. (Corresponding author: Dongsuk Yook.) The authors are with the Artificial Intelligence Laboratory, Department of Computer Science and Engineering, Korea University, Seoul 02841, South Korea. Digital Object Identifier 10.1109/TCE.2019.2899067

Publisher Copyright:
© 1975-2011 IEEE.


Keywords

  • Deep neural network
  • keyword spotting
  • multitask learning

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering

