Abstract
Keyword Spotting (KWS) is an essential component of contemporary audio-based deep learning systems and must be lightweight when operating in streaming, on-device environments. In our previous work, we presented robust feature extraction with a single-layer dynamic convolution model. In this letter, we extend that study to a multi-layer architecture and propose a robust Knowledge Distillation (KD) learning method. Based on the distribution of class centroids and embedding vectors, we compute three distinct distance metrics for the KD training and feature extraction processes. The results indicate that our KD method achieves KWS performance comparable to state-of-the-art models at lower computational cost. Furthermore, the proposed method is more robust in noisy environments than conventional KD methods.
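The letter's exact distance metrics and loss formulation are not reproduced here. As an illustration only, the centroid-based distillation idea described above can be sketched as a prototypical-style KD objective: compute per-class centroids from teacher and student embeddings, turn centroid distances into soft class distributions, and have the student match the teacher's distribution. All function names, the Euclidean distance, and the softmax-over-negative-distances choice below are assumptions, not the paper's actual method.

```python
import numpy as np

def class_centroids(embeddings, labels, num_classes):
    # Mean embedding per class (a "prototype" / class centroid).
    dim = embeddings.shape[1]
    centroids = np.zeros((num_classes, dim))
    for c in range(num_classes):
        centroids[c] = embeddings[labels == c].mean(axis=0)
    return centroids

def centroid_distances(embeddings, centroids):
    # Euclidean distance from every embedding to every class centroid;
    # shape (num_samples, num_classes).
    diff = embeddings[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prototypical_kd_loss(student_emb, teacher_emb, labels, num_classes, temp=1.0):
    # Illustrative KD objective (assumption): the student's soft
    # distribution over negative centroid distances is pulled toward
    # the teacher's via a cross-entropy term.
    t_cent = class_centroids(teacher_emb, labels, num_classes)
    s_cent = class_centroids(student_emb, labels, num_classes)
    t_prob = softmax(-centroid_distances(teacher_emb, t_cent) / temp)
    s_logp = np.log(softmax(-centroid_distances(student_emb, s_cent) / temp) + 1e-12)
    return float(-(t_prob * s_logp).sum(axis=1).mean())
```

In this sketch the nearest-centroid distribution plays the role of the teacher's soft labels, so no separate classifier head is needed for distillation; the actual letter defines three distinct distance metrics rather than the single Euclidean one used here.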
| Original language | English |
|---|---|
| Pages (from-to) | 2298-2302 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 29 |
| DOIs | |
| Publication status | Published - 2022 |
Keywords
- Keyword spotting
- knowledge distillation
- prototypical learning
ASJC Scopus subject areas
- Signal Processing
- Applied Mathematics
- Electrical and Electronic Engineering