Abstract
Voice activity detection (VAD) systems degrade in the presence of unexpected, non-stationary background noise that is loud enough to mask the speech signal. Although several methods for improving VAD performance have been proposed, they have yet to mitigate the influence of the background noise itself. This letter proposes an effective noise-robust VAD approach. The proposed method applies a deep learning-based attention mechanism in the form of spectral attention and temporal attention. It is compared with several other deep learning-based methods in terms of the area under the curve, in experiments on speech with added known or unknown noise and on real-world noisy data. The results show that the proposed method outperforms the other methods in all the scenarios considered and, moreover, generalizes well to environments with unknown or unexpected noise.
Original language | English |
---|---|
Article number | 8933025 |
Pages (from-to) | 131-135 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 27 |
Publication status | Published - 2020 |
Externally published | Yes |
Keywords
- Deep neural networks
- Attention mechanism
- Speech activity detection
- Speech detection
- Voice activity detection
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering
- Applied Mathematics