Spectro-Temporal Attention-Based Voice Activity Detection

Younglo Lee, Jeongki Min, David K. Han, Hanseok Ko

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)


Voice Activity Detection (VAD) systems suffer from unexpected and non-stationary background noises at magnitudes sufficiently high to mask the speech signal. Although several methods of increasing the performance of VAD have been proposed, these approaches have yet to mitigate the influence of the background noise itself. This letter proposes an effective noise-robust VAD approach. The proposed method applies a deep learning-based attention mechanism in the form of spectral attention and temporal attention. It is compared with several other deep learning-based methods in terms of the area under the curve, in experiments on speech with added known or unknown noise and on real-world noisy data. The results show that the proposed method not only outperforms the other methods in all the scenarios considered but also generalizes well to environments with unknown or unexpected noise.
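The abstract reports results as area under the curve. As a rough illustration only, the sketch below computes the area under the ROC curve for frame-level VAD scores, assuming that the curve in question is the ROC curve and that each frame carries a binary speech/non-speech label. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the letter.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0 (non-speech) or 1 (speech), one per frame.
    scores: iterable of detector outputs, higher meaning "more likely speech".
    Illustrative helper; not the evaluation code used in the letter.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one speech and one non-speech frame")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at index i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied frames share the mean of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        pos_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * pos_in_run
        i = j

    # U statistic of the speech frames, normalized by the number of
    # (speech, non-speech) frame pairs.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy example: four frames, two of them speech.
frame_labels = [0, 0, 1, 1]
frame_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(frame_labels, frame_scores))
```

A value of 1.0 means every speech frame scores above every non-speech frame, and 0.5 corresponds to chance-level ranking, which is why the measure is convenient for comparing detectors without fixing a decision threshold.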

Original language: English
Article number: 8933025
Pages (from-to): 131-135
Number of pages: 5
Journal: IEEE Signal Processing Letters
Publication status: Published - 2020
Externally published: Yes


Keywords

  • Deep neural networks
  • attention mechanism
  • speech activity detection
  • speech detection
  • voice activity detection

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics

