Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label

Chungho Park, Donghyun Kim, Hanseok Ko

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In this paper, we propose a Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, to extract the segmentation map these methods must preserve the size of the feature map, so the model is constructed without pooling operations. As a result, their performance deteriorates owing to a lack of sparsity and a small receptive field. To mitigate these problems, we use Gated Linear Units (GLUs) to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to enlarge the receptive field without additional learnable parameters. For performance evaluation, we use the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that the proposed DCGLU model outperforms the baselines. In particular, our method is robust against natural sound noise at three Signal-to-Noise Ratio (SNR) levels (20 dB, 10 dB, and 0 dB).
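
The two ideas in the abstract (gating with a GLU, and enlarging the receptive field by dilation without adding weights or pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a 1-D simplification in plain Python, whereas the paper works on time-frequency maps, and the names `dilated_conv1d`, `dcglu_layer`, and `receptive_field` are hypothetical helpers introduced only for illustration.

```python
import math


def dilated_conv1d(x, w, dilation=1):
    """'Same'-padded 1-D cross-correlation with a dilated kernel.

    The output has the same length as the input, so the feature map
    size is preserved without any pooling. Assumes an odd kernel size.
    """
    k = len(w)
    pad = dilation * (k - 1) // 2  # symmetric zero padding
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            idx = t + i * dilation - pad
            if 0 <= idx < len(x):  # positions outside the input count as zero
                acc += w[i] * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dcglu_layer(x, w_lin, w_gate, dilation):
    """Illustrative gated layer: linear path times a sigmoid gate.

    Both paths use the same dilation; the gate value in (0, 1) scales
    how much of the linear path passes through at each position.
    """
    lin = dilated_conv1d(x, w_lin, dilation)
    gate = dilated_conv1d(x, w_gate, dilation)
    return [a * sigmoid(g) for a, g in zip(lin, gate)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Three stacked 3-tap layers have the same number of weights either way,
# but dilation widens the span of input each output position can see.
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

In this sketch the dilation rate changes only which input positions a kernel tap reads, not how many taps there are, which is why the receptive field grows while the parameter count stays fixed. The dilation schedule (1, 2, 4) is an assumption chosen for the example, not a value taken from the paper.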

Original language: English
Pages (from-to): 414-423
Number of pages: 10
Journal: Journal of the Acoustical Society of Korea
Volume: 39
Issue number: 5
Publication status: Published - 2020

Bibliographical note

Publisher Copyright:
Copyright © 2020 The Acoustical Society of Korea.

Keywords

  • Audio tagging
  • Dilated convolution
  • Gated linear unit
  • Sound event detection
  • T-f segmentation map
  • Weak label

ASJC Scopus subject areas

  • Signal Processing
  • Instrumentation
  • Acoustics and Ultrasonics
  • Applied Mathematics
  • Speech and Hearing
