Abstract
In this paper, we propose a Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field that arise from the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods must preserve the size of the feature map in order to extract the segmentation map, so the model is constructed without pooling operations. As a result, their performance deteriorates because of a lack of sparsity and a small receptive field. To mitigate these problems, we use Gated Linear Units (GLUs) to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learnable parameters. For the performance evaluation, we use the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that the proposed DCGLU model outperforms the baselines. In particular, our method is robust to natural sound noise at three Signal-to-Noise Ratio (SNR) levels (20 dB, 10 dB, and 0 dB).
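The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 3x3 kernel, the dilation rates 1, 2, 4, and the helper names `receptive_field`, `conv_weight_count`, and `glu` are all assumptions chosen for illustration. It shows that stacking stride-1 dilated convolutions widens the receptive field while the weight count of each layer stays the same, and that a GLU scales a linear path element-wise by a sigmoid gate.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d adds (kernel_size - 1) * d positions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv_weight_count(kernel_size, in_ch, out_ch):
    """Weights in one 2-D convolution layer (bias ignored).

    Dilation is not an argument: it spreads the kernel taps apart
    without adding any weights.
    """
    return kernel_size * kernel_size * in_ch * out_ch


def glu(linear, gate):
    """Gated linear unit: element-wise linear * sigmoid(gate)."""
    return [a * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(linear, gate)]


# Assumed example: three 3x3 layers, without and with dilation.
plain_rf = receptive_field(3, [1, 1, 1])    # 7 positions per axis
dilated_rf = receptive_field(3, [1, 2, 4])  # 15 positions per axis

# A gate value of 0 passes half of the linear path; a large positive
# gate value passes almost all of it.
gated = glu([2.0, 2.0], [0.0, 50.0])
```

Under these assumed settings the dilated stack covers 15 time-frequency positions per axis instead of 7, with no pooling and no extra weights, which is the property the abstract relies on to keep the feature map at full size.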
Original language | English |
---|---|
Pages (from-to) | 414-423 |
Number of pages | 10 |
Journal | Journal of the Acoustical Society of Korea |
Volume | 39 |
Issue number | 5 |
DOIs | |
Publication status | Published - 2020 |
Bibliographical note
Publisher Copyright: © 2020 The Acoustical Society of Korea.
Keywords
- Audio tagging
- Dilated convolution
- Gated linear unit
- Sound event detection
- T-F segmentation map
- Weak label
ASJC Scopus subject areas
- Signal Processing
- Instrumentation
- Acoustics and Ultrasonics
- Applied Mathematics
- Speech and Hearing