TY - JOUR
T1 - Bird sounds classification by combining PNCC and robust Mel-log filter bank features
AU - Badi, Alzahra
AU - Ko, Kyungdeuk
AU - Ko, Hanseok
N1 - Funding Information:
This work was funded by the Ministry of Environment supported by the Korea Environmental Industry & Technology Institute's environmental policy-based public technology development project (2017000210001).
Publisher Copyright:
© 2019 Acoustical Society of Korea. All rights reserved.
PY - 2019
Y1 - 2019
N2 - In this paper, combining features is proposed as a way to enhance the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network). A robust log Mel-filter bank feature, obtained using a Wiener filter, and PNCCs (Power Normalized Cepstral Coefficients) are extracted to form a 2-dimensional feature that is used as input to the CNN. An eBird database is used to classify 43 bird species in their natural environment. To evaluate the performance of the combined feature in noisy environments, the database is augmented with 3 types of noise at 4 different SNRs (Signal to Noise Ratios): 20 dB, 10 dB, 5 dB, and 0 dB. The combined feature is compared to the log Mel-filter bank with and without the Wiener filter, and to the PNCCs. The combined feature is shown to outperform the other features in clean environments, with a 1.34 % increase in overall average accuracy. Additionally, the accuracy in noisy environments at the 4 SNR levels is increased by 1.06 % and 0.65 % for shop and schoolyard noise backgrounds, respectively.
KW - Acoustic event recognition
KW - CNN (Convolutional Neural Network)
KW - Environmental sound classification
KW - PNCCs (Power Normalized Cepstral Coefficients)
KW - Wiener filter
UR - http://www.scopus.com/inward/record.url?scp=85079185086&partnerID=8YFLogxK
U2 - 10.7776/ASK.2019.38.1.039
DO - 10.7776/ASK.2019.38.1.039
M3 - Article
AN - SCOPUS:85079185086
SN - 1225-4428
VL - 38
SP - 39
EP - 46
JO - Journal of the Acoustical Society of Korea
JF - Journal of the Acoustical Society of Korea
IS - 1
ER -