Abstract
As environmental concerns have grown in recent years, animal sound classification has received increasing attention. However, building a good animal sound database for this task is difficult, because recordings of animal sounds tend to be impure: they contain other sounds, such as environmental noise and calls from other species. Such a dataset is unsuitable for testing animal sound classification models, as the mixed sounds may cause misclassifications. Source separation addresses this problem: if these mixtures can be successfully separated, a clean sound dataset can be constructed. Many separation methods have been developed over the years, but few were designed for animal sound mixtures. Moreover, most source separation methods require the number of sources in the mixture to be known in advance, which is rarely the case in the real world. Although source separation for an unknown number of speakers has been studied, none of these studies addressed animal sounds. We therefore propose combining a source number estimator with a source separation method to handle an unknown number of sources. In this paper, we searched for the best approach to animal sound source separation with an unknown number of sources by training SuDoRM-RF, an efficient general-purpose neural network for audio source separation, on animal sound mixtures containing 2, 3, and 4 sources, and by applying the trained models in several different ways. We used the Gerschgorin disk estimator to estimate the number of sources in each mixture and then selected the model trained for that number of sources. We also assessed how recurrent source separation performs on animal sounds, as well as how models trained for a larger number of sources perform on mixtures with fewer sources.
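The classic Gerschgorin disk estimator operates on the covariance matrix of an M-channel observation. The abstract does not say how that matrix is formed for these recordings, so the minimal sketch below simply assumes an M x M (sample) covariance is available. It partitions the matrix, rotates it with the eigenvectors of the leading (M-1) x (M-1) block, and compares each Gerschgorin radius against a fraction of the radii sum. The adjustment factor `adjust` (D(N) in the usual formulation), the zero-source guard, and the hypothetical helper `select_model_key`, which picks the nearest trained source count among 2, 3, and 4, are illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def sample_covariance(X):
    """Sample covariance of an M x N observation matrix (M channels, N snapshots)."""
    X = np.asarray(X)
    X = X - X.mean(axis=1, keepdims=True)
    return X @ X.conj().T / X.shape[1]


def gerschgorin_disk_estimate(R, adjust=0.5):
    """Sketch of the classic Gerschgorin disk estimator (GDE).

    `adjust` stands in for the adjustment factor D(N) in (0, 1); its value
    here is an assumption. Returns the estimated number of sources.
    """
    R = np.asarray(R)
    m = R.shape[0]
    # Partition R into its leading (M-1) x (M-1) block and the last column.
    R1 = R[: m - 1, : m - 1]
    r = R[: m - 1, m - 1]
    # eigh returns ascending eigenvalues; reverse so columns follow decreasing eigenvalue.
    _, U1 = np.linalg.eigh(R1)
    U1 = U1[:, ::-1]
    # Gerschgorin radii of the unitarily transformed matrix: |u_i^H r|.
    radii = np.abs(U1.conj().T @ r)
    total = radii.sum()
    if total == 0.0:
        # Assumed guard: nothing is coherent with the last channel.
        return 0
    gde = radii - adjust * total / (m - 1)
    # Estimate = k - 1 for the first 1-based k with GDE(k) < 0,
    # which is the 0-based index of the first negative value.
    for k in range(m - 2):
        if gde[k] < 0:
            return k
    return m - 2


def select_model_key(num_sources, trained_counts=(2, 3, 4)):
    """Hypothetical policy: use the model trained for the nearest source count."""
    counts = sorted(trained_counts)
    return min(counts, key=lambda c: abs(c - num_sources))


if __name__ == "__main__":
    # Toy covariance: two sources observed by six channels plus white noise.
    A = np.array([[3.0, 0.0], [0.0, 2.0], [0.0, 0.0],
                  [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
    R = A @ A.T + 0.1 * np.eye(6)
    k = gerschgorin_disk_estimate(R)
    print(k, select_model_key(k))  # prints "2 2"
```

Running the module prints `2 2`: the toy covariance contains two sources, so the model trained for two sources would be chosen. With real recordings, `adjust` would need tuning, since GDE's behavior depends on it.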
Results show that models trained for a larger number of sources, when applied to mixtures with fewer sources, output spurious estimates alongside acceptable ones because of the model's structure, which always produces a fixed number of outputs. Experiments with recurrent source separation showed that the methods commonly used for recurrent speech separation cannot be applied directly to animal sound separation. Finally, using the Gerschgorin disk estimator to estimate the number of sources in a mixture and select the model trained for that number proved to be the most stable approach to animal sound source separation.
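The abstract reports that recurrent schemes from speech separation did not carry over to animal sounds, but it does not describe the scheme itself. For orientation only, here is a minimal sketch of one generic recurrent ("one-and-rest") loop: each step splits one source off the current residual, and the loop stops once the residual looks empty. `separate_one_and_rest` is a hypothetical stand-in for a trained separator, and the energy-ratio stopping rule and the `max_sources` cap are assumptions, not the authors' configuration.

```python
import numpy as np


def recursive_separate(mixture, separate_one_and_rest, energy_ratio=0.01, max_sources=4):
    """Generic one-and-rest recursion (sketch).

    separate_one_and_rest: hypothetical callable mapping a 1-D residual to
    (one_source, new_residual). Recursion stops when the residual energy
    drops below energy_ratio times the mixture energy, or when max_sources
    estimates have been produced.
    """
    mixture = np.asarray(mixture, dtype=float)
    reference_energy = float(np.sum(mixture ** 2))
    estimates = []
    residual = mixture
    while len(estimates) < max_sources - 1:
        source, residual = separate_one_and_rest(residual)
        estimates.append(source)
        # A near-silent residual is taken to mean no source is left.
        if np.sum(residual ** 2) < energy_ratio * reference_energy:
            return estimates
    # After the last allowed split, the remaining residual is the final source.
    estimates.append(residual)
    return estimates
```

The stopping rule is where such a loop is most fragile: a fixed energy threshold assumes a clean residual becomes quiet, which may not hold when background noise is itself loud.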
Original language | English |
---|---|
Journal | Proceedings of the International Congress on Acoustics |
Publication status | Published - 2022 |
Event | 24th International Congress on Acoustics, ICA 2022, Gyeongju, Republic of Korea |
Duration | 2022 Oct 24 → 2022 Oct 28 |
Bibliographical note
Funding Information: This work was supported by the Korea Environment Industry & Technology Institute (KEITI) through the Exotic Invasive Species Management Program, funded by the Korea Ministry of Environment (MOE) (2021002280004).
Publisher Copyright:
© ICA 2022. All rights reserved.
Keywords
- Animal sound
- Source number estimation
- Source separation
ASJC Scopus subject areas
- Mechanical Engineering
- Acoustics and Ultrasonics