Abstract
Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns. We also propose Gated Point-wise Convolutional Modulation (GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate internal features. With these two methods, we extend the Conditioned-U-Net (CUNet) for multi-source separation. The experimental results indicate that LaSAFT and GPoCM improve the CUNet’s performance, achieving state-of-the-art signal-to-distortion ratio (SDR) on several MUSDB18 source separation tasks.
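The abstract names FiLM and a gated point-wise convolutional extension of it without giving formulas. The following is a minimal sketch of what such modulation could look like, not the authors' implementation. It assumes the standard FiLM formulation (a per-channel scale and shift) and one plausible reading of "gated point-wise convolutional modulation": a 1x1 channel-mixing step whose sigmoid output gates the input. The function names, the flattened `(channels, bins)` feature layout, and the parameter names `gamma`, `beta`, `weight`, and `bias` are all hypothetical; in a conditioned model these parameters would be generated from the target-source condition.

```python
import math


def film(x, gamma, beta):
    """FiLM: scale and shift each channel of x.

    x: list of channels, each a list of time-frequency bin values.
    gamma, beta: one scale and one shift per channel.
    """
    return [[g * v + b for v in channel]
            for channel, g, b in zip(x, gamma, beta)]


def gated_pointwise_modulation(x, weight, bias):
    """Hypothetical gated point-wise (1x1) modulation (assumed form).

    Each output channel o mixes all input channels at the same bin with
    weight[o], adds bias[o], and the sigmoid of that value gates x[o].
    weight must be square (channels x channels) so the gate matches x.
    """
    n_bins = len(x[0])
    out = []
    for o, (w_row, b) in enumerate(zip(weight, bias)):
        row = []
        for n in range(n_bins):
            mixed = b + sum(w * x[c][n] for c, w in enumerate(w_row))
            gate = 1.0 / (1.0 + math.exp(-mixed))
            row.append(gate * x[o][n])
        out.append(row)
    return out


# Two channels, two time-frequency bins each.
features = [[1.0, 2.0], [3.0, 4.0]]
film_out = film(features, gamma=[2.0, 0.5], beta=[0.0, 1.0])
gated_out = gated_pointwise_modulation(
    features, weight=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
```

In this sketch FiLM modulates each channel independently, while the point-wise variant lets every output channel depend on all input channels at the same bin, which is the sense in which it generalizes a per-channel affine transform. With all-zero weights and biases the gate is exactly 0.5, so `gated_out` is the input halved.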
| Original language | English |
|---|---|
| Title of host publication | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 171-175 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781728176055 |
| Publication status | Published - 2021 |
| Event | 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada. Duration: 2021 Jun 6 → 2021 Jun 11 |
Publication series
| Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
|---|---|
| Volume | 2021-June |
| ISSN (Print) | 1520-6149 |
Conference
| Conference | 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 |
|---|---|
| Country/Territory | Canada |
| City | Virtual, Toronto |
| Period | 2021 Jun 6 → 2021 Jun 11 |
Bibliographical note
Publisher Copyright: © 2021 IEEE
Keywords
- Attention
- Conditioned source separation
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering