Abstract
In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset using five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while processing noisy speech efficiently.
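The abstract does not say what the three representations are. The sketch below is a hypothetical NumPy toy, not the authors' implementation. It assumes the block sees a (channels, time) feature map from the encoder and computes three views: a channel view (squeeze-and-excitation-style gating), a global view (scaled dot-product self-attention over all time steps), and a local view (depthwise 1-D convolution used as a gate). The views are averaged and added back through a residual connection. All function names, weights, and the kernel size are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_view(x, w):
    # Hypothetical channel view: average each channel over time,
    # mix channels with w, and rescale every channel by a sigmoid gate.
    s = x.mean(axis=1)                       # (C,)
    gate = sigmoid(w @ s)                    # (C,)
    return x * gate[:, None]                 # (C, T)

def global_view(x):
    # Hypothetical global view: self-attention across all time steps,
    # using the features themselves as queries, keys, and values.
    q = k = v = x.T                          # (T, C)
    scores = q @ k.T / np.sqrt(x.shape[0])   # (T, T)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax
    return (attn @ v).T                      # (C, T)

def local_view(x, kernels):
    # Hypothetical local view: depthwise 1-D convolution (one kernel per
    # channel, output length preserved) turned into a sigmoid gate.
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        out[c] = np.convolve(x[c], kernels[c], mode="same")
    return x * sigmoid(out)                  # (C, T)

def multi_view_block(x, w, kernels):
    # Average the three views and add the input back (residual path).
    views = [channel_view(x, w), global_view(x), local_view(x, kernels)]
    return x + sum(views) / len(views)

# Usage on a random toy feature map (4 channels, 16 time steps).
rng = np.random.default_rng(0)
C, T = 4, 16
x = rng.standard_normal((C, T))
w = 0.1 * rng.standard_normal((C, C))        # assumed channel-mixing weights
kernels = rng.standard_normal((C, 3))        # assumed kernel size of 3
y = multi_view_block(x, w, kernels)
print(y.shape)  # (4, 16)
```

Every view preserves the (channels, time) shape, which is what lets a block like this sit between encoder and decoder stages of a U-Net-style model. Note that the full self-attention used here costs memory quadratic in the number of time steps; that cost is the kind of inefficiency the abstract attributes to existing approaches, so a practical model would need a cheaper global view.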
Original language | English |
---|---|
Title of host publication | 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 7842-7846 |
Number of pages | 5 |
ISBN (Electronic) | 9781665405409 |
DOIs | |
Publication status | Published - 2022 |
Event | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore. Duration: 2022 May 23 → 2022 May 27 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2022-May |
ISSN (Print) | 1520-6149 |
Conference
Conference | 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 |
---|---|
Country/Territory | Singapore |
City | Virtual, Online |
Period | 2022-05-23 → 2022-05-27 |
Bibliographical note
Publisher Copyright: © 2022 IEEE
Keywords
- multi-view attention
- speech enhancement
- time domain
- u-net
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering