Abstract
This paper proposes PUGAN (progressive upsampling GAN), a novel generative model that progressively synthesizes high-quality audio as a raw waveform. PUGAN generates progressively higher-resolution output by stacking multiple encoder-decoder architectures. Whereas WaveGAN, an existing state-of-the-art model, uses a single decoder architecture, our model generates audio signals and converts them to a higher resolution in a progressive manner, while using significantly fewer parameters than WaveGAN (e.g., 3.17x fewer for 16 kHz output). Our experiments show that audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of inception scores and human perception.
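The idea of progressive upsampling described in the abstract can be illustrated with a minimal sketch. It is not the PUGAN architecture: the learned encoder-decoder stages are replaced by a plain linear-interpolation stand-in, and the 4 kHz starting rate, the factor-of-two stages, and the names `linear_upsample_2x` and `progressive_upsample` are all assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

Waveform = List[float]


def linear_upsample_2x(x: Waveform) -> Waveform:
    """Hypothetical stand-in for one learned upsampling stage.

    Doubles the number of samples by inserting the midpoint between
    neighbouring samples (the last sample is held at the edge).
    """
    out: Waveform = []
    for i, sample in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else sample
        out.append(sample)
        out.append(0.5 * (sample + nxt))
    return out


def progressive_upsample(
    x: Waveform,
    base_rate: int,
    target_rate: int,
    stage: Callable[[Waveform], Waveform] = linear_upsample_2x,
) -> Tuple[Waveform, List[int]]:
    """Apply 2x stages repeatedly until the target sample rate is reached.

    Returns the final waveform and the list of intermediate sample rates.
    Assumes target_rate is base_rate times a power of two.
    """
    if base_rate <= 0 or target_rate % base_rate != 0:
        raise ValueError("target_rate must be a multiple of base_rate")
    ratio = target_rate // base_rate
    if ratio & (ratio - 1) != 0:
        raise ValueError("target_rate / base_rate must be a power of two")

    rate = base_rate
    rates = [rate]
    while rate < target_rate:
        x = stage(x)  # each stage doubles the temporal resolution
        rate *= 2
        rates.append(rate)
    return x, rates


if __name__ == "__main__":
    wave, rates = progressive_upsample([0.0, 1.0], base_rate=4000, target_rate=16000)
    print(rates)
    print(len(wave))
```

In a trained model, each `stage` would be a learned network rather than interpolation; the sketch only shows how stacking stages raises the output resolution step by step.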
Original language | English |
---|---|
Pages (from-to) | 3410-3414 |
Number of pages | 5 |
Journal | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Volume | 2021-June |
DOIs | |
Publication status | Published - 2021 |
Event | 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada; Duration: 2021 Jun 6 → 2021 Jun 11 |
Bibliographical note
Funding Information: Acknowledgements. This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (20ZS1200, Fundamental Technology Research for Human-Centric Autonomous Intelligent Systems) and an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)).
Publisher Copyright:
© 2021 IEEE
Keywords
- Generative adversarial networks (GANs)
- Real-time sound effect synthesis
ASJC Scopus subject areas
- Software
- Signal Processing
- Electrical and Electronic Engineering