Abstract
This paper describes a diffusion model for co-speech gesture generation, submitted as the KU-ISPL entry to the GENEA Challenge 2023. We divide gesture generation into two subproblems, co-speech gesture generation and semantic gesture generation, and focus on the former, which we address with a denoising diffusion probabilistic model conditioned on text, audio, and pre-pose. We use a U-Net with cross-attention as the denoising model, and we propose a gesture autoencoder as a mapping function from the gesture domain to the latent domain. The collective evaluation released by the GENEA Challenge 2023 shows that our model successfully generates co-speech gestures: our system receives a mean human-likeness score of 32.0, a preference-matched appropriateness score of 53.6% for the main agent's speech, and an appropriateness score of 53.5% for the interlocutor's speech. We also conduct an ablation study to measure the effect of the pre-pose condition. These results indicate that our system contributes to co-speech gesture generation for natural interaction.
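As a rough illustration of the denoising diffusion setup the abstract refers to, the sketch below shows the standard DDPM forward noising step and the noise-prediction loss on a toy vector. It is not the authors' implementation. The linear beta schedule, the step count, the four-dimensional `latent`, the `condition` dictionary, and `toy_denoiser` are all assumptions made for illustration, as is the choice to run diffusion on an autoencoder latent. In the paper the denoiser is a U-Net with cross-attention.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bars[t])
    b = math.sqrt(1.0 - alpha_bars[t])
    x_t = [a * x + b * e for x, e in zip(x0, noise)]
    return x_t, noise


def noise_prediction_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and true noise."""
    total = sum((p - e) ** 2 for p, e in zip(predicted_noise, true_noise))
    return total / len(true_noise)


def toy_denoiser(x_t, t, condition):
    """Hypothetical stand-in for the conditional denoiser; always predicts zero noise."""
    return [0.0 for _ in x_t]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

latent = [0.5, -0.25, 1.0, 0.0]  # hypothetical gesture latent
condition = {"text": None, "audio": None, "pre_pose": None}  # placeholder conditions

t = 500
x_t, noise = q_sample(latent, t, alpha_bars, rng)
loss = noise_prediction_loss(toy_denoiser(x_t, t, condition), noise)
```

A trained denoiser would replace `toy_denoiser` and would actually use the text, audio, and pre-pose conditions; here they are passed through only to show where conditioning enters the training objective.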
Original language | English |
---|---|
Title of host publication | ICMI 2023 Companion - Companion Publication of the 25th International Conference on Multimodal Interaction |
Publisher | Association for Computing Machinery |
Pages | 220-227 |
Number of pages | 8 |
ISBN (Electronic) | 9798400703218 |
Publication status | Published - 2023 Oct 9 |
Event | 25th International Conference on Multimodal Interaction, ICMI 2023 Companion - Paris, France. Duration: 2023 Oct 9 → 2023 Oct 13 |
Publication series
Name | ACM International Conference Proceeding Series |
---|
Conference
Conference | 25th International Conference on Multimodal Interaction, ICMI 2023 Companion |
---|---|
Country/Territory | France |
City | Paris |
Period | 2023 Oct 9 → 2023 Oct 13 |
Bibliographical note
Publisher Copyright: © 2023 ACM.
Keywords
- GENEA Challenge
- co-speech gesture generation
- diffusion
- generative models
- neural networks
ASJC Scopus subject areas
- Human-Computer Interaction
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Software