Abstract
Multimodal abstractive summarization is a challenging task that aims to generate concise and informative summaries from diverse modalities, such as video, audio, and text. In this study, we propose VATMAN, a novel approach to multimodal abstractive summarization. To capture the hierarchical relationships and dependencies among modalities, we introduce Trimodal Hierarchical Multi-head Attention (THMA). THMA attends hierarchically to the video, audio, and textual representations, enabling the model to distill salient information and generate cohesive, coherent summaries. VATMAN builds on state-of-the-art generative pretrained language models (GPLMs), specifically Transformer-based models, and applies hierarchical attention at the modality level to make better use of contextual information. On the How2 dataset, VATMAN generates summaries that are more fluent than those written by human authors, showcasing its potential for use in various industrial environments.
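The abstract describes THMA only at a high level. Below is a minimal, hypothetical sketch of one common way to realize modality-level hierarchical attention, assuming a two-level scheme: first, multi-head attention is applied within each modality's sequence (video, audio, text); second, attention over the three resulting per-modality context vectors fuses them. The function names, the identity (unlearned) Q/K/V projections, and the single-head second level are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def scaled_dot_attention(
    query: Vector, keys: List[Vector], values: List[Vector]
) -> Tuple[Vector, List[float]]:
    # Scaled dot-product attention for a single query vector.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


def multi_head_attention(query: Vector, memory: List[Vector], num_heads: int) -> Vector:
    # Split the model dimension into num_heads equal slices, attend per slice,
    # and concatenate the per-head contexts. Assumption: learned Q/K/V and
    # output projections are replaced by identity maps to keep the sketch short.
    d = len(query)
    assert d % num_heads == 0, "model dimension must be divisible by num_heads"
    h = d // num_heads
    out: Vector = []
    for i in range(num_heads):
        sl = slice(i * h, (i + 1) * h)
        head_mem = [m[sl] for m in memory]
        ctx, _ = scaled_dot_attention(query[sl], head_mem, head_mem)
        out.extend(ctx)
    return out


def hierarchical_trimodal_attention(
    query: Vector, modalities: Dict[str, List[Vector]], num_heads: int = 2
) -> Tuple[Vector, Dict[str, float]]:
    # Level 1 (hypothetical): multi-head attention within each modality.
    per_modality = {
        name: multi_head_attention(query, seq, num_heads)
        for name, seq in modalities.items()
    }
    # Level 2 (hypothetical): single-head attention over the per-modality
    # context vectors, yielding one fused context and one weight per modality.
    names = list(per_modality)
    contexts = [per_modality[n] for n in names]
    fused, weights = scaled_dot_attention(query, contexts, contexts)
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    # Toy 4-dimensional features standing in for encoder outputs.
    query = [1.0, 0.0, 0.5, 0.5]
    modalities = {
        "video": [[1.0, 0.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]],
        "audio": [[0.0, 1.0, 1.0, 0.0]],
        "text": [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.2, 0.1, 0.4, 0.3]],
    }
    fused, weights = hierarchical_trimodal_attention(query, modalities)
    print(fused)
    print(weights)
```

Returning the second-level weights makes it easy to inspect how much each modality contributes at a given decoding step; in a trained GPLM-based model these weights would depend on learned projections rather than raw feature similarity.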
Original language | English |
---|---|
Title of host publication | ICTC 2023 - 14th International Conference on Information and Communication Technology Convergence |
Subtitle of host publication | Exploring the Frontiers of ICT Innovation |
Publisher | IEEE Computer Society |
Pages | 1475-1478 |
Number of pages | 4 |
ISBN (Electronic) | 9798350313277 |
DOIs | |
Publication status | Published - 2023 |
Event | 14th International Conference on Information and Communication Technology Convergence, ICTC 2023 - Jeju Island, Korea, Republic of |
Duration | 2023 Oct 11 → 2023 Oct 13 |
Publication series
Name | International Conference on ICT Convergence |
---|---|
ISSN (Print) | 2162-1233 |
ISSN (Electronic) | 2162-1241 |
Conference
Conference | 14th International Conference on Information and Communication Technology Convergence, ICTC 2023 |
---|---|
Country/Territory | Korea, Republic of |
City | Jeju Island |
Period | 2023/10/11 → 2023/10/13 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Abstractive Summarization
- Generative Pretrained Language Model
- Transformer
- Trimodal
ASJC Scopus subject areas
- Information Systems
- Computer Networks and Communications