Abstract
Internet memes have become an essential tool for communication and interaction on social media platforms. They carry a wide range of emotional and cultural elements and therefore serve as effective vehicles for rapid information exchange. The popularity and influence of memes have motivated the exploration of technologies that can capture and express this cultural phenomenon. This research presents a Chinese meme generation system that generates contextually relevant Chinese meme-style text from an input image and dynamically predicts the best location for that text within the image. Our approach combines object detection and image captioning, drawing on the fully convolutional one-stage object detector (FCOS) for the former and the multimodal pretrained models ChineseCLIP and GPT-2 for the latter. To support this task, we constructed a dataset comprising meme images, their textual content, and the location of the text within each image. Through comprehensive experiments, we validated that our model not only generates text that is contextually relevant to the image but also positions it so as to improve the aesthetic appeal and communicative efficacy of the memes. This study advances the automation of Chinese meme generation and provides new perspectives on the interplay between image understanding and multimodal learning.
| Original language | English |
|---|---|
| Pages (from-to) | 151421-151434 |
| Number of pages | 14 |
| Journal | IEEE Access |
| Volume | 13 |
| Publication status | Published - 2025 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Image captioning
- internet memes
- meme generation
- multimodal learning
- object detection
ASJC Scopus subject areas
- General Computer Science
- General Materials Science
- General Engineering
Fingerprint
Dive into the research topics of 'Text Location-Aware Framework for Chinese Meme Generation'. Together they form a unique fingerprint.