TY - JOUR
T1 - Visual Thinking of Neural Networks
T2 - Interactive Text to Image Synthesis
AU - Lee, Hyunhee
AU - Kim, Gyeongmin
AU - Hur, Yuna
AU - Lim, Heuiseok
N1 - Funding Information:
This work was supported in part by the Ministry of Science and ICT (MSIT), South Korea, through the Information Technology Research Center (ITRC) Support Program supervised by the Institute for Information and Communications Technology Planning and Evaluation (IITP) under Grant IITP-2018-0-01405, and in part by the IITP through the MSIT under Grant IITP-2020-0-00368.
Publisher Copyright:
© 2013 IEEE.
PY - 2021
Y1 - 2021
N2 - Reasoning, a trait of cognitive intelligence, is regarded as a crucial ability that distinguishes humans from other species. However, neural networks now pose a challenge to this human ability. Text-to-image synthesis is a vision-and-language task whose goal is to learn multimodal representations between image and text features. Hence, it requires high-level reasoning ability: understanding the relationships between the objects described in the given text and generating high-quality images based on that understanding. Text-to-image translation can therefore be termed the visual thinking of neural networks. In this study, our model infers the complicated relationships between objects in the given text and generates the final image by leveraging the previous generation history. We define several novel adversarial loss functions and demonstrate the one that best elevates the reasoning ability of text-to-image synthesis. Remarkably, most of our models possess their own reasoning ability. Quantitative and qualitative comparisons with several existing methods demonstrate the superiority of our approach.
AB - Reasoning, a trait of cognitive intelligence, is regarded as a crucial ability that distinguishes humans from other species. However, neural networks now pose a challenge to this human ability. Text-to-image synthesis is a vision-and-language task whose goal is to learn multimodal representations between image and text features. Hence, it requires high-level reasoning ability: understanding the relationships between the objects described in the given text and generating high-quality images based on that understanding. Text-to-image translation can therefore be termed the visual thinking of neural networks. In this study, our model infers the complicated relationships between objects in the given text and generates the final image by leveraging the previous generation history. We define several novel adversarial loss functions and demonstrate the one that best elevates the reasoning ability of text-to-image synthesis. Remarkably, most of our models possess their own reasoning ability. Quantitative and qualitative comparisons with several existing methods demonstrate the superiority of our approach.
KW - Generative adversarial networks
KW - image generation
KW - multimodal learning
KW - multimodal representation
KW - text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85104670342&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3074973
DO - 10.1109/ACCESS.2021.3074973
M3 - Article
AN - SCOPUS:85104670342
SN - 2169-3536
VL - 9
SP - 64510
EP - 64523
JO - IEEE Access
JF - IEEE Access
M1 - 9410550
ER -