FERNIE-ViL: Facial Expression Enhanced Vision-And-Language Model

Soo Ryeon Lee, Dohyun Kim, Mingyu Lee, Sangkeun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Visual cognition requires analyzing the actions, intentions, and emotions of the people in an image. Visual Commonsense Reasoning (VCR) is a task in which a model, given an image and a question about it, selects the correct answer and the rationale that justifies it. In VCR, facial expressions are important nonverbal signals because they convey emotions and intentions in human interactions. However, ERNIE-ViL and UNITER, vision-and-language models that learn joint image and text representations, do not learn facial expressions. We find that ERNIE-ViL and UNITER struggle to identify emotions. In this paper, we therefore propose FERNIE-ViL, which incorporates a facial expression recognition module into an existing vision-and-language model. Experimental results (a 2.4 percentage-point improvement on VCR Q→A and a 0.3 percentage-point improvement on VCR QA→R) demonstrate that our method can enhance visual commonsense reasoning by understanding human interactions.
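The improvements above are differences in accuracy, reported in percentage points. In VCR, Q→A (pick the answer given the question) and QA→R (pick the rationale given the question and the correct answer) are each four-way multiple-choice tasks scored by accuracy. The short sketch below only illustrates that arithmetic; the function names `accuracy` and `point_gain` are hypothetical helpers, and the toy predictions are made up for illustration, not results from the paper.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions whose predicted choice index (0-3) matches the gold index."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def point_gain(baseline_acc: float, improved_acc: float) -> float:
    """Difference between two accuracies, expressed in percentage points."""
    return (improved_acc - baseline_acc) * 100.0


if __name__ == "__main__":
    # Toy data (hypothetical): gold choice indices for five four-way questions.
    gold = [0, 2, 1, 3, 0]
    baseline = [0, 1, 1, 3, 2]   # 3 of 5 correct
    enhanced = [0, 2, 1, 3, 2]   # 4 of 5 correct
    b, e = accuracy(baseline, gold), accuracy(enhanced, gold)
    print(f"baseline={b:.1%} enhanced={e:.1%} gain={point_gain(b, e):.1f} points")
```

A gain in percentage points is an absolute difference between two accuracies, which is why a 2.4-point improvement should not be read as a 2.4% relative improvement.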

Original language: English
Title of host publication: Proceedings of 2021 IEEE 20th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021
Editors: Yingxu Wang, Jane Z. Wang, Henry Leung, Newton Howard, Paolo Soda, Bernard Widrow, Jerome Feldman
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 228-234
Number of pages: 7
ISBN (Electronic): 9781665421195
Publication status: Published - 2021
Event: 20th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021 - Banff, Canada
Duration: 2021 Oct 29 to 2021 Oct 31

Publication series

Name: Proceedings of 2021 IEEE 20th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021

Conference

Conference: 20th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021
Country/Territory: Canada
City: Banff
Period: 21/10/29 to 21/10/31

Keywords

  • Artificial Intelligence
  • Commonsense Reasoning
  • Facial Expression
  • Machine Commonsense
  • Multi-modal
  • Natural Language Processing
  • Visual Recognition

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Cognitive Neuroscience
