Abstract
Visual cognition requires analyzing the actions, intentions, and emotions of the people in a given image. Visual Commonsense Reasoning (VCR) is a task in which a model selects answers to questions about given images, along with rationales for those answers. In VCR, facial expressions are important nonverbal signals because they convey emotions and intentions in human interactions. However, ERNIE-ViL and UNITER, vision-and-language models that produce image and text representations, do not learn facial expressions. We find that ERNIE-ViL and UNITER are weak at identifying emotions. In this paper, therefore, we propose FERNIE-ViL, which adds a facial expression recognition module to the existing vision-and-language model. Experimental results (a 2.4 percentage point improvement on VCR Q→A and a 0.3 percentage point improvement on VCR QA→R) demonstrate that our method can enhance visual commonsense reasoning by understanding human interactions.
Original language | English |
---|---|
Title of host publication | Proceedings of 2021 IEEE 20th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021 |
Editors | Yingxu Wang, Jane Z. Wang, Henry Leung, Newton Howard, Paolo Soda, Bernard Widrow, Jerome Feldman |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 228-234 |
Number of pages | 7 |
ISBN (Electronic) | 9781665421195 |
Publication status | Published - 2021 |
Event | 20th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021 - Banff, Canada. Duration: 2021 Oct 29 → 2021 Oct 31 |
Publication series
Name | Proceedings of 2021 IEEE 20th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021 |
---|
Conference
Conference | 20th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021 |
---|---|
Country/Territory | Canada |
City | Banff |
Period | 21/10/29 → 21/10/31 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Artificial Intelligence
- Commonsense Reasoning
- Facial Expression
- Machine Commonsense
- Multi-modal
- Natural Language Processing
- Visual Recognition
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications
- Information Systems
- Cognitive Neuroscience