FERNIE-ViL: Facial Expression Enhanced Vision-And-Language Model

Soo Ryeon Lee, Dohyun Kim, Mingyu Lee, Sangkeun Lee

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Visual cognition requires analyzing the actions, intentions, and emotions of the people in a given image. Visual Commonsense Reasoning (VCR) is a task in which a model selects the answer to a question about a given image and the rationale for that answer. In VCR, facial expressions are important nonverbal signals because they convey emotions and intentions in human interactions. However, ERNIE-ViL and UNITER, vision-and-language models that learn image and text representations, do not model facial expressions. We find that ERNIE-ViL and UNITER struggle to identify emotions. In this paper, therefore, we propose FERNIE-ViL, which adapts a facial expression recognition module to an existing vision-and-language model. Experimental results (a 2.4 percentage point improvement on VCR Q→A and a 0.3 percentage point improvement on VCR QA→R) demonstrate that our method can enhance visual commonsense reasoning by understanding human interactions.

    Original language: English
    Title of host publication: Proceedings of 2021 IEEE 20th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021
    Editors: Yingxu Wang, Jane Z. Wang, Henry Leung, Newton Howard, Paolo Soda, Bernard Widrow, Jerome Feldman
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 228-234
    Number of pages: 7
    ISBN (Electronic): 9781665421195
    DOIs
    Publication status: Published - 2021
    Event: 20th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021 - Banff, Canada
    Duration: 2021 Oct 29 → 2021 Oct 31

    Publication series

    Name: Proceedings of 2021 IEEE 20th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021

    Conference

    Conference: 20th IEEE International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2021
    Country/Territory: Canada
    City: Banff
    Period: 21/10/29 → 21/10/31

    Bibliographical note

    Publisher Copyright:
    © 2021 IEEE.

    Keywords

    • Artificial Intelligence
    • Commonsense Reasoning
    • Facial Expression
    • Machine Commonsense
    • Multi-modal
    • Natural Language Processing
    • Visual Recognition

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
    • Information Systems
    • Cognitive Neuroscience
