Improving Vision Transformer with Multi-Task Training

Woo Jin Ahn, Geun Yeong Yang, Hyun Duck Choi, Myo Taeg Lim, Tae Koo Kang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    Self-supervised learning methods have shown excellent performance in improving existing networks by learning visual representations from large amounts of unlabeled data. In this paper, we propose an end-to-end multi-task self-supervision method for the vision transformer. The network is given two tasks: inpainting and position prediction. Given a masked image, the network predicts the missing pixel information and also predicts the positions of the given puzzle patches. Through classification experiments, we demonstrate that the proposed method improves the performance of the network compared to direct supervised learning.
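    The two-task setup described in the abstract (a shared encoder with an inpainting head and a patch-position head) could be sketched as follows. This is a minimal illustration in PyTorch, not the authors' code: the class name, dimensions, head designs, and loss weighting are all assumptions.

```python
# Hypothetical sketch of multi-task self-supervision for a vision transformer:
# a shared encoder feeds an inpainting head (reconstruct masked patch pixels)
# and a position head (classify each patch's grid position).
import torch
import torch.nn as nn


class MultiTaskViT(nn.Module):
    def __init__(self, num_patches=16, patch_dim=48, embed_dim=64):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)  # patch tokens
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.inpaint_head = nn.Linear(embed_dim, patch_dim)  # predict missing pixels
        self.pos_head = nn.Linear(embed_dim, num_patches)    # predict patch position

    def forward(self, patches):
        tokens = self.encoder(self.embed(patches))
        return self.inpaint_head(tokens), self.pos_head(tokens)


model = MultiTaskViT()
x = torch.randn(2, 16, 48)  # 2 images, 16 masked/shuffled patches of 48 values each
recon, pos_logits = model(x)

# Joint objective: pixel reconstruction (MSE) plus position classification (CE).
target_pos = torch.arange(16).repeat(2, 1)  # each patch's true grid index
loss = nn.functional.mse_loss(recon, x) + nn.functional.cross_entropy(
    pos_logits.transpose(1, 2), target_pos
)
```

    After this pretraining stage, the encoder would typically be fine-tuned with a classification head, which is how the abstract's comparison against direct supervised learning is presumably run.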

    Original language: English
    Title of host publication: 2022 22nd International Conference on Control, Automation and Systems, ICCAS 2022
    Publisher: IEEE Computer Society
    Pages: 1963-1965
    Number of pages: 3
    ISBN (Electronic): 9788993215243
    DOIs
    Publication status: Published - 2022
    Event: 22nd International Conference on Control, Automation and Systems, ICCAS 2022 - Busan, Korea, Republic of
    Duration: 2022 Nov 27 to 2022 Dec 1

    Publication series

    Name: International Conference on Control, Automation and Systems
    Volume: 2022-November
    ISSN (Print): 1598-7833


    Bibliographical note

    Publisher Copyright:
    © 2022 ICROS.

    Keywords

    • Deep Learning
    • Self-Supervision
    • Vision Transformer

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
    • Control and Systems Engineering
    • Electrical and Electronic Engineering
