Improving Vision Transformer with Multi-Task Training

Woo Jin Ahn, Geun Yeong Yang, Hyun Duck Choi, Myo Taeg Lim, Tae Koo Kang

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Self-supervised learning methods have shown excellent performance in improving existing networks by learning visual representations from large amounts of unlabeled data. In this paper, we propose an end-to-end multi-task self-supervision method for the vision transformer. The network is given two tasks: inpainting and position prediction. Given a masked image, the network predicts the missing pixel information and also predicts the positions of the given puzzle patches. Through classification experiments, we demonstrate that the proposed method improves the performance of the network compared to direct supervised learning.
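The two pretext tasks described in the abstract can be sketched as follows. This is a minimal, pure-Python illustration of how the training targets for inpainting and position prediction might be constructed; the function name `make_pretext_targets`, the masking ratio, and the target encoding are illustrative assumptions, not details taken from the paper.

```python
import random

def make_pretext_targets(patches, mask_ratio=0.3, seed=0):
    """Build targets for two self-supervision tasks (hypothetical sketch):
    inpainting (recover the pixel content of masked patches) and
    position prediction (recover the original index of shuffled patches).
    `patches` is the list of image patches fed to the transformer."""
    rng = random.Random(seed)
    n = len(patches)
    masked_idx = set(rng.sample(range(n), int(n * mask_ratio)))

    # Inpainting: masked positions are replaced by a placeholder (None here);
    # the network must predict the original patch content at those positions.
    inputs = [None if i in masked_idx else p for i, p in enumerate(patches)]
    inpaint_targets = {i: patches[i] for i in masked_idx}

    # Position prediction: shuffle the visible "puzzle" patches; the target
    # for each shuffled slot is the patch's original position index.
    visible = [i for i in range(n) if i not in masked_idx]
    position_targets = visible[:]
    rng.shuffle(position_targets)

    return inputs, inpaint_targets, position_targets
```

In a multi-task setup like the one described, the two losses (pixel reconstruction and position classification) would typically be combined, e.g. as a weighted sum, though the abstract does not specify the weighting.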

Original language: English
Title of host publication: 2022 22nd International Conference on Control, Automation and Systems, ICCAS 2022
Publisher: IEEE Computer Society
Pages: 1963-1965
Number of pages: 3
ISBN (Electronic): 9788993215243
DOIs
Publication status: Published - 2022
Event: 22nd International Conference on Control, Automation and Systems, ICCAS 2022 - Busan, Korea, Republic of
Duration: 2022 Nov 27 - 2022 Dec 1

Publication series

Name: International Conference on Control, Automation and Systems
Volume: 2022-November
ISSN (Print): 1598-7833

Conference

Conference: 22nd International Conference on Control, Automation and Systems, ICCAS 2022
Country/Territory: Korea, Republic of
City: Busan
Period: 22/11/27 - 22/12/1

Bibliographical note

Publisher Copyright:
© 2022 ICROS.

Keywords

  • Deep Learning
  • Self-Supervision
  • Vision Transformer

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
