MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

  • Donghyun Kim
  • Tian Lan
  • Chuhang Zou
  • Ning Xu
  • Bryan A. Plummer
  • Stan Sclaroff
  • Jayan Eledath
  • Gerard Medioni

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos via efficient inter-frame local attention (MILA). Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slow network runs on sparsely sampled keyframes and the fast shallow network runs on non-keyframes at a high frame rate. We also propose an effective adversarial learning strategy that encourages the slow and fast networks to learn similar features, so that keyframes and non-keyframes are well aligned. Our approach ensures low-latency multi-task learning while maintaining high-quality predictions. MILA obtains competitive accuracy compared to the state of the art on two multi-task learning benchmarks while reducing the number of floating point operations (FLOPs) by up to 70%. In addition, our attention-based feature propagation method (ILA) outperforms prior work in terms of task accuracy while also reducing FLOPs by up to 90%.

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2219-2229
Number of pages: 11
ISBN (Electronic): 9781665401913
Publication status: Published - 2021
Externally published: Yes
Event: 18th IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021 - Virtual, Online, Canada
Duration: 2021 Oct 11 → 2021 Oct 17

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2021-October
ISSN (Print): 1550-5499
ISSN (Electronic): 2380-7504

Conference

Conference: 18th IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
Country/Territory: Canada
City: Virtual, Online
Period: 21/10/11 → 21/10/17

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
