HOTR: End-to-End Human-Object Interaction Detection with Transformers

Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun Sol Kim, Hyunwoo J. Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

111 Citations (Scopus)

Abstract

Human-Object Interaction (HOI) detection is the task of identifying “a set of interactions” in an image, which involves i) the localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels. Most existing methods have addressed this task indirectly, by detecting human and object instances and then individually inferring the interaction for every pair of detected instances. In this paper, we present a novel framework, referred to as HOTR, which directly predicts a set of 〈human, object, interaction〉 triplets from an image based on a transformer encoder-decoder architecture. Through set prediction, our method effectively exploits the inherent semantic relationships in an image and does not require time-consuming post-processing, which is the main bottleneck of existing methods. Our proposed algorithm achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.

Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Publisher: IEEE Computer Society
Pages: 74-83
Number of pages: 10
ISBN (Electronic): 9781665445092
DOIs
Publication status: Published - 2021
Event: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States
Duration: 2021 Jun 19 to 2021 Jun 25

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print): 1063-6919

Conference

Conference: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country/Territory: United States
City: Virtual, Online
Period: 21/6/19 to 21/6/25

Bibliographical note

Funding Information:
Acknowledgments. This research was partly supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No.2021-0-00025, Development of Integrated Cognitive Drone AI for Disaster/Emergency Situations), (IITP-2021-0-01819, the ICT Creative Consilience program), and National Research Foundation of Korea (NRF2020R1A2C3010638, NRF-2016M3A9A7916996).

Publisher Copyright:
© 2021 IEEE

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
