Relay dueling network for visual tracking with broad field-of-view

Yifan Jiang, David K. Han, Hanseok Ko

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)


A deep reinforcement-learning-based method is presented for visual object tracking. The key objective is to generate a sequence of actions that move or scale the bounding box from the previous frame so that it tracks the target in the current frame. Two intelligent agents are trained to accomplish this task with a special dueling deep Q-learning network (Dueling DQN), referred to as a relay dueling network. The proposed model is divided into two agents: a movement agent and a scaling agent. The former performs horizontal or vertical movements, and the latter generates scaling actions that change the size of the bounding box. The model takes multiple inputs covering both the bounding box region and an enlarged search region, improving the agents' perception of the surroundings. The proposed method thus has a broader field of view than comparable trackers, and its distribution of actions across two agents makes it easier to train and improves tracking performance. The network is evaluated on popular standard tracking benchmarks and compared with state-of-the-art trackers, and is found to be competitive with conventional methods in both tracking accuracy and execution efficiency.
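The dueling architecture referenced in the abstract splits the Q-network head into a state-value stream and an action-advantage stream, recombined as Q(s, a) = V(s) + A(s, a) − mean(A). The sketch below illustrates only that combination rule and a toy action choice for a movement-style agent; the action names and numbers are illustrative assumptions, not the paper's actual action sets or trained values.

```python
import numpy as np

# Hypothetical action sets for the two agents (illustrative only,
# not taken from the paper).
MOVE_ACTIONS = ["left", "right", "up", "down", "stop"]
SCALE_ACTIONS = ["shrink", "enlarge", "stop"]

def dueling_q_values(value, advantages):
    """Combine the two streams of a dueling DQN head:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Subtracting the mean advantage keeps the decomposition
    identifiable (V and A cannot drift in opposite directions)."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Toy example: the movement agent's advantage stream favours "right".
q = dueling_q_values(value=1.0, advantages=[0.1, 0.9, -0.2, -0.3, 0.0])
best_action = MOVE_ACTIONS[int(np.argmax(q))]
print(best_action)  # -> right
```

Note that the mean subtraction makes the Q-values sum to `n * V(s)` regardless of the raw advantages, which is what anchors the value stream during training.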

Original language: English
Pages (from-to): 615-622
Number of pages: 8
Journal: IET Computer Vision
Issue number: 7
Publication status: Published - 2019 Oct 1

Bibliographical note

Funding Information:
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2017R1A2B4012720).

Publisher Copyright:
© The Institution of Engineering and Technology 2019

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition


