Abstract
This paper proposes an interactive video object segmentation algorithm that takes scribble annotations on query objects as input. We develop a deep neural network consisting of an annotation network (A-Net) and a transfer network (T-Net). First, given user scribbles on a frame, A-Net, which is based on an encoder-decoder architecture, yields a segmentation result for that frame. Second, T-Net transfers the segmentation result bidirectionally to the other frames using global and local transfer modules. The global transfer module conveys the segmentation information in an annotated frame to a target frame, while the local transfer module propagates the segmentation information in a temporally adjacent frame to the target frame. By applying A-Net and T-Net alternately, a user can obtain the desired segmentation results with minimal effort. We train the entire network in two stages by emulating user scribbles and employing an auxiliary loss. Experimental results demonstrate that the proposed interactive video object segmentation algorithm outperforms state-of-the-art conventional algorithms. Code and models are available at https://github.com/yuk6heo/IVOS-ATNet.
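The alternating A-Net/T-Net workflow can be pictured as a simple control loop. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository holds that). `a_net` and `t_net` are hypothetical callables standing in for the trained networks, and their argument lists are assumed. Each new annotation also re-propagates over the whole sequence, which is a simplification. For every target frame, the stand-in `t_net` receives the annotated frame and its mask (the global-transfer input) and the adjacent, already segmented frame and its mask (the local-transfer input).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical signatures for the two trained networks (assumptions, not the paper's API).
# a_net(frame, scribbles, previous_mask_or_None) -> mask
ANet = Callable[[Any, Any, Optional[Any]], Any]
# t_net(annotated_frame, annotated_mask, adjacent_frame, adjacent_mask, target_frame) -> mask
TNet = Callable[[Any, Any, Any, Any, Any], Any]


def propagate_bidirectionally(
    frames: List[Any], annotated_idx: int, annotated_mask: Any, t_net: TNet
) -> Dict[int, Any]:
    """Spread the annotated frame's mask outward, one frame at a time."""
    masks: Dict[int, Any] = {annotated_idx: annotated_mask}
    for step in (1, -1):  # forward pass, then backward pass
        prev, target = annotated_idx, annotated_idx + step
        while 0 <= target < len(frames):
            masks[target] = t_net(
                frames[annotated_idx], annotated_mask,  # global-transfer input
                frames[prev], masks[prev],              # local-transfer input
                frames[target],
            )
            prev, target = target, target + step
    return masks


def interactive_session(
    frames: List[Any], interactions: List[Tuple[int, Any]], a_net: ANet, t_net: TNet
) -> Dict[int, Any]:
    """Alternate A-Net and T-Net, one round per (frame_index, scribbles) pair."""
    masks: Dict[int, Any] = {}
    for idx, scribbles in interactions:
        # A-Net segments the chosen frame, seeing its current mask if one exists.
        mask = a_net(frames[idx], scribbles, masks.get(idx))
        # Simplification: re-propagate over the whole sequence every round.
        masks = propagate_bidirectionally(frames, idx, mask, t_net)
    return masks
```

Propagating outward one frame at a time ensures that every target frame already has a segmented temporal neighbour to use as its local reference, while the annotated frame serves as a fixed global reference for the whole pass.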
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings |
Editors | Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 297-313 |
Number of pages | 17 |
ISBN (Print) | 9783030585198 |
Publication status | Published - 2020 |
Event | 16th European Conference on Computer Vision, ECCV 2020, Glasgow, United Kingdom (2020 Aug 23 → 2020 Aug 28) |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12362 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 16th European Conference on Computer Vision, ECCV 2020 |
---|---|
Country/Territory | United Kingdom |
City | Glasgow |
Period | 2020 Aug 23 → 2020 Aug 28 |
Bibliographical note
Funding Information: This work was supported in part by ‘The Cross-Ministry Giga KOREA Project’ grant funded by the Korea government (MSIT) (No. GK20P0200, Development of 4D reconstruction and dynamic deformable action model based hyper-realistic service technology), in part by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-01441, Artificial Intelligence Convergence Research Center (Chungnam National University)), and in part by the National Research Foundation of Korea (NRF) through the Korea Government (MSIP) under Grant NRF-2018R1A2B3003896.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
Keywords
- Deep learning
- Interactive segmentation
- Video object segmentation
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science