KU-ISPL TRECVID 2017 VTT system

Daehun Kim, Joungmin Beh, Youngseng Chen, Hanseok Ko

Research output: Contribution to conference › Paper › peer-review

1 Citation (Scopus)

Abstract

This paper presents the KU-ISPL system for the TRECVID 2017 Video to Text (VTT) task. The core of the system is a stacked LSTM model for sentence generation. Its input descriptors combine several deep-learning-based features with multi-object detection results so that diverse characteristics and key information can be extracted from each video. We choose mid-level features from VGGNet and SoundNet as the primary features to capture the visual and acoustic modalities. In addition, visual attributes of objects and places serve as high-level features, and a visual syntax detector is fine-tuned with a sigmoid loss function to identify key words. We submit four runs of the stacked LSTM model, each combining different feature types, to examine how this information affects sentence-generation performance. Word2Vec is adopted for effective sentence encoding: the embedded words are used both as state values and as targets of the LSTM. For sentence matching, the system relies on a fused score of METEOR, BLEU, and the detection output, where the detection output represents the probability that a word appears in the video. Because the TRECVID VTT task is open domain, the sentence-generation and sentence-matching systems are trained on multiple datasets, including MSVD, MPII-MD, MVAD, MSR-VTT, and TRECVID-VTT 2016.
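To make the described pipeline concrete, below is a minimal sketch (not the authors' released code) of a stacked LSTM caption generator that fuses a VGG-style visual feature, a SoundNet-style audio feature, and object-detection scores, and that uses word embeddings of Word2Vec size as LSTM inputs during teacher forcing. All dimensions, layer counts, and the PyTorch framework itself are assumptions for illustration only.

```python
# Hypothetical sketch of the sentence-generation side described in the abstract.
import torch
import torch.nn as nn

class StackedLSTMCaptioner(nn.Module):
    def __init__(self, vis_dim=4096, aud_dim=1024, det_dim=80,
                 embed_dim=300, hidden_dim=512, vocab_size=10000,
                 num_layers=2):
        super().__init__()
        # Project the concatenated multimodal descriptor to the hidden size.
        self.feat_proj = nn.Linear(vis_dim + aud_dim + det_dim, hidden_dim)
        # Word embeddings; in the paper's setup these would be initialized
        # from Word2Vec vectors (embed_dim = 300 is an assumption).
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Two stacked LSTM layers over the word sequence.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, vis_feat, aud_feat, det_scores, captions):
        # Fuse the three descriptors and use them to initialize the LSTM state.
        fused = torch.tanh(self.feat_proj(
            torch.cat([vis_feat, aud_feat, det_scores], dim=-1)))
        h0 = fused.unsqueeze(0).repeat(self.lstm.num_layers, 1, 1)
        c0 = torch.zeros_like(h0)
        # Teacher forcing: feed embedded ground-truth words, predict next word.
        emb = self.embed(captions[:, :-1])
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)  # logits over the vocabulary per time step

# Toy usage with random tensors, just to show the expected shapes.
model = StackedLSTMCaptioner()
logits = model(torch.randn(2, 4096), torch.randn(2, 1024),
               torch.rand(2, 80), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # (2, 11, 10000)
```

The sentence-matching side fuses METEOR, BLEU, and the detector's word-existence probabilities into a single score; a hedged sketch is shown below, with the fusion weights and the simple averaging of word probabilities chosen here as assumptions rather than the paper's actual values.

```python
# Hypothetical fused matching score: linear combination of METEOR, BLEU,
# and the mean detection probability of the candidate sentence's words.
def fused_match_score(meteor, bleu, word_probs,
                      w_meteor=0.4, w_bleu=0.4, w_det=0.2):
    det = sum(word_probs) / len(word_probs) if word_probs else 0.0
    return w_meteor * meteor + w_bleu * bleu + w_det * det

print(fused_match_score(0.31, 0.22, [0.9, 0.6, 0.4]))
```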

Original language: English
Publication status: Published - 2017
Event: 2017 TREC Video Retrieval Evaluation, TRECVID 2017 - Gaithersburg, United States
Duration: 2017 Nov 13 - 2017 Nov 15

Conference

Conference: 2017 TREC Video Retrieval Evaluation, TRECVID 2017
Country/Territory: United States
City: Gaithersburg
Period: 17/11/13 - 17/11/15

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Information Systems
  • Signal Processing
