Youngsaeng Jin, Junggi Kwak, Younglo Lee, Jeongseop Yun, Hanseok Ko

Research output: Contribution to conference › Paper › peer-review


This paper presents the KU-ISPL model for the TRECVID 2018 Video-to-Text (VTT) task. The VTT architecture is a stack of two LSTMs with an attention mechanism. We employ a sequence-to-sequence model to handle sequential input and output. The encoder encodes video frames into visual representations, and the decoder decodes those representations into words. The attention mechanism is used to make the best use of contextually pertinent frames in the input video: the model attends to the hidden states of the second LSTM in the encoder to obtain efficient hidden states in the decoder. Visual features, acoustic features, and detection results are extracted from the videos with deep learning models and concatenated into a single vector, which serves as the input descriptor of the model. The stacked LSTM and attention weights are trained jointly, so the whole model is an end-to-end trainable network. We make 4 runs that combine various types of features to explore how each kind of information affects sentence generation performance. The sentence matching method is based on a fusion score of METEOR and BLEU. Because the TRECVID VTT task is open domain, the sentence generation and sentence matching systems are trained on several datasets, including MSVD, MVAD, and MSR-VTT. Experimental results show that the proposed model performs better than the model without the attention mechanism.
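
The feature concatenation and attention steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the attention scoring function, so dot-product scoring is assumed here, and the function names `concat_features` and `attend`, along with the toy vectors, are hypothetical.

```python
import math

def concat_features(visual, acoustic, detection):
    """Join per-frame feature vectors into one input descriptor."""
    return list(visual) + list(acoustic) + list(detection)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Weight encoder hidden states by relevance to the decoder state.

    Dot-product scoring is an assumption; the abstract does not
    specify how the attention scores are computed.
    """
    scores = [sum(d * h for d, h in zip(decoder_state, hs))
              for hs in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder hidden states.
    context = [sum(w * hs[i] for w, hs in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: three encoder hidden states (one per frame), one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy case the first and third frames align with the decoder state and receive equal, larger weights than the second frame, so the context vector is dominated by them.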

Original language: English
Publication status: Published - 2020
Event: 2018 TREC Video Retrieval Evaluation, TRECVID 2018 - Gaithersburg, United States
Duration: 2018 Nov 13 → 2018 Nov 15


Conference: 2018 TREC Video Retrieval Evaluation, TRECVID 2018
Country/Territory: United States

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2017R1A2B4012720).

Publisher Copyright:
Copyright © TRECVID 2018. All rights reserved.

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Electrical and Electronic Engineering

