Abstract
The KU-ISPL system for TRECVID 2016 Multimedia Event Detection (MED) is presented in this paper. Deep learning-based local descriptors extract heterogeneous metadata from the input in a frame-by-frame manner; the system consists of Acoustic Scene Analysis (ASA), Visual Scene Analysis (VSA), Visual Motion Analysis (VMA), and Subtitle Information Analysis (SIA). Depending on its type, the metadata is modeled through a deep learning-based or statistical modeling-based Event Query Generation (EQG) process. In addition, since the differing characteristics of multimedia events hinder detection performance, a fusion process that combines the various metadata effectively is a significant enhancement factor for MED. To mitigate this problem, a system is proposed that performs not only elaborate metadata extraction but also a two-fold metadata fusion method. Unlike conventional fusion approaches, it is composed of Adaptive Metadata Weighting (AMW) and Dynamic Feature Selection (DFS), which apply selected metadata components to each multimedia event adaptively, so that the properties of the event are adequately reflected in the detection system. Experimental results on the HAVIC and YFCC sets from the TREC Video Retrieval Evaluation (TRECVID) 2016 demonstrate that the proposed system improves MED performance over conventional approaches, recording markedly higher scores in both self-assessment and the official TRECVID 2016 evaluation.
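The two-fold fusion described above (per-event weighting of metadata streams plus selection of the streams used for each event) can be sketched as a simple event-adaptive weighted late fusion. This is only an illustrative sketch, not the authors' implementation: the function `fuse_scores` and all scores, weights, and the selected-stream set below are hypothetical.

```python
def fuse_scores(modality_scores, event_weights, selected):
    """Event-adaptive late fusion sketch: keep only the metadata
    streams selected for this event (DFS-style), then combine their
    scores with event-specific weights (AMW-style)."""
    total, norm = 0.0, 0.0
    for name in selected:
        w = event_weights.get(name, 0.0)
        total += w * modality_scores[name]
        norm += w
    # Normalize by the total weight of the selected streams
    return total / norm if norm > 0 else 0.0

# Hypothetical per-clip scores from the four analysis modules
scores = {"ASA": 0.2, "VSA": 0.9, "VMA": 0.7, "SIA": 0.1}
# Hypothetical weights for one (visually dominated) event
weights = {"ASA": 0.1, "VSA": 0.6, "VMA": 0.3, "SIA": 0.0}
# Streams judged informative for this event
chosen = {"VSA", "VMA"}
print(round(fuse_scores(scores, weights, chosen), 3))  # → 0.833
```

The selection step lets an event with, say, little useful audio simply ignore the acoustic stream rather than down-weight noise, which is the intuition behind applying selective metadata components per event.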
Original language | English
---|---
Publication status | Published - 2016
Event | 2016 TREC Video Retrieval Evaluation, TRECVID 2016 - Gaithersburg, United States
Duration | 2016 Nov 14 → 2016 Nov 16
Conference
Conference | 2016 TREC Video Retrieval Evaluation, TRECVID 2016
---|---
Country/Territory | United States
City | Gaithersburg
Period | 2016 Nov 14 → 2016 Nov 16
ASJC Scopus subject areas
- Information Systems
- Signal Processing
- Electrical and Electronic Engineering