Video source identification using machine learning: A case study of 16 instant messaging applications

  • Hyomin Yang
  • , Junho Kim
  • , Jungheum Park*
  • *Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)

Abstract

In recent years, there has been a notable increase in the prevalence of cybercrimes related to video content, including the distribution of illegal videos and the sharing of copyrighted material. This has led to the growing importance of identifying the source of video files to trace the owner of the files involved in the incident or identify the distributor. Previous research has concentrated on revealing the device (brand and/or model) that “originally” created a video file. This has been achieved by analysing the pattern noise generated by the image sensor in the camera, the storage structural features of the file, and the metadata patterns. However, due to the widespread use of mobile environments, instant messaging applications (IMAs) such as Telegram and Wire have been utilized to share illegal videos, which can result in the loss of information from the original file due to re-encoding at the application level, depending on the transmission settings. Consequently, it is necessary to extend the scope of existing research to identify the various applications that are capable of re-encoding video files in transit. Furthermore, it is essential to determine whether there are features that can be leveraged to distinguish them from the source identification perspective. In this paper, we propose a machine learning-based methodology for classifying the source application by extracting various features stored in the storage format and internal metadata of video files. To conduct this study, we analyzed 16 IMAs that are widely used in mobile environments and generated a total of 1974 sample videos, taking into account both the transmission options and encoding settings offered by each IMA. The training and testing results on this dataset indicate that the ExtraTrees model achieved an identification accuracy of approximately 99.96 %. Furthermore, we developed a proof-of-concept tool based on the proposed method, which extracts the suggested features from videos and queries a pre-trained model. This tool is released as open-source software for the community.

Original languageEnglish
Article number301812
JournalForensic Science International: Digital Investigation
Volume50
DOIs
Publication statusPublished - 2024 Oct

Bibliographical note

Publisher Copyright:
© 2024

Keywords

  • AVC
  • Digital forensics
  • HEVC
  • Machine learning
  • Multimedia forensics
  • Source identification

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Information Systems
  • Computer Science Applications
  • Medical Laboratory Technology
  • Law

Fingerprint

Dive into the research topics of 'Video source identification using machine learning: A case study of 16 instant messaging applications'. Together they form a unique fingerprint.

Cite this