Intrusion Detection Based on Sequential Information Preserving Log Embedding Methods and Anomaly Detection Algorithms

Czangyeob Kim, Myeongjun Jang, Seungwan Seo, Kyeongchan Park, Pilsung Kang

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)


Previous methods for system intrusion detection have mainly relied on pattern matching, which uses prior knowledge drawn from experts' domain expertise. However, pattern matching-based methods have a major drawback: they can be bypassed through various modified techniques. Advanced persistent threats (APTs) expose this limitation of pattern matching-based detection mechanisms, because they are not only more sophisticated than usual threats but also specialized for the targeted object of attack. To handle such advanced threats successfully, a defense mechanism must be able to recognize unusual phenomena or behaviors. To achieve this, various machine learning-based security techniques have recently been developed. Among these, anomaly detection algorithms, which are trained in an unsupervised fashion, can reduce the effort required of security experts and help secure labeled datasets through post-analysis. If a sufficient amount of labeled data is obtained through post-analysis of anomaly detection results, abnormal behaviors can be distinguished more precisely by training classification models. In this study, we propose an end-to-end abnormal behavior detection method based on sequential information preserving log embedding algorithms and machine learning-based anomaly detection algorithms. Unlike other machine learning-based system anomaly detection models, which rely on domain experts' knowledge to extract significant features from the log data, our method transforms raw log data into fixed-size continuous vectors regardless of their length, and these vectors are used to train the anomaly detection models.
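The abstract does not specify the embedding algorithms. As a rough illustration of the two-stage idea (fixed-size vectors from variable-length traces, then an unsupervised detector fitted only to normal data), the sketch below swaps in hashed system-call bigram counts and a k-nearest-neighbor distance score. Neither is the authors' method; `embed_trace`, `knn_score`, `DIM`, and the toy traces are hypothetical names and data chosen for illustration.

```python
import hashlib
import math
from typing import List, Sequence

DIM = 64  # hypothetical embedding size, not taken from the paper


def _bucket(ngram: Sequence[str], dim: int) -> int:
    """Stable hash of an n-gram into [0, dim). Python's built-in hash() is
    salted per process for strings, so md5 keeps vectors reproducible."""
    key = "\x1f".join(ngram).encode("utf-8")
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % dim


def embed_trace(calls: Sequence[str], n: int = 2, dim: int = DIM) -> List[float]:
    """Map a system call trace of any length to a fixed-size, L2-normalized
    vector of hashed n-gram counts. With n >= 2 each feature encodes a short
    ordered run of calls, so local order (not full sequence order) is kept."""
    vec = [0.0] * dim
    for i in range(len(calls) - n + 1):
        vec[_bucket(calls[i:i + n], dim)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def knn_score(x: Sequence[float], train: Sequence[Sequence[float]], k: int = 1) -> float:
    """Unsupervised anomaly score: mean Euclidean distance from x to its k
    nearest neighbors among embeddings of normal (training) traces.
    A larger score means the trace looks less like anything seen in training."""
    if not train:
        raise ValueError("need at least one normal training embedding")
    dists = sorted(math.dist(x, t) for t in train)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


if __name__ == "__main__":
    # Toy traces, made up for illustration only.
    normal_traces = [
        ["open", "read", "read", "close"],
        ["open", "read", "write", "close"],
        ["open", "read", "read", "read", "close"],
    ]
    train = [embed_trace(t) for t in normal_traces]
    seen = embed_trace(["open", "read", "read", "close"])
    odd = embed_trace(["open", "mmap", "execve", "socket", "connect"])
    print(knn_score(seen, train), knn_score(odd, train))
```

Bigram hashing keeps only short-range order; capturing longer-range sequential structure, as the title's "sequential information preserving" suggests, would call for a sequence model in place of `embed_trace`.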
In experiments on a real system call trace dataset, the proposed log embedding method combined with an unsupervised anomaly detection model yielded favorable performance, reaching an AUROC of up to 0.8708; this can be further improved to 0.9745 with supervised classification algorithms if sufficient labeled attack log data become available.
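AUROC, the metric behind the 0.8708 and 0.9745 figures, can be read as the probability that a randomly drawn attack trace receives a higher anomaly score than a randomly drawn normal trace, with ties counted as one half. A minimal sketch of that pairwise (Mann-Whitney) formulation follows; `auroc` is a hypothetical helper, and in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (attack, normal) pairs in which the attack (label 1) gets
    the higher anomaly score than the normal sample (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one attack and one normal example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 3 of the 4 attack/normal pairs are ranked correctly.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of attacks from normal behavior.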

Original language: English
Article number: 9399070
Pages (from-to): 58088-58101
Number of pages: 14
Journal: IEEE Access
Publication status: Published - 2021

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korea Government (MSIT) under Grant NRF-2019R1F1A1060338 and Grant NRF-2019R1A4A1024732.

Publisher Copyright:
© 2013 IEEE.


Keywords

  • System anomaly detection
  • advanced persistent threat
  • cyber security
  • system log embedding

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering

