Training-free retrieval-based log anomaly detection with pre-trained language model considering token-level information

Gunho No, Yukyung Lee, Hyeongwon Kang, Pilsung Kang

Research output: Contribution to journal › Article › peer-review


As the information technology industry advances, the demand for log anomaly detection based solely on printed log text is growing. However, identifying anomalies in rapidly accumulating logs remains a challenging task. Traditional anomaly detection models require dataset-specific training, which delays their deployment. Moreover, most methods focus only on sequence-level log information, which makes subtle anomalies hard to detect, and they often involve inference processes that are difficult to use in real time. We introduce a new retrieval-based log anomaly detection model that capitalizes on the inherent features of log data for real-time anomaly detection. Our model treats logs as natural language and extracts their representations with pre-trained language models. After categorizing logs based on system context, we implement a retrieval-based reformulation that contrasts each test log with the most similar normal logs. This strategy not only obviates the need for log-specific training but also incorporates token-level information, ensuring refined detection, particularly for unseen logs. We also propose a core set technique that reduces the computational cost of comparison. In our experiments on three representative benchmarks, we obtained an average F1-score of 0.9738, demonstrating that our model performs competitively with existing models without training on log data. Through various research questions, we also verified the model's real-world usability, including real-time detection.
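The retrieval step, the token-level comparison, and the core set reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so mean pooling, cosine similarity, the "1 minus mean of best token matches" score, and greedy k-center selection are all assumptions chosen for illustration, and the function names are hypothetical. The toy vectors stand in for token representations that would come from a pre-trained language model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(tokens):
    """Sequence-level representation: mean of the token vectors (assumed pooling)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def greedy_coreset(vectors, k):
    """Assumed core set rule: greedy k-center selection.

    Starts from index 0 and repeatedly adds the vector that is farthest
    (Euclidean distance) from everything selected so far.
    """
    chosen = [0]
    dist = [math.dist(v, vectors[0]) for v in vectors]
    while len(chosen) < min(k, len(vectors)):
        idx = max(range(len(vectors)), key=dist.__getitem__)
        chosen.append(idx)
        dist = [min(d, math.dist(v, vectors[idx])) for d, v in zip(dist, vectors)]
    return chosen

def token_level_score(test_tokens, normal_tokens):
    """Assumed score: 1 - mean over test tokens of the best cosine match."""
    best = [max(cosine(t, n) for n in normal_tokens) for t in test_tokens]
    return 1.0 - sum(best) / len(best)

def anomaly_score(test_tokens, memory):
    """Retrieve the most similar normal log, then compare at token level.

    `memory` is a list of normal logs, each a list of token vectors.
    Returns (score, index of the retrieved normal log).
    """
    query = mean_vector(test_tokens)
    nearest = max(range(len(memory)),
                  key=lambda i: cosine(query, mean_vector(memory[i])))
    return token_level_score(test_tokens, memory[nearest]), nearest
```

In this sketch nothing is trained on log data: normal logs are only stored and compared against. A decision threshold on the score is deliberately left out, since the abstract gives no information about how one is chosen.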

Original language: English
Article number: 108613
Journal: Engineering Applications of Artificial Intelligence
Publication status: Published - 2024 Jul

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd


Keywords

  • Log anomaly detection
  • Log data
  • Retrieval

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

