Training-free retrieval-based log anomaly detection with pre-trained language model considering token-level information

Gunho No, Yukyung Lee, Hyeongwon Kang, Pilsung Kang

    Research output: Contribution to journal › Article › peer-review

    2 Citations (Scopus)

    Abstract

    As the information technology industry advances, the demand for log anomaly detection based solely on printed log text is growing. However, identifying anomalies in rapidly accumulating logs remains a challenging task. Traditional anomaly detection models require dataset-specific training, which delays deployment. Notably, most methods focus only on sequence-level log information, complicating the detection of subtle anomalies, and often involve inference processes that are difficult to use in real time. We introduce a new retrieval-based log anomaly detection model, capitalizing on the inherent features of log data for real-time anomaly detection. Our model treats logs as natural language, extracting representations with pre-trained language models. Categorizing logs based on system context, we implement a retrieval-based reformulation to contrast test logs with the most similar normal logs. This strategy not only obviates the need for log-specific training but also incorporates token-level information, ensuring refined detection, particularly for unseen logs. We also propose the core set technique, reducing computational costs for comparison. In our experiments on three representative benchmarks, we obtained an average F1-score of 0.9738, demonstrating that our model performs competitively with existing models without training on log data. Through various research questions, we verified real-world usability, including real-time detection.
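The retrieval idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each log has already been turned into a fixed-length embedding (here, hand-written toy vectors rather than outputs of a pre-trained language model), scores a test log by its cosine distance to the nearest normal log, and shrinks the normal set with a greedy k-center selection as one plausible reading of a "core set". The names `anomaly_score` and `greedy_coreset` are hypothetical, and the token-level comparison described in the abstract is not covered.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def anomaly_score(query: Vector, normal_bank: List[Vector]) -> float:
    """Cosine distance (1 - similarity) to the most similar normal embedding."""
    return 1.0 - max(cosine_similarity(query, n) for n in normal_bank)


def greedy_coreset(bank: List[Vector], size: int) -> List[int]:
    """Greedy k-center selection (an assumed stand-in for the paper's core set).

    Starts from index 0 and repeatedly adds the point that is farthest
    from everything selected so far. Returns the selected indices.
    """
    if not bank or size <= 0:
        return []
    selected = [0]
    # Distance from every point to its closest selected point.
    min_dist = [1.0 - cosine_similarity(v, bank[0]) for v in bank]
    while len(selected) < min(size, len(bank)):
        idx = max(range(len(bank)), key=lambda i: min_dist[i])
        selected.append(idx)
        for i, v in enumerate(bank):
            d = 1.0 - cosine_similarity(v, bank[idx])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy "normal log" embeddings forming two clusters (illustrative values only).
normal_bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
core_idx = greedy_coreset(normal_bank, 2)
core_bank = [normal_bank[i] for i in core_idx]

seen_like = anomaly_score([1.0, 0.05], core_bank)    # close to a normal cluster
unseen_like = anomaly_score([-1.0, -1.0], core_bank)  # far from both clusters
```

In this sketch a log would be flagged when its score exceeds a threshold chosen on held-out normal data; the threshold and the embedding model are left open because the abstract does not specify them.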

    Original language: English
    Article number: 108613
    Journal: Engineering Applications of Artificial Intelligence
    Volume: 133
    Publication status: Published - 2024 Jul

    Bibliographical note

    Publisher Copyright:
    © 2024 Elsevier Ltd

    Keywords

    • Log anomaly detection
    • Log data
    • Retrieval

    ASJC Scopus subject areas

    • Control and Systems Engineering
    • Artificial Intelligence
    • Electrical and Electronic Engineering
