Abstract
As the information technology industry advances, the demand for log anomaly detection based solely on printed log text is growing. However, identifying anomalies in rapidly accumulating logs remains a challenging task. Traditional anomaly detection models require dataset-specific training, which introduces delays. Moreover, most methods focus only on sequence-level log information, which complicates the detection of subtle anomalies, and often involve inference processes that are difficult to use in real time. We introduce a new retrieval-based log anomaly detection model that capitalizes on the inherent features of log data for real-time anomaly detection. Our model treats logs as natural language and extracts representations with pre-trained language models. After categorizing logs based on system context, we implement a retrieval-based reformulation that contrasts test logs with the most similar normal logs. This strategy not only obviates the need for log-specific training but also incorporates token-level information, ensuring refined detection, particularly for unseen logs. We also propose a core set technique that reduces the computational cost of comparison. In our experiments on three representative benchmarks, we obtained an average F1-score of 0.9738, demonstrating that our model performs competitively with existing models without training on log data. Through various research questions, we verified real-world usability, including real-time detection.
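The retrieval idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for embeddings from a pre-trained language model, the k-center greedy selection is one common way to build a core set and is assumed here rather than taken from the paper, and the names `greedy_coreset`, `anomaly_score`, `is_anomalous`, and the threshold of 0.2 are hypothetical. The sketch also works only on whole-log vectors, so it omits the token-level comparison the abstract mentions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_coreset(vectors, k):
    """Pick up to k representatives with k-center greedy selection.

    Assumes `vectors` is non-empty. Starting from the first vector, it
    repeatedly adds the vector farthest (in cosine distance) from the
    representatives chosen so far.
    """
    selected = [0]
    min_dist = [1.0 - cosine(v, vectors[0]) for v in vectors]
    while len(selected) < min(k, len(vectors)):
        idx = max(range(len(vectors)), key=lambda i: min_dist[i])
        selected.append(idx)
        min_dist = [
            min(d, 1.0 - cosine(v, vectors[idx]))
            for d, v in zip(min_dist, vectors)
        ]
    return [vectors[i] for i in selected]


def anomaly_score(query, normal_bank):
    """Cosine distance from the query to its most similar normal log."""
    return 1.0 - max(cosine(query, n) for n in normal_bank)


def is_anomalous(query, normal_bank, threshold=0.2):
    """Flag the query when no normal log is similar enough (hypothetical threshold)."""
    return anomaly_score(query, normal_bank) > threshold


# Toy stand-ins for embeddings of normal logs (assumed, not real data).
normal_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
]
bank = greedy_coreset(normal_embeddings, k=2)

print(is_anomalous([0.95, 0.05, 0.0], bank))  # False: close to a normal log
print(is_anomalous([0.0, 0.0, 1.0], bank))    # True: unlike any normal log
```

Because scoring only compares a test vector against the stored normal vectors, no training on log data is needed in this sketch, and shrinking the bank to a core set reduces the number of comparisons per test log.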
Original language | English |
---|---|
Article number | 108613 |
Journal | Engineering Applications of Artificial Intelligence |
Volume | 133 |
DOIs | |
Publication status | Published - 2024 Jul |
Bibliographical note
Publisher Copyright: © 2024 Elsevier Ltd
Keywords
- Log anomaly detection
- Log data
- Retrieval
ASJC Scopus subject areas
- Control and Systems Engineering
- Artificial Intelligence
- Electrical and Electronic Engineering