Abstract
Recent advances in self-supervised learning (SSL) have proven crucial for learning effective representations of unstructured data such as text, images, and audio. Although these advances have been explored extensively for anomaly detection, applying SSL to tabular data is challenging because no prior information about the data structure is available. In response, we propose a framework for anomaly detection in tabular datasets based on variable corruption. By selectively corrupting variables and assigning new labels according to the degree of corruption, our framework can effectively distinguish between normal and abnormal data. Furthermore, analyzing the impact of corruption on anomaly scores aids in identifying important variables. Experimental results on various tabular datasets validate the precision and applicability of the proposed method. The source code can be accessed at https://github.com/mokch/CAIT.
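To make the idea of corruption-based labeling concrete, the following is a minimal sketch of one way such a pretext dataset could be built. It is not the authors' implementation (see the linked repository for that). The function name `corrupt_rows`, the `max_corrupt` parameter, the choice to replace values by resampling from the same column, and the use of the corrupted fraction as the label are all assumptions made for illustration.

```python
import numpy as np


def corrupt_rows(X, rng, max_corrupt=None):
    """Corrupt a random subset of variables in each row (hypothetical helper).

    Returns the corrupted matrix, a per-row label equal to the fraction of
    variables that were selected for corruption, and the boolean mask.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if max_corrupt is None:
        max_corrupt = d

    # Assumption: the number of corrupted variables per row is uniform
    # on {0, ..., max_corrupt}.
    k = rng.integers(0, max_corrupt + 1, size=n)

    # Rank random scores within each row; the k lowest-ranked columns
    # are the ones selected for corruption.
    ranks = rng.random((n, d)).argsort(axis=1).argsort(axis=1)
    mask = ranks < k[:, None]

    # Assumption: a corrupted value is drawn from the same column of a
    # randomly chosen row, i.e. from that variable's empirical marginal.
    donor = rng.integers(0, n, size=(n, d))
    replacement = X[donor, np.arange(d)]

    X_corrupted = np.where(mask, replacement, X)

    # Nominal degree of corruption; a resampled value can coincide with
    # the original one by chance.
    labels = mask.sum(axis=1) / d
    return X_corrupted, labels, mask


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X_corrupted, labels, mask = corrupt_rows(X, rng, max_corrupt=3)
```

Under these assumptions, a model trained to predict `labels` from `X_corrupted` could then be applied to unmodified rows, with a high predicted degree of corruption read as an anomaly score. How the published method defines its labels and scores should be checked against the paper and repository.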
| Original language | English |
|---|---|
| Article number | 111149 |
| Journal | Pattern Recognition |
| Volume | 159 |
| DOIs | |
| Publication status | Published - 2025 Mar |
Bibliographical note
Publisher Copyright: © 2024
Keywords
- Anomaly detection
- Explainable artificial intelligence
- Self-supervised learning
- Tabular data
- Variable corruption
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence