Abstract
Counterfactual explanation (CFE) provides actionable counterexamples and enhances the interpretability of decision boundaries in deep neural networks, and has therefore gained increasing interest in recent years. An ideal CFE should provide plausible and practical examples that can alter the decision of a classifier while remaining grounded in the real world. Motivated by this requirement, we propose a CFE framework for identifying related features (CIRF) to improve the plausibility of explanations. CIRF comprises two steps: i) searching for direction vectors that contain class information; ii) locating an optimal point via a projection point, which determines the magnitude of manipulation along the direction. Our framework exploits related features and the properties of the latent space of a generative model, thereby highlighting the importance of related features. We derive points with many related features and show a performance gain of more than 11% on the IM1 metric compared to points with fewer related features. We validate the versatility of CIRF through experiments on various domains and datasets, and with its two interchangeable steps. CIRF exhibits strong plausibility across various domains, including tabular and image datasets.
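The two steps described above can be pictured with a minimal sketch. Here `encoder`, `decoder`, and `classifier` are assumed pretrained callables returning NumPy arrays, and a simple difference-of-class-means heuristic stands in for the paper's actual direction search, which the abstract does not specify; the incremental walk along the direction corresponds to step ii).

```python
# Hypothetical sketch of a two-step latent-space counterfactual search.
# `encoder`, `decoder`, and `classifier` are assumed pretrained models;
# the direction search below is a difference-of-class-means heuristic,
# a stand-in for CIRF's actual procedure, which is not given here.
import numpy as np

def class_direction(encoder, X_source, X_target):
    """Step i): estimate a latent direction carrying class information."""
    z_src = encoder(X_source).mean(axis=0)   # mean latent code, source class
    z_tgt = encoder(X_target).mean(axis=0)   # mean latent code, target class
    d = z_tgt - z_src
    return d / np.linalg.norm(d)             # unit direction vector

def counterfactual(encoder, decoder, classifier, x, direction,
                   target_class, step=0.05, max_alpha=10.0):
    """Step ii): walk along the direction until the classifier's decision flips."""
    z = encoder(x[None])[0]                  # latent code of the query point
    alpha = 0.0
    while alpha <= max_alpha:
        x_cf = decoder((z + alpha * direction)[None])[0]
        if classifier(x_cf[None]).argmax() == target_class:
            return x_cf                      # smallest flipping magnitude found
        alpha += step                        # increase manipulation magnitude
    return None                              # no counterfactual within range
```

Decoding the manipulated latent code keeps the counterfactual on the generative model's learned data manifold, which is the property the abstract credits for plausibility.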
Original language | English |
---|---|
Article number | 120974 |
Journal | Information Sciences |
Volume | 678 |
DOIs | |
Publication status | Published - Sept 2024 |
Bibliographical note
Publisher Copyright: © 2024
Keywords
- Counterfactual explanation
- Explainable artificial intelligence
- Generative adversarial networks
- Generative neural networks
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence