Explanations can be manipulated and geometry is to blame

  • Ann-Kathrin Dombrowski
  • Maximilian Alber
  • Christopher J. Anders
  • Marcel Ackermann
  • Klaus-Robert Müller
  • Pan Kessel

Research output: Contribution to journal › Conference article › peer-review

Abstract

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
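The manipulation the abstract describes can be sketched as a small optimization problem: search for a perturbation δ that pushes the gradient (saliency) explanation toward an arbitrary target map while penalizing any change in the network's output. The toy script below is an illustrative sketch under assumed conditions, not the paper's actual attack: it uses a hypothetical two-layer softplus network of our own construction, a gradient explanation, and finite-difference descent with backtracking, where the weights, dimensions, and the penalty weight `gamma` are all arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy 2-layer softplus network (stand-in for a deep classifier)
d, h = 4, 8
W1 = rng.normal(size=(h, d))
w2 = rng.normal(size=h)

def f(x):
    """Scalar network output: w2 . softplus(W1 x)."""
    return w2 @ np.log1p(np.exp(W1 @ x))

def explanation(x):
    """Gradient (saliency-map) explanation: the input gradient of f."""
    sigm = 1.0 / (1.0 + np.exp(-(W1 @ x)))   # sigmoid = derivative of softplus
    return W1.T @ (sigm * w2)

x = rng.normal(size=d)
target = explanation(rng.normal(size=d))      # arbitrary explanation to fake

gamma = 10.0                                  # weight on keeping the output constant
def loss(delta):
    expl_term = np.sum((explanation(x + delta) - target) ** 2)
    out_term = (f(x + delta) - f(x)) ** 2
    return expl_term + gamma * out_term

# Optimize the perturbation by finite-difference gradient descent with backtracking
delta, eps = np.zeros(d), 1e-5
for _ in range(300):
    grad = np.array([
        (loss(delta + eps * np.eye(d)[i]) - loss(delta - eps * np.eye(d)[i])) / (2 * eps)
        for i in range(d)
    ])
    step = 0.05
    while step > 1e-8 and loss(delta - step * grad) > loss(delta):
        step *= 0.5                           # backtrack until the loss decreases
    if loss(delta - step * grad) < loss(delta):
        delta -= step * grad

print("explanation moved toward target:",
      np.sum((explanation(x + delta) - target) ** 2)
      < np.sum((explanation(x) - target) ** 2))
print("output change:", abs(f(x + delta) - f(x)))
```

Because the output-change term starts at zero (δ = 0) and only the explanation term can shrink, any decrease in the combined loss means the explanation really did move toward the target while the output stayed nearly fixed, which is the geometric effect the abstract attributes to the network's curvature.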

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 32
Publication status: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 2019 Dec 8 - 2019 Dec 14

Bibliographical note

Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
