Rectifying adversarial inputs using XAI techniques | Science of Security Virtual Organization

Rectifying adversarial inputs using XAI techniques
Author	Ching-Yu Kao Junhao Chen Karla Markert Konstantin Böttinger
Abstract	With deep neural networks (DNNs) involved in more and more decision making processes, critical security problems can occur when DNNs give wrong predictions. This can be enforced with so-called adversarial attacks. These attacks modify the input in such a way that they are able to fool a neural network into a false classification, while the changes remain imperceptible to a human observer. Even for very specialized AI systems, adversarial attacks are still hardly detectable. The current state-of-the-art adversarial defenses can be classified into two categories: pro-active defense and passive defense, both unsuitable for quick rectifications: Pro-active defense methods aim to correct the input data to classify the adversarial samples correctly, while reducing the accuracy of ordinary samples. Passive defense methods, on the other hand, aim to filter out and discard the adversarial samples. Neither of the defense mechanisms is suitable for the setup of autonomous driving: when an input has to be classified, we can neither discard the input nor have the time to go for computationally expensive corrections. This motivates our method based on explainable artificial intelligence (XAI) for the correction of adversarial samples. We used two XAI interpretation methods to correct adversarial samples. We experimentally compared this approach with baseline methods. Our analysis shows that our proposed method outperforms the state-of-the-art approaches.
Year of Publication	2022
Date Published	aug
URL	https://ieeexplore.ieee.org/document/9909699
DOI	10.23919/EUSIPCO55093.2022.9909699
Google Scholar \| BibTeX \| DOI