Graph-based methods coupled with specific distributional distances for adversarial attack detection

Artificial neural networks can be fooled by carefully perturbed inputs that cause egregious misclassifications. These adversarial attacks have been the focus of extensive research, as have methods to detect and defend against them. We introduce a novel approach to detecting and interpreting adversarial attacks from a graph perspective. For an input image, we compute an associated sparse graph using the layer-wise relevance propagation algorithm (Bach et al., 2015). Specifically, we keep only the edges of the neural network with the highest relevance values. Three quantities are then computed from the graph and compared against those computed from the training set. The result of the comparison is a classification of the image as benign or adversarial. To make the comparison, two classification methods are introduced: (1) an explicit formula based on the Wasserstein distance applied to node degree and (2) a logistic regression. Both classification methods produce strong results, which leads us to believe that a graph-based interpretation of adversarial attacks is valuable.
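
The pipeline described in the abstract (keep the highest-relevance edges, compute node degrees, compare degree distributions with a Wasserstein distance) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the relevance matrix stands in for the output of layer-wise relevance propagation on one layer, `keep_fraction` is a hypothetical sparsity parameter, and the nearest-reference decision rule is a simple stand-in, not the explicit formula used in the paper.

```python
import numpy as np

def degrees_from_relevance(relevance, keep_fraction=0.1):
    """Node degrees of the sparse graph obtained by keeping top-relevance edges.

    `relevance` is a hypothetical (inputs x outputs) matrix of edge relevances
    for one layer; ties at the threshold may keep slightly more edges.
    """
    relevance = np.asarray(relevance, dtype=float)
    k = max(1, int(keep_fraction * relevance.size))
    # Value of the k-th largest relevance, used as the cut-off.
    threshold = np.partition(relevance.ravel(), -k)[-k]
    mask = relevance >= threshold
    # Degree of each input node (rows) followed by each output node (columns).
    return np.concatenate([mask.sum(axis=1), mask.sum(axis=0)])

def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1-D distributions."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs evaluated on the left end of each interval.
    u_cdf = np.searchsorted(u, all_values[:-1], side="right") / len(u)
    v_cdf = np.searchsorted(v, all_values[:-1], side="right") / len(v)
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))

def classify_by_degree(degrees, benign_ref, adversarial_ref):
    """Assumed rule: label the input by the closer reference distribution."""
    d_benign = wasserstein_1d(degrees, benign_ref)
    d_adversarial = wasserstein_1d(degrees, adversarial_ref)
    return "adversarial" if d_adversarial < d_benign else "benign"
```

In practice the reference degree samples would be gathered from training-set images, as the abstract indicates; the logistic-regression variant would instead take the three graph quantities as features.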

Keywords

Artificial neural network; Graph theory; Deep learning; Machine learning; Wasserstein; Bio-inspired

Domains

Artificial Intelligence [cs.AI]

Contributor: Sandrine Corvey-Biron

https://hal.univ-grenoble-alpes.fr/hal-04421207

Submitted on: Saturday, January 27, 2024, 13:37:34

Last modified on: Monday, December 9, 2024, 03:23:03

Dates and versions

hal-04421207, version 1 (27-01-2024)

License

Attribution

Identifiers

HAL Id: hal-04421207, version 1
arXiv: 2306.00042
DOI: 10.1016/j.neunet.2023.10.007

Cite

Dwight Nwaigwe, Lucrezia Carboni, Martial Mermillod, Sophie Achard, Michel Dojat. Graph-based methods coupled with specific distributional distances for adversarial attack detection. Neural Networks, 2024, 169, pp.11-19. ⟨10.1016/j.neunet.2023.10.007⟩. ⟨hal-04421207⟩

Collections

UNIV-SAVOIE UGA CNRS INRIA INSMI LJK LJK_PS LPNC INRIA2 MIAI LJK-PS-STATIFY ANR GRENOBLEINSTITUTNEUROSCIENCES
