Detecting and Defending against Different Families of Adversarial Example Attacks

Abstract

Adversarial example attacks alter an image so that it appears largely unchanged to human eyes but is misclassified by image-recognition models. This type of attack is common, and there is currently no good general defense against it. Most state-of-the-art detection methods consistently recognize only a few known attacks. They do not generalize well to other attacks, so an adversary need only switch attacks to evade detection. Military intelligence increasingly relies on machine-learning image recognition to analyze satellite images, so finding defenses against adversarial example attacks is important for ensuring that intelligence-gathering capabilities are not compromised. This thesis contributes models intended to push the state of the art toward recognizing adversarial attacks regardless of which attack was used. Models we named 3-Mix were trained using combinations of different attacked images; other models were trained using SaliencyMix. These defenses were evaluated against ten attacks: PGD, Auto-PGD, AutoAttack, Square, Carlini L2 and L-inf, DeepFool, ElasticNet, JSMA, and Boundary. On average, the attack success rate against the best defense model was 0.12 for 3-Mix, 0.31 for SaliencyMix, and 0.77 for the comparison model, Mixup.
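For readers unfamiliar with the terms in the abstract, the following is a minimal Python sketch of the comparison technique, Mixup, and of one common way to compute an attack success rate. It is an illustration under stated assumptions, not the thesis's implementation: the function names `mixup`, `attack_success_rate`, and `mean_success_rate` are hypothetical, the Beta-distributed mixing weight follows the standard Mixup formulation, and the success-rate definition (fraction of attacked inputs that are misclassified) may differ from the one the thesis uses. The abstract does not specify how 3-Mix combines attacked images, so that method is not reproduced here.

```python
import random


def mixup(image_a, image_b, label_a, label_b, alpha=1.0, rng=random):
    """Blend two flattened images and their one-hot labels (standard Mixup).

    Assumption: images and labels are equal-length lists of floats.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed_image = [lam * a + (1.0 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label, lam


def attack_success_rate(predicted, true_labels):
    """Fraction of attacked inputs that the model misclassifies.

    Assumption: every input in the batch was attacked; the thesis may
    restrict this to inputs that were classified correctly before the attack.
    """
    if not true_labels:
        raise ValueError("need at least one example")
    wrong = sum(1 for p, t in zip(predicted, true_labels) if p != t)
    return wrong / len(true_labels)


def mean_success_rate(per_attack_rates):
    """Unweighted average of success rates across attacks."""
    return sum(per_attack_rates.values()) / len(per_attack_rates)
```

Under these assumptions, a lower average from `mean_success_rate` over the evaluated attacks indicates a more robust defense, which is the sense in which the abstract's figure of 0.12 for 3-Mix compares favorably with 0.77 for Mixup.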

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 2023
Accession Number: AD1213515

Entities

People

Shaun Kallis

Organizations

Naval Postgraduate School
