Detecting and Defending against Different Families of Adversarial Example Attacks
Abstract
Adversarial example attacks alter an image so that it appears largely unchanged to the human eye but is misclassified by image-recognition models. This is a common type of attack, against which there is currently no good general defense. Most state-of-the-art methods for detecting adversarial example attacks consistently recognize only a few known attacks. These defenses do not generalize well to other attacks, so an adversary need only switch attacks to leave us without a robust ability to detect them. Military intelligence increasingly relies on machine-learning image recognition to analyze satellite images, so finding defenses against adversarial example attacks is important for ensuring that our intelligence-gathering capabilities are not compromised. This thesis seeks to contribute models that push the state of the art toward successful recognition of adversarial attacks regardless of which type of attack was used. Models we named 3-Mix were trained using combinations of different attacked images; other models were trained using SaliencyMix. These defenses were evaluated against ten attacks: PGD, Auto-PGD, AutoAttack, Square, Carlini L2 and L-inf, DeepFool, ElasticNet, JSMA, and Boundary. On average, the attack success rate against the best defense model was 0.12 for 3-Mix, 0.31 for SaliencyMix, and 0.77 for the comparison model, Mixup.
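The averaged attack success rates reported in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not say so, that an attack "succeeds" on an image when the defended model fails to catch it, and that the average is an unweighted mean over per-attack rates. The function names and the toy outcomes are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def attack_success_rate(succeeded):
    """Fraction of attacked images on which the attack succeeded.

    `succeeded` is a sequence of booleans, one per attacked image.
    What counts as success is an assumption left to the caller.
    """
    if not succeeded:
        raise ValueError("need at least one attacked image")
    return sum(1 for s in succeeded if s) / len(succeeded)


def mean_success_rate(outcomes_by_attack):
    """Unweighted mean of the per-attack success rates (assumed averaging scheme)."""
    return mean(attack_success_rate(o) for o in outcomes_by_attack.values())


# Hypothetical toy outcomes for two unnamed attacks; not results from the thesis.
toy_outcomes = {
    "attack_a": [True, False, False, False],
    "attack_b": [False, False, True, True],
}

if __name__ == "__main__":
    for name, outcomes in toy_outcomes.items():
        print(name, attack_success_rate(outcomes))
    print("mean", mean_success_rate(toy_outcomes))
```

Under this reading, a lower mean indicates a stronger defense, which matches the ordering the abstract reports for 3-Mix, SaliencyMix, and Mixup.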
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 2023
- Accession Number: AD1213515
Entities
People
- Shaun Kallis
Organizations
- Naval Postgraduate School