Reducing Adversarial Failures in Neural Networks Using "None of the Above" Class Priors

Abstract

While machine learning presents an opportunity for increased automation in systems, machine-learning models are also subject to adversarial attacks. This thesis builds on previous methods for securing against adversarial examples by training a model with a None of the Above (NOTA) class. While classification models force categorization into one of a fixed number of classes, NOTA models add a class that allows for the possibility that some inputs will not match any of the given classes. Although previous methods are largely successful in providing state-of-the-art adversarial robustness, they are less successful against some of the more complex adversarial attack vectors. This thesis aims to increase adversarial robustness through a prior that biases predictions toward the NOTA class. We conduct a validation grid search to find the prior probability for a NOTA class over the CIFAR-10 image dataset that best decreases adversarial success. Through this work, we provide a proof of concept that the addition of a NOTA-biased prior can decrease the adversarial success of some of the more complex evasion attacks. As the DOD moves to increase its use of machine-learning models, these results will be increasingly important to building models with adequate security.
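
The idea of a NOTA-biased prior and a validation grid search over its value can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how the prior is applied, so the sketch assumes a standard prior-shift reweighting of softmax outputs, assumes the NOTA class is the last column, and assumes an attack counts as successful only when the prediction is a wrong non-NOTA class. The function names `apply_nota_prior` and `grid_search_nota_prior` and the `min_clean_acc` threshold are hypothetical.

```python
import numpy as np


def apply_nota_prior(probs, nota_prior):
    """Reweight class probabilities so the NOTA class has prior `nota_prior`.

    Assumptions (not from the source): the last column is the NOTA class,
    and the model's implicit prior over all classes is uniform.
    """
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[1]            # real classes plus one NOTA class
    n_real = n_classes - 1
    # New prior: `nota_prior` on NOTA, the remainder split evenly.
    new_prior = np.full(n_classes, (1.0 - nota_prior) / n_real)
    new_prior[-1] = nota_prior
    old_prior = np.full(n_classes, 1.0 / n_classes)
    weighted = probs * new_prior / old_prior
    return weighted / weighted.sum(axis=1, keepdims=True)


def grid_search_nota_prior(clean_probs, clean_labels, adv_probs, adv_labels,
                           candidates, min_clean_acc=0.0):
    """Pick the candidate prior with the lowest adversarial success rate.

    A candidate is eligible only if clean accuracy stays at or above
    `min_clean_acc` (a hypothetical constraint added for illustration).
    Returns None if no candidate is eligible.
    """
    clean_labels = np.asarray(clean_labels)
    adv_labels = np.asarray(adv_labels)
    nota_index = np.asarray(clean_probs).shape[1] - 1
    best_prior, best_success = None, np.inf
    for prior in candidates:
        clean_pred = apply_nota_prior(clean_probs, prior).argmax(axis=1)
        adv_pred = apply_nota_prior(adv_probs, prior).argmax(axis=1)
        clean_acc = np.mean(clean_pred == clean_labels)
        # Assumed definition: the attack succeeds when the prediction is a
        # wrong real class; a NOTA prediction counts as a rejection.
        success = np.mean((adv_pred != adv_labels) & (adv_pred != nota_index))
        if clean_acc >= min_clean_acc and success < best_success:
            best_prior, best_success = prior, success
    return best_prior
```

In this sketch, raising the NOTA prior pushes borderline adversarial inputs into the NOTA class, but too large a value also rejects clean inputs, which is why the search is constrained by clean accuracy on held-out validation data.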

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 2023
Accession Number: AD1225432

Entities

People

Alexi N. Mendolia

Organizations

Naval Postgraduate School
