Reducing Adversarial Failures in Neural Networks Using "None of the Above" Class Priors
Abstract
While machine learning offers an opportunity for increased automation in systems, machine learning models are also vulnerable to adversarial attacks. This thesis builds on previous methods for defending against adversarial examples by training a model with a None of the Above (NOTA) class. Standard classification models force every input into one of a fixed number of classes; NOTA models add a class that captures the notion that some inputs will not match any of the given classes. Although previous methods largely succeed in providing state-of-the-art adversarial robustness, they are less successful against some of the more complex adversarial attack vectors. This thesis aims to increase adversarial robustness through a prior that biases predictions toward the NOTA class. We conduct a validation grid search over the CIFAR-10 image dataset to find the NOTA prior probability that most reduces adversarial success. This work provides a proof of concept that adding a NOTA-biased prior can reduce the adversarial success of some of the more complex evasion attacks. As the DOD increases its use of machine learning models, these results will become increasingly important for building models with adequate security.
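The abstract does not state how the NOTA prior enters the model. One common way to impose a class prior at inference time is Bayes-rule reweighting: multiply the network's softmax output by the prior and take the argmax. The Python sketch below illustrates that idea together with a validation grid search over the NOTA prior probability. It is a hypothetical illustration, not the thesis's implementation: the function names, the placement of NOTA as the eleventh output, the uniform training prior, the rule that a NOTA prediction counts as a defended input, and the clean-accuracy floor are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

NUM_CLASSES = 10          # CIFAR-10 has 10 real classes
NOTA_INDEX = NUM_CLASSES  # assumption: NOTA is the extra (11th) output


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the K+1 logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nota_prior(nota_prob: float, num_classes: int = NUM_CLASSES) -> List[float]:
    """Hypothetical prior: nota_prob on NOTA, remainder split evenly over real classes."""
    if not 0.0 < nota_prob < 1.0:
        raise ValueError("nota_prob must lie strictly between 0 and 1")
    rest = (1.0 - nota_prob) / num_classes
    return [rest] * num_classes + [nota_prob]


def predict(logits: Sequence[float], nota_prob: float) -> int:
    """Bayes reweighting: softmax output times prior, then argmax.

    Assumes training used a uniform class prior, so dividing it out would scale
    every entry equally and cannot change the argmax; it is therefore omitted.
    """
    weighted = [p * q for p, q in zip(softmax(logits), nota_prior(nota_prob))]
    return max(range(len(weighted)), key=weighted.__getitem__)


def adversarial_success_rate(adv_logits: Sequence[Sequence[float]],
                             true_labels: Sequence[int], nota_prob: float) -> float:
    """Fraction of adversarial inputs pushed into a wrong real class.

    Assumption: predicting NOTA counts as a successful defense.
    """
    hits = 0
    for logits, y in zip(adv_logits, true_labels):
        pred = predict(logits, nota_prob)
        if pred != y and pred != NOTA_INDEX:
            hits += 1
    return hits / len(true_labels)


def clean_accuracy(clean_logits: Sequence[Sequence[float]],
                   labels: Sequence[int], nota_prob: float) -> float:
    """Fraction of clean validation inputs assigned their true class."""
    correct = sum(predict(z, nota_prob) == y for z, y in zip(clean_logits, labels))
    return correct / len(labels)


def grid_search(candidates: Sequence[float],
                adv_logits: Sequence[Sequence[float]], adv_labels: Sequence[int],
                clean_logits: Sequence[Sequence[float]], clean_labels: Sequence[int],
                min_clean_acc: float) -> Optional[Tuple[float, float]]:
    """Return (nota_prob, adversarial success) with the lowest success among
    candidates whose clean accuracy is at least min_clean_acc.

    Ties keep the earlier candidate; returns None if no candidate qualifies.
    """
    best: Optional[Tuple[float, float]] = None
    for p in candidates:
        if clean_accuracy(clean_logits, clean_labels, p) < min_clean_acc:
            continue  # prior so strong that it rejects too many clean images
        asr = adversarial_success_rate(adv_logits, adv_labels, p)
        if best is None or asr < best[1]:
            best = (p, asr)
    return best


if __name__ == "__main__":
    adv = [0.0] * 11
    adv[3], adv[NOTA_INDEX] = 1.0, 0.5   # attack nudges toward class 3
    clean = [0.0] * 11
    clean[2] = 5.0                       # confident, correct clean example
    print(grid_search([0.1, 0.5, 0.9], [adv], [7], [clean], [2], 1.0))
```

The clean-accuracy floor matters: without it, a search that only minimizes adversarial success would drift toward a prior near 1, labeling every input NOTA and trivially defeating all attacks while making the classifier useless. On the toy data above, a NOTA prior of 0.5 already diverts the adversarial example to NOTA while the clean example keeps its correct label.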
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 2023
- Accession Number: AD1225432
Entities
People
- Alexi N. Mendolia
Organizations
- Naval Postgraduate School