Reducing Adversarial Failures in Neural Networks Using "None of the Above" Class Priors

Abstract

Machine learning offers an opportunity for increased automation in systems, but machine-learning models are also subject to adversarial attacks. This thesis builds on previous methods that secure models against adversarial examples by training them with a None of the Above (NOTA) class. Standard classification models force every input into one of a fixed number of classes; NOTA models add a class that allows for the possibility that an input matches none of the given classes. Although previous methods largely succeed in providing state-of-the-art adversarial robustness, they are less successful against some of the more complex adversarial attack vectors. This thesis aims to increase adversarial robustness through a prior that biases predictions toward the NOTA class. We conduct a validation grid search to find the prior probability for the NOTA class on the CIFAR-10 image dataset that best decreases adversarial success. This work provides a proof of concept that adding a NOTA-biased prior can decrease the adversarial success of some of the more complex evasion attacks. As the DOD moves to increase its use of machine-learning models, these results will be increasingly important for building models with adequate security.
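
The idea of a NOTA-biased prior and a validation grid search can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the model outputs softmax probabilities with the NOTA class in the last column, that the prior is applied by reweighting those probabilities and renormalizing, and that an attack "succeeds" when the prediction is a real class other than the true label. The function names, the `min_clean_acc` constraint, and the candidate values are all hypothetical.

```python
import numpy as np

def apply_nota_prior(probs, nota_prior):
    """Reweight class probabilities with a prior that puts `nota_prior`
    mass on the last (NOTA) column and splits the rest evenly (assumption)."""
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[-1]
    prior = np.full(n_classes, (1.0 - nota_prior) / (n_classes - 1))
    prior[-1] = nota_prior
    weighted = probs * prior
    return weighted / weighted.sum(axis=-1, keepdims=True)

def adversarial_success(probs_adv, true_labels, nota_prior):
    """Fraction of adversarial inputs predicted as a wrong real class.
    Predictions of NOTA count as a failed attack (assumed definition)."""
    probs_adv = np.asarray(probs_adv, dtype=float)
    nota_index = probs_adv.shape[-1] - 1
    preds = apply_nota_prior(probs_adv, nota_prior).argmax(axis=-1)
    wrong_real = (preds != np.asarray(true_labels)) & (preds != nota_index)
    return float(np.mean(wrong_real))

def grid_search_prior(probs_adv, labels_adv, probs_clean, labels_clean,
                      candidates, min_clean_acc):
    """Return (prior, success_rate) with the lowest adversarial success among
    candidates that keep clean accuracy at or above `min_clean_acc`."""
    best = None
    for p in candidates:
        clean_preds = apply_nota_prior(probs_clean, p).argmax(axis=-1)
        clean_acc = float(np.mean(clean_preds == np.asarray(labels_clean)))
        if clean_acc < min_clean_acc:
            continue  # skip priors that push clean inputs into NOTA
        rate = adversarial_success(probs_adv, labels_adv, p)
        if best is None or rate < best[1]:
            best = (p, rate)
    return best

# Toy validation data: two real classes plus NOTA (last column).
probs_adv = np.array([[0.5, 0.1, 0.4], [0.2, 0.7, 0.1]])
labels_adv = np.array([1, 0])
probs_clean = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
labels_clean = np.array([0, 1])

best = grid_search_prior(probs_adv, labels_adv, probs_clean, labels_clean,
                         candidates=[1 / 3, 0.6, 0.9], min_clean_acc=0.9)
print(best)
```

The clean-accuracy constraint is included because, without it, the search would trivially favor a prior close to 1 that labels every input NOTA.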

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2023
Accession Number
AD1225432

Entities

People

  • Alexi N. Mendolia

Organizations

  • Naval Postgraduate School

Tags

Fields of Study

  • Computer science

Readers

  • Cybersecurity
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks