Systematically Studying Backdoor Attacks on DNNs and Developing a Detection Architecture

Abstract

Major Goals: Backdoor attacks [1][2] and Trojaning attacks [3] on DNNs are two particularly threatening attacks in the above settings. The goal of these attacks is to produce a backdoored neural network that gives normal outputs on normal data but wrong outputs on data embedded with a backdoor key. In image classification, for example, the wrong outputs can be either targeted or non-targeted. The backdoor key can be a small predefined patch overlaid on a normal input image, or even a special physical item that appears in a photo. Our preliminary experiments showed that when such a backdoored neural network is deployed in a face-recognition system or an autonomous car, attackers can easily fool the system by attaching a predefined sticker to a person's face or to a road sign. Notably, implementing such attacks does not require changing the training process or the structure of the trained model; only a small portion of the training data needs to be poisoned or perturbed. Existing defenses against backdoor and poisoning attacks, including some robust linear regression models, focus on building robust training algorithms so that the trained models can either resist or ignore the poisoned samples. Moreover, these methods either require access to the training process or are limited to non-deep-learning algorithms. For example, Auror [4] was proposed to defend against poisoning attacks during the training phase of DNNs; however, the technique cannot detect backdoors in a pre-trained model. Jagielski et al. [5] proposed a robust defense algorithm against poisoning attacks that can protect only linear regression models. In parallel to backdoor defenses, DNN verification is widely studied in contexts such as medical diagnosis and autonomous driving to find undesirable corner cases of a given DNN model.
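The poisoning step described in the abstract (overlaying a small predefined patch on a fraction of the training images and changing their labels) can be illustrated with a minimal sketch. This is not the report's implementation: the function names `stamp_trigger` and `poison_dataset`, the bottom-right patch location, the 2x2 patch size, and the 10% poison fraction are all assumptions chosen for illustration, and only the targeted variant is shown.

```python
import random


def stamp_trigger(image, patch_value=1.0, patch_size=2):
    """Return a copy of `image` (a list of rows of floats) with a square
    patch overwriting its bottom-right corner.

    The patch position, size, and value are illustrative assumptions.
    """
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped


def poison_dataset(images, labels, target_label, poison_fraction=0.1, seed=0):
    """Stamp the trigger on a random subset of the images and relabel that
    subset as `target_label` (targeted variant).

    Returns new image and label lists plus the sorted poisoned indices;
    the inputs are left unmodified.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_fraction)
    chosen = set(rng.sample(range(len(images)), n_poison))

    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, sorted(chosen)
```

A model trained on the returned data would see the patch only alongside `target_label`, which is the association a backdoor relies on; the training procedure and model structure themselves are untouched, as the abstract notes.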

Document Details

Document Type
Technical Report
Publication Date
Feb 03, 2022
Accession Number
AD1201331

Entities

People

  • Hai Helen Li

Organizations

  • Duke University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Computer Languages
  • Deep Learning
  • Detection
  • Detectors
  • Dimensionality Reduction
  • Engineering
  • Fish
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Neural Networks
  • Probability
  • Reverse Engineering
  • Supervised Machine Learning
  • Unmanned Vehicles

Fields of Study

  • Computer science

Readers

  • Cybersecurity
  • Educational Psychology
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Neural Networks