Systematically Studying Backdoor Attacks on DNNs and Developing a Detection Architecture
Abstract
Many applications of deep neural networks (DNNs) require a large volume of data to train the DNN models. In practice, researchers often perform collaborative training with outsourced data or transfer learning based on pre-trained models. Although these methods reduce the effort of data collection and the reliance on training data, part of the training process is no longer fully controlled by the users. Malicious attacks, including but not limited to poisoned training data, problematic pre-trained models, and faked gradients in distributed training, can be conducted while training either the pre-trained model or the fine-tuned model. The backdoor attack is one such threat, made especially dangerous by its stealthiness and infectiousness. Attackers generate a backdoored neural network that produces normal outputs on normal data but wrong outputs whenever it encounters an input containing a secret backdoor key. In image classification, for example, a small predefined patch can serve as a backdoor key; it can be overlaid on a normal image or even take the form of a particular physical object in a photo. Once such a backdoored neural network is deployed, attackers can easily fool the system. Importantly, implementing such attacks does not require changing the training process or the structure of the trained model; only a small portion of the training data needs to be poisoned (perturbed). The threat of the backdoor attack to DNNs closely resembles that of backdoors in computer software: testing the system on a normal benchmark (or, for networks, a validation dataset) does not help users identify the backdoors. By tuning the percentage of poisoned data in the whole training dataset, an attacker can make the backdoor cause only marginal or even zero degradation in validation accuracy. It is also impossible for users to guess the backdoor keys by brute force, because the size of a backdoor key can vary significantly.
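The poisoning step described above can be illustrated with a minimal sketch of a BadNets-style patch trigger in Python with NumPy. It is not the method studied in this project; the function names (`stamp_trigger`, `poison_dataset`), the bottom-right trigger location, and the (N, H, W, C) array layout are illustrative assumptions.

```python
import numpy as np


def stamp_trigger(images, patch):
    """Return a copy of `images` (shape N x H x W x C) with `patch` (h x w x C)
    overlaid on the bottom-right corner of every image (illustrative placement)."""
    stamped = images.copy()
    h, w = patch.shape[:2]
    stamped[:, -h:, -w:, :] = patch
    return stamped


def poison_dataset(images, labels, patch, target_label, poison_ratio, seed=0):
    """Stamp the trigger on a random `poison_ratio` fraction of the training set
    and relabel those samples to `target_label`.

    Returns the poisoned images, the poisoned labels, and the poisoned indices.
    The input arrays are left unmodified.
    """
    if not 0.0 <= poison_ratio <= 1.0:
        raise ValueError("poison_ratio must be in [0, 1]")
    rng = np.random.default_rng(seed)
    n_poison = int(round(poison_ratio * len(images)))
    # Choose distinct samples to poison; the rest of the data stays clean.
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images = images.copy()
    labels = labels.copy()
    images[idx] = stamp_trigger(images[idx], patch)
    labels[idx] = target_label
    return images, labels, idx
```

For instance, calling `poison_dataset(x, y, np.ones((3, 3, 3)), target_label=7, poison_ratio=0.05)` on 100 images stamps and relabels exactly 5 of them, leaving the other 95 untouched; this is why validation accuracy on clean data can stay essentially unchanged.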
In addition, the backdoor attack can retain its effectiveness after transfer learning: a fine-tuned network may inherit the backdoors of its initial pre-trained model and can be triggered by the same backdoor key. The transferability of backdoors indicates virus-like behavior: the malicious training data can infect not only the backdoored model but also any model developed from it. Unfortunately, to the best of our knowledge, no existing method can detect the backdoors of a given model without prior knowledge of its training process. In this project, we propose to systematically study backdoor attacks on DNNs. We plan to begin by evaluating and analyzing large models and datasets under various settings. A threat model will be established to measure how parameters such as model complexity, dataset size, modified data ratio, and backdoor key size affect the effectiveness of the attack, including the attack success rate, the maximum capacity of backdoor injection, and the effective range of backdoor infection. The quantitative measurements will be used to extract important characteristics of the DNNs, such as their robustness in the physical world and the models' transferability. The objectives of our project are: 1) realizing a high-precision online detection system by leveraging these representative activation characteristics, and 2) developing a proactive offline detection system by comparing the activations of multiple models. The online detection system can alert human users whenever a suspected attack appears. The offline system can statically search for possible backdoors in a given model before the model is deployed, and can provide second-stage validation for the online detector. Once completed, the proposed research will yield a methodology for defending against neural backdoor attacks.
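The abstract does not define how attack effectiveness is measured. One common convention, sketched here under that assumption, reports clean accuracy on un-triggered inputs alongside an attack success rate: the fraction of triggered inputs from non-target classes that the model assigns to the attacker's target label. `predict` is a hypothetical callable that maps a batch of images to predicted labels; the functions and the bottom-right trigger placement are illustrative, not the project's evaluation code.

```python
import numpy as np


def clean_accuracy(predict, images, labels):
    """Fraction of un-triggered inputs that the model classifies correctly."""
    return float(np.mean(predict(images) == labels))


def attack_success_rate(predict, images, labels, patch, target_label):
    """Fraction of triggered inputs, drawn from non-target classes, that the
    model assigns to `target_label`. Inputs already labeled `target_label`
    are excluded so they cannot inflate the rate."""
    keep = labels != target_label
    if not np.any(keep):
        raise ValueError("need at least one input outside the target class")
    # Boolean indexing returns a copy, so the caller's images are not modified.
    triggered = images[keep]
    h, w = patch.shape[:2]
    # Illustrative placement: overlay the patch on the bottom-right corner.
    triggered[:, -h:, -w:, :] = patch
    return float(np.mean(predict(triggered) == target_label))
```

Sweeping the poisoned-data ratio or the patch size and recording both numbers is one way to tabulate how the threat-model parameters listed above trade attack success against validation accuracy.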
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 14, 2019
- Source ID: W911NF1910034
Entities
People
- Hai Helen Li
Organizations
- Army Contracting Command
- Duke University
- United States Army