Interpretable Convolutional Neural Networks and Adversarial Examples: FY19 Line-Supported Program
Abstract
Convolutional neural networks (CNNs) can achieve remarkable accuracy on many computer-vision and image-recognition problems, but their prediction mechanism is difficult or impossible to understand, and normal training produces models that are highly susceptible to adversarial examples: inputs designed by an attacker to be misclassified despite being visually indistinguishable from ordinary images of the correct class. These shortcomings have sparked great interest in explaining CNNs and in training CNNs that are resistant to adversarial inputs. In this report, we show that one can apply robust optimization techniques to an interpretable CNN and train models that are both interpretable and robust.
Document Details
- Document Type: Technical Report
- Publication Date: Dec 18, 2020
- Accession Number: AD1147899
Entities
People
- J. K. Su
- N. Kaushik
Organizations
- MIT Lincoln Laboratory