Data Analytics for Cyber Security: Defeating the Active Adversaries
Abstract
As more and more data is gathered for detecting and preventing cyber attacks, this ever-increasing volume of data has created the need to apply data analytics techniques to cyber security. Unique to cyber security applications, data analytics techniques must deal with active adversaries that try to deceive the data analytics models and avoid detection. For example, a piece of malware can incorporate large amounts of legitimate code to masquerade as legitimate software, and it can obfuscate its binary to deceive classifiers. The existence of such adversarial behavior has motivated the development of robust and resilient classification techniques for various tasks.

These robust classification techniques assume the existence of carefully labeled data instances (e.g., each network flow labeled as attack versus normal). In practice, however, labeling data instances often requires costly and time-consuming human expertise and becomes a significant bottleneck. Meanwhile, unlabeled instances can be used to further improve the accuracy of learning techniques. Novel robust techniques that can leverage large amounts of unlabeled data combined with a small number of labeled instances need to be developed for cyber security applications.

Another important challenge in using data analytics techniques in practice is obtaining actionable intelligence from the learned models. In many cases, learned models are complex and may not be easily understood by practitioners. A lack of clear understanding of why certain decisions are made can reduce trust in data analytics models. Therefore, creating actionable intelligence from the learned robust models is an urgent need for enabling widespread practical deployment. To address the above-mentioned challenges, we propose to develop adversarial-attack-resistant techniques that can leverage large amounts of unlabeled data. Furthermore, we propose to use such techniques to create easy-to-understand, actionable rules.
To achieve these goals, we plan to focus on multiple complementary research directions. First, by utilizing game-theoretic ideas, we propose to develop adversarial unsupervised learning techniques for building robust anomaly detection models. Second, we plan to develop novel adversarial active learning techniques that can effectively use unlabeled data to improve classifier accuracy against adversaries. Our proposed adversarial active learning techniques will select, from large amounts of unlabeled data, the right instances for experts to label, while remaining resistant both to attack instances in the unlabeled data and to potential malicious errors by the experts. We also plan to combine expert opinions in a novel way using adversarial data clustering. Furthermore, we will develop techniques that map these learned models to more actionable, easy-to-deploy rules. Finally, we propose to apply our novel adversarial unsupervised and active learning techniques to, and evaluate them on, two important cyber security applications: malware and DNS blacklists. To our knowledge, the proposed project would be the first comprehensive effort to leverage unlabeled cyber security incident data for building better adversarial-attack-resistant, actionable models. The proposed project will enable the deployment of efficient, actionable, and resilient data analytics models for detecting and predicting cyber attacks. This, in turn, will reduce the potential losses due to unexpected cyber attacks against Army assets.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Oct 16, 2018
- Source ID: W911NF1710356
Entities
People
- Murat Kantarcıoğlu
Organizations
- Army Contracting Command
- United States Army
- University of Texas at Dallas