Separation of Benign and Malicious Network Events for Accurate Malware Family Classification

Abstract

Labeling malware samples with their appropriate malware family helps analysts understand and track malware evolution and develop mitigation techniques. Current malware analysis techniques that use supervised machine learning rely on classification models trained on malware traffic generated in a sandbox environment. These models are then used to classify future, unseen observations. In practice, however, malware traffic is mixed with legitimate background traffic from the host machine, such as user browsing and software update traffic. As a result, these classifiers predict the correct malware label on unseen (mixed) traffic with low accuracy. We propose a novel classification system that uses an Independent Component Analysis (ICA) module, which applies distribution decomposition to separate the observed traffic into two components: malware traffic and background traffic. A random forest classifier module then learns a classification model for every malware family and uses it to predict malware family labels from the output of the ICA module. The system can thus label malware traffic after removing background artifacts (noise), which makes it more efficient and accurate than current classification methods. Our experiments on three malware family datasets show that the performance of our system improves significantly after the background traffic artifacts are removed.
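The ICA separation step described above can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes two observed channels (ICA needs at least as many observed mixtures as sources), synthetic signals standing in for a periodic malware "beacon" and for background activity, and a hand-written symmetric FastICA with the tanh (log-cosh) nonlinearity. The helper names `whiten`, `sym_decorrelate`, and `fast_ica`, the signal shapes, and the mixing matrix are all hypothetical.

```python
import numpy as np


def whiten(X):
    """Center each row of X (channels x samples) and decorrelate it to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ Xc


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    s, U = np.linalg.eigh(W @ W.T)
    return U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ W


def fast_ica(X, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with g = tanh; returns the estimated sources as rows."""
    Z = whiten(X)
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(WZ) Z^T] - diag(E[g'(WZ)]) W, with g' = 1 - tanh^2
        W_new = (G @ Z.T) / m - np.diag((1.0 - G**2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is (anti)parallel to the corresponding old row
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z


# Hypothetical toy sources: a square-wave "beacon" and a sawtooth "background"
t = np.linspace(0, 40, 5000)
malware = np.sign(np.sin(3 * t))
background = (t % 1.3) / 1.3 - 0.5
S = np.vstack([malware, background])

# Two observed channels, each a different linear mixture of the two sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

recovered = fast_ica(X)
# |correlation| of each true source with its best-matching recovered component
match = [max(abs(np.corrcoef(s, r)[0, 1]) for r in recovered) for s in S]
print([round(float(v), 3) for v in match])
```

ICA recovers components only up to sign, scale, and order, which is why the check above matches each true source to its best-correlated output. A real system would still have to decide which recovered component is the malware traffic before passing it to the per-family random forest classifier; that step is not shown here.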


Document Details

Document Type: Technical Report
Publication Date: Sep 28, 2015
Accession Number: AD1017089

Entities

People

  • Aziz Mohaisen
  • Hesham Mekky
  • Zhi-Li Zhang

Organizations

  • Cornell University

Tags

Communities of Interest

  • Cyber

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Artifacts
  • Classification
  • Detection
  • Electrical Engineering
  • Elimination
  • Environment
  • Feature Extraction
  • Frequency
  • Ground Traffic
  • Identification
  • Information Science
  • Learning
  • Machine Learning
  • Probability
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Aviation Safety and Air Traffic Management
  • Computer Vision
  • Cybersecurity

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Cyber
  • Cyber - Cryptography