Intel 2013 Anti-Malware Research Project

Abstract

As of 2011, approximately 50 million unique malicious software samples are being generated and released into the wild every month. Today's detection methods for malicious software cannot keep pace. The sheer amount of data contained in those malware samples has transformed malicious software detection into a data mining problem. Industry experts have hypothesized that 50 million samples of malicious software per month cannot be produced without code reuse, and initial research into similarity analysis has shown this to be the case. Research into detecting similarity in malware has so far been successful, but it has been mostly limited to variant detection. The value of variant detection is limited, because it relies on malicious code packages being highly similar in their entirety. While this general similarity technique catches variants, it fails to catch code reuse in new software where the reuse is confined to a small part of the program. Code reuse that is highly similar locally, rather than generally, is one area Intel proposes to investigate. We intend to explore methods of similarity clustering for a large population. This type of research can provide new insights into targeting and command-and-control trends in malicious software production, dramatically reduce the amount of new code that needs to be examined, and supply relational information that could provide tools for attribution. Building on data gathered from initial malware provenance research exploring this topic, we plan to explore small-scale clustering algorithms, imported from bioinformatics research, to identify malicious code. Intel will begin with small-scale clustering to explore different graphing techniques based on a variety of similarity data and evaluate their effectiveness in uncovering code reuse in the samples.
Consistent with the aims of the malware provenance research, potential research vectors include building mechanisms to capture the very latest malware attacks and exploits on the Internet for use as samples in refining the similarity analysis techniques. Doing this could provide insight into the genealogy of these fresh attacks, as the similarity analysis links them back via code reuse to older, known malware datasets. One example of a method to capture such new attacks would be to deploy systems based on Intel's latest anti-malware technologies at appropriate locations in networks that experience malware attacks. Such an approach could borrow techniques for capture, analysis, and defensive policy from a similar existing effort within Intel Labs. In addition, it could allow us to investigate effective methods for applying malware provenance research results to real-world malware on the Internet.
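
The distinction the abstract draws between general and local similarity can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the project's method: byte 4-grams as features, Jaccard similarity as the whole-sample ("general") measure, a sliding-window containment score as the "local" measure, and the hypothetical names `ngrams`, `jaccard`, and `best_local_match`.

```python
# Illustrative sketch only: feature choice, window size, and function names
# are assumptions, not the method used in the research described above.

def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all byte n-grams in `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """General similarity: shared n-grams over all n-grams of both samples."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_local_match(sample: bytes, reference: bytes,
                     window: int = 32, n: int = 4) -> float:
    """Local similarity: slide a window over `sample` and report the highest
    fraction of a window's n-grams that also occur anywhere in `reference`."""
    ref = ngrams(reference, n)
    best = 0.0
    for start in range(max(1, len(sample) - window + 1)):
        win = ngrams(sample[start:start + window], n)
        if win:
            best = max(best, len(win & ref) / len(win))
    return best


# Synthetic stand-ins for real samples: a small routine reused inside an
# otherwise unrelated, larger program.
shared_routine = bytes(range(100, 132))
old_sample = bytes(range(0, 40)) + shared_routine
new_sample = bytes(range(140, 256)) + shared_routine

general = jaccard(ngrams(old_sample), ngrams(new_sample))
local = best_local_match(new_sample, old_sample)
print(f"general similarity: {general:.2f}")
print(f"best local match:   {local:.2f}")
```

On these synthetic inputs the whole-sample score stays low while the windowed score reaches its maximum, which is the situation the abstract describes: a variant detector keyed to overall similarity would miss the reuse, whereas a local measure would flag it. Pairwise scores of this kind are one possible input to the clustering and graphing steps the abstract mentions.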

Document Details

Document Type
DoD Grant Award
Publication Date
Feb 23, 2016
Source ID
FA70001320005

Entities

People

  • Vittal Kini

Organizations

  • United States Air Force Academy

Tags

Fields of Study

  • Computer science

Readers

  • Cybersecurity
  • Software Engineering
  • Theoretical Analysis

Technology Areas

  • AI & ML
  • Cyber
  • Fully Networked C3
  • Fully Networked C3 - Command and Control