Unsupervised Learning of Library Routines to Predict Function

Abstract

Malware is a constantly evolving threat that requires significant expertise to detect, identify, and mitigate. We postulate that deep learning can be adapted to this problem domain to automatically analyze arbitrary binary code and help cyber analysts identify its functional components. As a proof of concept, we trained a convolutional autoencoder to reproduce various fields of the disassembled binaries of standard Linux libraries. We then clustered the bottleneck-layer representations to identify groups of similar routines. Our spot check of 100 routines suggests that deep learning may indeed be useful for routine classification. However, further refinement of the network topology and a concerted ground-truth labelling effort will be required to yield a production-level analytical tool.
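The pipeline described in the abstract (train an autoencoder to reconstruct disassembled routines, then cluster the bottleneck codes) can be sketched roughly as follows. This is not the authors' code. The report's network is convolutional and the exact input fields are not given here, so this minimal NumPy sketch makes several assumptions: each routine is reduced to a sequence of opcode mnemonics and one-hot encoded; a small linear autoencoder trained by full-batch gradient descent stands in for the convolutional one; and k-means with farthest-first initialization is used for the clustering step. The function names `one_hot_routines`, `train_linear_autoencoder`, and `kmeans`, as well as the toy opcode vocabulary and routines, are hypothetical.

```python
import numpy as np


def one_hot_routines(routines, vocab, length):
    """Encode each routine (a list of opcode mnemonics, an assumed input format)
    as a flattened (length x len(vocab)) one-hot matrix, truncating long routines
    and zero-padding short ones. Opcodes outside `vocab` are left as zeros."""
    index = {op: i for i, op in enumerate(vocab)}
    X = np.zeros((len(routines), length * len(vocab)))
    for r, ops in enumerate(routines):
        for t, op in enumerate(ops[:length]):
            if op in index:
                X[r, t * len(vocab) + index[op]] = 1.0
    return X


def train_linear_autoencoder(X, bottleneck, epochs=500, lr=0.05, seed=0):
    """Fit X ~= (X @ W_enc) @ W_dec by full-batch gradient descent on the mean
    squared reconstruction error. Linear stand-in for the report's convolutional
    autoencoder (an assumption, not the authors' architecture)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, bottleneck))
    W_dec = rng.normal(scale=0.1, size=(bottleneck, d))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                  # bottleneck codes
        err = Z @ W_dec - X            # reconstruction residual
        losses.append(0.5 * np.sum(err ** 2) / n)
        # Both gradients are computed before either weight matrix is updated.
        grad_dec = Z.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses


def kmeans(Z, k, iters=50, seed=0):
    """Cluster bottleneck codes with Lloyd's k-means algorithm."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: start from a random point, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)  # assign each routine to its nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


if __name__ == "__main__":
    # Toy, made-up opcode sequences standing in for disassembled library routines.
    vocab = ["push", "mov", "call", "xor", "ret"]
    routines = [["push", "mov", "call", "ret"]] * 3 + [["xor", "xor", "mov", "ret"]] * 3
    X = one_hot_routines(routines, vocab, length=4)
    W_enc, W_dec, losses = train_linear_autoencoder(X, bottleneck=2)
    print("final loss:", losses[-1])
    print("cluster labels:", kmeans(X @ W_enc, k=2))
```

Swapping in a convolutional encoder, as the report does, would let the model share filters across instruction positions rather than learning a separate weight for every (position, opcode) pair. The clustering step on the bottleneck codes would stay the same.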


Document Details

Document Type
Technical Report
Publication Date
Sep 25, 2019
Accession Number
AD1081619

Entities

People

  • Anne Logie
  • Michael S. Lee

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computer Languages
  • Computer Programming
  • Computer Science
  • Computers
  • Convolutional Neural Networks
  • Deep Learning
  • Dimensionality Reduction
  • Information Science
  • Linguistics
  • Machine Learning
  • Neural Networks
  • Operating Systems
  • Standards
  • Two Dimensional
  • Unsupervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Database Systems and Applications
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Cyber