COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks

Abstract

The ability to quickly identify whether two binaries are similar is critical for many security applications, with use cases ranging from triaging millions of novel malware samples, to identifying whether a binary contains a known exploitable bug. There have been many program analysis approaches to solving this problem, however, most machine learning approaches in the last 5 years have focused on function similarity, and there have been no techniques released that are able to perform robust many to many comparisons of full programs. In this paper, we present the fixC;rst machine learning approach capable of learning a robust representation of programs based on their similarity, using a combination of supervised natural language processing and graph learning. We name our prototype COBRA: Contrastive Learning to Optimize Binary Representation Analysis. We evaluate our model on several different metrics for program similarity, such as compiler optimizations, code obfuscations, and different pieces of semantically similar source code. Our approach outperforms current techniques for full binary diffng, achieving an F1 score and AUC .6 and .12, respectively, higher than BinDiff while also having the ability to perform many-to-many comparisons.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jun 24, 2022
Accession Number
AD1208823

Entities

Organizations

  • MIT Lincoln Laboratory

Tags

Communities of Interest

  • Cyber
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence Software
  • Assembly
  • Computational Science
  • Computer Languages
  • Computer Programming
  • Computer Programs
  • Convolutional Neural Networks
  • Debugging
  • Dimensionality Reduction
  • Language
  • Machine Learning
  • Natural Language Processing
  • Natural Languages
  • Neural Networks
  • Recurrent Neural Networks
  • Training

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Cybersecurity.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Cyber