Large Language Models for Software Reverse Engineering

Abstract

The importance of Software Reverse Engineering (SRE) has grown dramatically over the past decade as software written by bad actors in the cyberspace domain has become increasingly complex. One key element of SRE is code explanation. However, despite the wide range of tools available for SRE, the task remains time-intensive and complex. Software binaries are often stripped and obfuscated, which removes information that is key to binary analysis. Additionally, these binaries target a wide variety of instruction set architectures, requiring reverse engineers to understand low-level assembly code for multiple architectures. Adding to the complexity of the problem, SRE is multidisciplinary and requires knowledge not only of low-level programming but also of networking, full-stack development, mathematics, and more. Because reverse engineering software demands such a specialized combination of skill sets, we propose using fine-tuned Large Language Models (LLMs) in conjunction with the software analysis package CFG2VEC to generate step-by-step explanations of stripped binary code.

Document Details

Document Type
Technical Report
Publication Date
Mar 25, 2024
Accession Number
AD1225984

Entities

People

  • Miguel Garcia

Tags

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computer Programming and Software Development
  • Database Systems and Applications
  • Economics

Technology Areas

  • Cyber