Automated Code Translation to Memory-safe Languages

Abstract

Given the vast amount of C/C++ code in popular applications, porting them to memory-safe languages is a gradual process that involves manually rewriting individual, mostly self-contained code components and interfacing them with the rest of the application's code. The proposed research will aid and speed up this process by investigating techniques for automating the rewriting of existing C/C++ code into Rust (the strongest contender for a memory-safe "systems" language with acceptable runtime overhead), based on recent advances in code-specific large language models (LLMs). Key innovative aspects of the proposed work include: i) gradual and incremental translation that focuses on smaller code fragments instead of the whole program, with cross-language integration and optimization to allow the intermixing of C/C++ and Rust code at the binary level; ii) prioritization of which parts of the code should be translated first, by calculating their risk of vulnerability and their ease of translation; and iii) development of domain-specific C-to-Rust code transcribers by fine-tuning existing (and training new) code-specific LLMs.

Approved for Public Release.

Document Details

Document Type
DoD Grant Award
Publication Date
Dec 15, 2023
Source ID
N000142412054

Entities

People

  • Michalis Polychronakis

Organizations

  • Office of Naval Research
  • Research Foundation for the State University of New York
  • United States Navy

Tags

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Computer Programming and Software Development