Automated Code Translation to Memory-safe Languages
Abstract
Given the vast amount of C/C++ code in popular applications, porting them to memory-safe languages is a gradual process that involves manually rewriting individual, mostly self-contained code components and interfacing them with the rest of the application's code. The proposed research will aid and speed up this process by investigating techniques for automating the rewriting of existing C/C++ code into Rust (the strongest contender for a memory-safe "systems" language with acceptable runtime overhead), based on recent advances in code-specific large language models (LLMs). Key innovative aspects of the proposed work include: i) gradual and incremental translation by focusing on smaller code fragments instead of the whole program, and cross-language integration and optimization to allow the intermixing of C/C++ and Rust code at the binary level; ii) prioritization of which parts of the code should be translated first, by calculating their risk of vulnerability and their ease of translation; and iii) development of domain-specific C-to-Rust code transcribers by fine-tuning existing (and training new) code-specific LLMs.
Approved for Public Release.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Dec 15, 2023
- Source ID: N000142412054
Entities
People
- Michalis Polychronakis
Organizations
- Office of Naval Research
- Research Foundation for the State University of New York
- United States Navy