Learn-to-Reason: A Probabilistic Binary Analysis Infrastructure and Its Application in Binary Debloating

Abstract

Motivation. Computing systems in Naval environments rely heavily on COTS software and legacy software for which source code is not available. Many such systems are overly complex and include functionality that is irrelevant to Naval operations. Their code base is often outdated and contains various software vulnerabilities. Efforts to remove unused features (to lower complexity, reduce the attack surface, and improve efficiency), as well as efforts to harden the software, have to be built on a precise, robust, and scalable static binary analysis and transformation infrastructure. The key challenge of binary analysis is to recover high-level semantic information. Reverse engineering is by nature imprecise and uncertain. Traditional analyses have to make conservative assumptions when facing uncertainty. The accumulated imprecision quickly degrades their usability and causes various precision and scalability issues. Interestingly, statistical learning techniques are often able to produce useful results when the low-level analyses cannot. This is mainly because statistical learning techniques are good at collecting and integrating various hints, and at making predictions based on past patterns and experiences. On the downside, statistical learning can hardly take program semantics into consideration and hence cannot "connect the dots".

Proposed Research. Recently, a new paradigm called "Learn-2-Reason" has been proposed to bridge the gaps between statistical learning and formal reasoning for more effective system modeling and analysis. It is not a simple combination of the two, as they influence, interact with, and hence improve each other by sharing their inputs and exchanging their knowledge. We propose to instantiate the paradigm in binary analysis, as the two align perfectly. Statistical learning has unique advantages in dealing with the inherent uncertainty in binary analysis, while formal reasoning allows connecting the dots such that the predictions from the learning models can be propagated, aggregated, and cross-validated. The fusion and interplay between learning and reasoning can be achieved by probabilistic inference. Both learning results and formal reasoning rules are encoded as probabilistic constraints (or conditional probabilities) in a Probabilistic Graphical Model (PGM). PGM inference produces a probability distribution that indicates the most likely results and maximizes the satisfaction of constraints. Specifically, we aim to develop a comprehensive probabilistic binary analysis infrastructure that supports probabilistic disassembling, variable identification and type inference, points-to analysis, CFG/PDG construction, and probabilistic binary rewriting. We will also apply the infrastructure to perform UI-driven binary reduction. A key enabling technique is an effective and efficient PGM inference engine, which is largely lacking. Hence, we aim to devise such an engine specialized for program analysis.

Innovative Claims and Impacts. The intellectual merit of our proposed research lies in the following. (1) It will demonstrate the power of "Learn-2-Reason" by achieving breakthroughs in binary program analysis. While traditional analysis is reaching its ceiling, the innovative coupling with learning will enable the next generation of static analysis. (2) The proposed binary analysis primitives will deliver unprecedented precision and robustness. (3) A highly scalable and precise PGM inference engine will be developed through this project. Broader impacts of this research include the following. (1) The models, theories, and software artifacts from this research will enable numerous projects that rely on binary analysis. (2) This research will provide a solid foundation and valuable experience for more (advanced) instantiations of the Learn-2-Reason vision. (3) This project will train future researchers that possess the unique expertise on the synergetic integration of learning and formal reasoni
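
The idea stated under Proposed Research, that learning results and reasoning rules are both encoded as probabilistic constraints and combined by PGM inference, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the project: the three byte offsets, the prior values, the rule strengths, the helper names (`prior_factor`, `implies_factor`, `exclusive_factor`, `marginals`), and the use of brute-force enumeration in place of a scalable inference engine. It models a hypothetical 2-byte instruction at offset 0 that falls through to offset 2 and overlaps offset 1.

```python
from itertools import product

# Hypothetical toy model: is each byte offset the start of an instruction?
VARIABLES = ["off0", "off1", "off2"]

# Made-up "learned" hints: prior probability that each offset starts an instruction.
PRIOR = {"off0": 0.9, "off1": 0.5, "off2": 0.6}


def prior_factor(name, p):
    """Soft constraint from a learned prediction: weight p if true, 1 - p if false."""
    return lambda world: p if world[name] else 1.0 - p


def implies_factor(x, y, strength):
    """Soft rule 'x implies y': weight `strength` if satisfied, 1 - strength if violated."""
    return lambda world: strength if (not world[x] or world[y]) else 1.0 - strength


def exclusive_factor(x, y, strength):
    """Soft rule 'x and y are not both true' (e.g. overlapping instructions)."""
    return lambda world: 1.0 - strength if (world[x] and world[y]) else strength


def marginals(variables, factors):
    """Exact marginals by enumerating every assignment (only viable for tiny models)."""
    totals = {v: 0.0 for v in variables}
    z = 0.0  # normalization constant
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        weight = 1.0
        for factor in factors:
            weight *= factor(world)
        z += weight
        for v in variables:
            if world[v]:
                totals[v] += weight
    return {v: totals[v] / z for v in variables}


FACTORS = [prior_factor(v, p) for v, p in PRIOR.items()]
# Assumed reasoning rules: fall-through (off0 -> off2) and overlap (off0 vs off1).
FACTORS.append(implies_factor("off0", "off2", 0.95))
FACTORS.append(exclusive_factor("off0", "off1", 0.95))

posterior = marginals(VARIABLES, FACTORS)
for name in VARIABLES:
    print(f"{name}: prior={PRIOR[name]:.2f} posterior={posterior[name]:.3f}")
```

In this sketch the rules move belief between offsets: the fall-through rule raises the posterior for offset 2 above its prior, and the overlap rule lowers the posterior for offset 1. Enumeration is exponential in the number of variables, which is one reason a specialized, scalable inference engine would be needed for real binaries.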

Document Details

Document Type
DoD Grant Award
Publication Date
Sep 29, 2017
Source ID
N000141712947

Entities

People

  • Xiangyu Zhang

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Virginia

Tags

Fields of Study

  • Computer science

Readers

  • Computer Programming and Software Development
  • Neural Network Machine Learning
  • Systems Analysis and Design

Technology Areas

  • AI & ML