Slimming, Simplifying and Securing Software Systems

Abstract

Software systems have seen huge growth in sophistication, size and complexity at the cost of severe performance overheads, bugs and vulnerabilities. Although only a small fraction of the functionality of large applications or servers is used by any given end user, unused features are difficult to eliminate because software is configured, compiled and linked by developers before shipping. Static analysis and optimization techniques after shipping are too weak to automatically eliminate unused functionality. State-of-the-art code generation and code size minimization techniques only yield small improvements in code size, typically a few percent at best.

We propose a new, yet pragmatic, approach to distributing and linking software that is far more flexible than approaches used today, allowing much greater degrees of user-specific customization and comprehensive compiler optimizations across application-library and application-OS boundaries. This approach builds on a previous project called ALLVM, in which essentially all software on a system, including applications, static and dynamic libraries, servers, and optionally operating systems, is made available in the form of a rich compiler internal representation (IR), the LLVM IR (called bitcode). Where necessary, binary code is lifted to LLVM IR using an existing ALLVM tool, which we will further refine to enhance the quality of the generated IR.

We propose that software should be organized and shipped as a "Bitcode Database" (BCDB) containing small code fragments (e.g., individual functions) in IR form, instead of pre-linked applications and libraries. The database is indexed using syntactic or semantic search methods to identify equivalent code fragments; a key research goal is to develop suitable search strategies, and policies for retaining or eliminating equivalent versions. A Link Dependence Graph (LDG) describes how applications or dynamic libraries can be constructed from these code fragments. The BCDB and LDG, together with the APIs actually used by a client application, can enable more precise analysis of unused features, by taking advantage of fine-grain code fragments and link dependences between them.

We will exploit compile-time and run-time configuration files (plus rich compiler IR for all layers of software) to further specialize and debloat application code on end-user systems. We are developing specific compiler transforms to enable more aggressive constant propagation, specialized context-sensitive algorithms and semantics-aware transformations for library interfaces, all of which can yield far better code specialization than current techniques.

Superoptimization techniques for LLVM IR can achieve significant code size reductions over existing compiler optimizations. The Souper superoptimizer uses a code cache of highly optimized LLVM fragments; by integrating this cache with the BCDB, Souper will be able to improve the quality of code in the database and to detect functionally-equivalent entries. Moreover, we will improve Souper's power to look across function boundaries, across memory accesses, and across backwards edges in loops, none of which are supported today. We will use Souper to greatly improve the quality of the LLVM IR generated from binary code.

Finally, we will develop and evaluate security hardening techniques that take advantage of reductions in software size and complexity. We will perform more aggressive static analysis of binary code by lifting it to LLVM IR, improve methods of measuring the security improvements that can be achieved through debloating, on-site customization, and security hardening, and investigate methods to dynamically adjust the hardening approaches used in different parts of a program to obtain the best security/performance tradeoffs. Many of these hardening methods and security evaluation methods can be used with debloating techniques developed by other teams in this program.
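
The Bitcode Database and Link Dependence Graph described above can be illustrated with a small toy model. The sketch below is not the project's implementation: the class BitcodeDatabase, the helper normalize, and the methods add, reachable and unused are hypothetical names chosen for illustration. It assumes the simplest possible syntactic index (a SHA-256 hash of whitespace- and comment-normalized IR text with the fragment's own name masked), whereas the abstract treats the choice of search strategy as an open research question. Dependences are supplied by hand here; a real tool would extract them from the IR.

```python
import hashlib
import re
from collections import deque


def normalize(ir_text):
    """Crude syntactic normalization of textual LLVM IR (illustrative only).

    Strips ';' comments and collapses whitespace. It does not handle ';'
    inside string constants, which is acceptable for this sketch.
    """
    lines = []
    for line in ir_text.splitlines():
        line = line.split(";", 1)[0]      # LLVM IR comments start with ';'
        line = " ".join(line.split())     # collapse runs of whitespace
        if line:
            lines.append(line)
    return "\n".join(lines)


class BitcodeDatabase:
    """Toy stand-in for a BCDB plus LDG (hypothetical design)."""

    def __init__(self):
        self.fragments = {}  # digest -> normalized IR text (stored once)
        self.symbols = {}    # symbol name -> digest of its fragment
        self.deps = {}       # symbol name -> set of symbols it references

    def add(self, name, ir_text, deps=()):
        body = normalize(ir_text)
        # Mask the fragment's own name so identical bodies hash identically.
        own_name = re.compile("@" + re.escape(name) + r"(?![\w.$-])")
        body = own_name.sub("@SELF", body)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.fragments.setdefault(digest, body)
        self.symbols[name] = digest
        self.deps[name] = set(deps)
        return digest

    def reachable(self, roots):
        """Symbols reachable from the entry points via link dependences."""
        seen = set()
        queue = deque(roots)
        while queue:
            name = queue.popleft()
            if name in seen or name not in self.symbols:
                continue
            seen.add(name)
            queue.extend(self.deps[name])
        return seen

    def unused(self, roots):
        """Symbols that no entry point depends on (debloating candidates)."""
        return set(self.symbols) - self.reachable(roots)


db = BitcodeDatabase()
d1 = db.add("add_a", "define i32 @add_a(i32 %x, i32 %y) {\n"
                     "  %r = add i32 %x, %y ; sum\n  ret i32 %r\n}")
d2 = db.add("add_b", "define i32 @add_b(i32 %x, i32 %y) {\n"
                     " %r = add i32 %x, %y\n ret i32 %r\n}")
db.add("main", "define i32 @main() {\n"
               "  %v = call i32 @add_a(i32 1, i32 2)\n  ret i32 %v\n}",
       deps=["add_a"])
db.add("legacy", "define void @legacy() {\n"
                 "  %v = call i32 @add_b(i32 3, i32 4)\n  ret void\n}",
       deps=["add_b"])

print(d1 == d2)                      # True: the two adders share one fragment
print(sorted(db.unused(["main"])))   # ['add_b', 'legacy']
```

In this toy model, the two textually equivalent functions are stored as a single fragment, and a reachability walk from the entry point main marks legacy and add_b as removable. Semantic equivalence, which the abstract proposes to detect with tools such as Souper, is beyond what a text hash can capture.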

Document Details

Document Type
DoD Grant Award
Publication Date
Nov 03, 2017
Source ID
N000141712996

Entities

People

  • Vikram Adve

Organizations

  • Office of Naval Research
  • United States Navy
  • University of Illinois Urbana–Champaign

Tags

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Computer Programming and Software Development.
  • Data Mining and Knowledge Discovery.