LAVA: Large-scale Automated Vulnerability Addition

Abstract

Work on automating vulnerability discovery has long been hampered by a shortage of ground-truth corpora with which to evaluate tools and techniques. To begin to address this, we present LAVA, a system for automatically and quickly injecting large numbers of realistic bugs into program source code. LAVA employs a pair of taint-based measures to identify program quantities that depend upon specific input bytes in a simple way yet do not overly influence control flow. These DUAs (dead, uncomplicated, and available data) are employed, via source-to-source transformation, to perturb program quantities at later program points in ways that are likely to cause vulnerabilities. Every LAVA vulnerability is accompanied by an input that triggers it, whereas normal inputs are extremely unlikely to do so. Further, every injected bug is validated, and thus every working bug comes with both a proof-of-concept input and a known manifestation point. These vulnerabilities are synthetic but, we argue, still realistic, in the sense that they are embedded deep within programs and are triggered by real inputs. In order for an automated tool to discover them, it would have to be able to reason correctly and precisely about all the code executed up to the DUA. Using LAVA, we have injected thousands of bugs into popular programs such as file, readelf, bash, and tshark. We believe LAVA can form the basis of an approach for generating high-quality ground-truth vulnerability corpora on demand.
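
The idea of a DUA perturbing a later program quantity can be illustrated with a small sketch. This is not LAVA's implementation, which the abstract describes as a source-to-source transformation of real programs. It is a toy Python model under stated assumptions: the parser, the 5-byte input layout, the function names, and the trigger value MAGIC are all hypothetical, and an IndexError stands in for the out-of-bounds access that an injected bug would cause in a real program.

```python
import struct

# Hypothetical trigger value chosen for this sketch only.
MAGIC = 0x6C617661


def parse_header(data: bytes) -> dict:
    """Toy parser: a 4-byte little-endian field the program never uses,
    followed by a 1-byte length field."""
    (reserved,) = struct.unpack_from("<I", data, 0)
    return {"reserved": reserved, "length": data[4]}


def lookup_original(data: bytes) -> int:
    """Unmodified program: indexes a 16-entry table with the length field."""
    hdr = parse_header(data)
    table = list(range(16))
    return table[hdr["length"] % 16]


def lookup_injected(data: bytes) -> int:
    """Same program with a LAVA-style bug added (illustrative only)."""
    hdr = parse_header(data)
    table = list(range(16))
    # The DUA: copied straight from input bytes (uncomplicated), not used
    # to decide any branch (dead), and still in scope here (available).
    dua = hdr["reserved"]
    # Perturbation: 0 for ordinary inputs, huge when the DUA equals MAGIC.
    offset = dua * (dua == MAGIC)
    # With a nonzero offset the index leaves the table; in this sketch
    # Python raises IndexError where a C program would read out of bounds.
    return table[hdr["length"] % 16 + offset]


def make_input(reserved: int, length: int) -> bytes:
    """Builds a 5-byte input for the toy parser."""
    return struct.pack("<I", reserved) + bytes([length])
```

In this sketch, make_input(0, 5) behaves identically under both functions, while make_input(MAGIC, 5) plays the role of the proof-of-concept input: it leaves lookup_original unaffected and makes lookup_injected fail at a known point.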

Document Details

Document Type
Technical Report
Publication Date
May 23, 2016
Accession Number
AD1034497

Entities

People

  • Brendan Dolan-Gavitt
  • Frederick Ulrich
  • Ryan Whelan
  • Timothy R. Leek

Organizations

  • MIT Lincoln Laboratory

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Computations
  • Computer Programs
  • Computers
  • Control Systems
  • Detectors
  • False Alarms
  • Histograms
  • Hypervisors
  • Instructions
  • Language
  • Open Source Software
  • Operating Systems
  • Shell Scripts
  • Test And Evaluation
  • United States Government
  • Virtual Machines
  • Warning Systems

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Educational Psychology
  • Geotechnical Engineering