Learning and Repair Techniques for Self-Healing Systems
Abstract
Techniques were investigated for enabling systems to recover from data structure corruption errors. These errors could occur, for example, because of software errors or because of attacks. The technique developed contains three components. The first component observes the execution of training runs of the program to learn key data structure consistency constraints. The second component takes the learned constraints and examines production runs to detect violations of them. The third component updates the corrupted data structures to eliminate the violations. The goal is not necessarily to restore the data structures to the state in which a (hypothetical) correct program would have left them, although in some cases the system may do this. The goal is instead to deliver repaired data structures that satisfy the basic consistency assumptions of the program, enabling the program to continue to operate successfully. A prototype system containing these three components was implemented and applied to two programs: BIND (part of the Internet Domain Name System, DNS) and FreeCiv (a freely distributed multi-player game). Experience with BIND indicated that the technique can eliminate previously existing undesirable behavior in this program. The technique was applied to FreeCiv in the context of a Red Team experiment. The results of this experiment show that, on this program and for the workload in the Red Team experiment, the system significantly outperformed the DARPA Self-Regenerative System metrics: it recognized 80% (not just 10%) of the attacks, and it recovered from 60% (not just 5%) of them.
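The three components described in the abstract (learn constraints, detect violations, repair) can be illustrated with a minimal Python sketch. This is not the report's implementation: the `RangeConstraint` class, the functions `learn`, `detect`, and `repair`, and the choice of simple per-field numeric ranges as the constraint form are all assumptions made for illustration. The record does not specify which kinds of constraints the real system learns or how it chooses repairs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RangeConstraint:
    """Hypothetical constraint: a numeric field stays within [low, high]."""
    field: str
    low: int
    high: int

    def holds(self, record: Dict[str, int]) -> bool:
        return self.low <= record[self.field] <= self.high

    def enforce(self, record: Dict[str, int]) -> None:
        # Clamp to the nearest value that satisfies the constraint.
        record[self.field] = max(self.low, min(self.high, record[self.field]))


def learn(training: List[Dict[str, int]]) -> List[RangeConstraint]:
    """Component 1 (assumed form): infer per-field ranges from training runs."""
    return [
        RangeConstraint(name,
                        min(r[name] for r in training),
                        max(r[name] for r in training))
        for name in training[0]
    ]


def detect(record: Dict[str, int],
           constraints: List[RangeConstraint]) -> List[RangeConstraint]:
    """Component 2: return the learned constraints that the record violates."""
    return [c for c in constraints if not c.holds(record)]


def repair(record: Dict[str, int],
           constraints: List[RangeConstraint]) -> List[RangeConstraint]:
    """Component 3: update the record in place so every constraint holds."""
    violated = detect(record, constraints)
    for c in violated:
        c.enforce(record)
    return violated


# Usage: learn from two made-up training records, then repair a corrupted one.
training_runs = [{"x": 0, "hp": 10}, {"x": 79, "hp": 1}]
learned = learn(training_runs)
corrupted = {"x": 500, "hp": 5}
fixed_fields = [c.field for c in repair(corrupted, learned)]
```

As in the abstract, the repaired record is not guaranteed to match what a correct program would have produced; it only satisfies the learned constraints again, which is the stated goal of letting execution continue.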
Document Details
- Document Type: Technical Report
- Publication Date: May 01, 2006
- Accession Number: ADA451095
Entities
People
- Martin Rinard
- Michael Ernst
Organizations
- Massachusetts Institute of Technology