Real-Time Fault Tolerant Computer Systems
Abstract
This research project is developing a new approach to software fault tolerance called analytic redundancy. It is widely acknowledged that software failures are the most important issue in the reliability of large computer systems. The failure rate of software has risen to 9.4 times the failure rate of hardware, and software failures have captured substantial media attention in recent years. The problem of software reliability is especially difficult because the major recognized approaches (recovery blocks and N-version programming) are known to have serious drawbacks. Analytic redundancy is an approach that uses simplicity (software that is relatively simple, well understood, and well tested, but whose performance is merely adequate) to control complexity (software that has high performance but is complex and therefore less reliable). Rather than trying to combine the two algorithms (which would create an even less reliable system), we allow the complex software to control the system as long as it is behaving properly, as judged by the simple software. If the complex software is determined to be behaving incorrectly, the simple software takes over. The simple software guarantees a baseline, if not optimal, level of performance.
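The switching scheme described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the class name `AnalyticRedundancySupervisor`, the two toy controllers, the actuator limit of 10.0, and the decision to stay on the simple software once a fault is detected are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnalyticRedundancySupervisor:
    """Hypothetical supervisor: the complex software runs until judged faulty."""
    complex_controller: Callable[[float], float]
    simple_controller: Callable[[float], float]
    # Check supplied by the simple side: is this command safe for this state?
    is_acceptable: Callable[[float, float], bool]
    use_simple: bool = False  # assumption: the switch is latched after a fault

    def step(self, state: float) -> float:
        if not self.use_simple:
            try:
                command = self.complex_controller(state)
                if self.is_acceptable(state, command):
                    return command
            except Exception:
                pass  # a crash in the complex software counts as misbehavior
            self.use_simple = True  # the simple software takes over
        return self.simple_controller(state)


def simple_controller(state: float) -> float:
    # Well-understood proportional law with merely adequate performance.
    return -0.5 * state


def complex_controller(state: float) -> float:
    # Higher-gain law with a simulated defect for states above 5.
    if state > 5.0:
        return 1e6
    return -2.0 * state


def is_acceptable(state: float, command: float) -> bool:
    # Assumed safety envelope: commands must stay within actuator limits.
    return abs(command) <= 10.0
```

In this sketch, a supervisor built from the three functions returns the complex controller's command while `is_acceptable` passes, and returns the simple controller's command from the first rejected command onward. The key design point is that the acceptance check belongs to the simple, trusted side, so the complex software never has to be trusted to judge itself.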
Document Details
- Document Type: Technical Report
- Publication Date: Sep 30, 1992
- Accession Number: ADA258891
Entities
People
- John Lehoczky
- Lui R. Sha
- Marc Bodson
- Ragunathan Rajkumar
Organizations
- Carnegie Mellon University