An Overview of Reliable Computer System Design
Abstract
This paper was produced to support a series of lectures on reliable system design for multiple-processor computers. It presents an overview of reliable computer system design, offering a pragmatic guide to redundancy and recovery rather than a thorough discussion of the theory or philosophy of reliable systems. The paper introduces and defines the basic concepts of reliability and describes the basic mechanisms for achieving fault tolerance. It compares the attributes of multiprocessor and multicomputer systems from the point of view of reliability, and describes in some detail techniques for tolerating both hardware and software faults. The paper concludes by outlining some of the major unsolved problems of reliable system design.
Document Details
- Document Type: Technical Report
- Publication Date: May 01, 1980
- Accession Number: ADA089271
Entities
People
- J. A. McDermid
Organizations
- Royal Signals and Radar Establishment