Effectiveness Evaluation of Fault-Tolerant Multiprocessor Systems.
Abstract
An important area of research is the analysis of the coverage of a fault-tolerant system, that is, the probability that the system can recover from a fault. The author has studied a variety of models, from simple phase-type models to very complex stochastic Petri net models, and has investigated solution techniques for each model type. His methodology allows consideration of external events that can interfere with recovery, such as a hard limit on recovery time or the occurrence of a second, near-coincident fault. The study found that a policy of attempting transient recovery upon detection of an error (as opposed to automatically reconfiguring the affected component out of the system) may actually increase the unreliability of the system. This result holds when error detectability is not nearly perfect, so that the risk of producing an undetectable error (if the transient error is present) outweighs the benefit gained by not discarding the component. Keywords: Bibliographies.
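As an illustration of the coverage notion described in the abstract, the following minimal Python sketch computes the probability that recovery completes before both a hard deadline and a second near-coincident fault. It assumes, purely for illustration, that recovery time and the time to the second fault are independent and exponentially distributed. The function name `coverage` and its parameters are hypothetical and are not taken from the report, whose phase-type and stochastic Petri net models are far more general.

```python
import math


def coverage(recovery_rate, fault_rate, deadline):
    """Probability that recovery succeeds under two interfering events.

    Illustrative assumptions (not from the report):
      - recovery time X is exponential with rate `recovery_rate`
      - time to a second fault Y is exponential with rate `fault_rate`
      - X and Y are independent
      - recovery fails if it takes longer than `deadline`

    Then P(X < min(Y, deadline))
        = r / (r + f) * (1 - exp(-(r + f) * deadline)).
    """
    total = recovery_rate + fault_rate
    return (recovery_rate / total) * (1.0 - math.exp(-total * deadline))


if __name__ == "__main__":
    # Hypothetical rates, in events per second, chosen only for demonstration.
    for deadline in (0.1, 0.5, 1.0, float("inf")):
        c = coverage(recovery_rate=10.0, fault_rate=0.01, deadline=deadline)
        print(f"deadline={deadline}: coverage={c:.6f}")
```

With no second fault and no deadline (`fault_rate=0.0`, `deadline=float("inf")`) the function returns 1.0. Tightening the deadline or raising the second-fault rate lowers the value, which is the kind of interference with recovery that the abstract refers to.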
Document Details
- Document Type: Technical Report
- Publication Date: Jan 27, 1988
- Accession Number: ADA191688
Entities
People
- Kishor S. Trivedi
Organizations
- Duke University