Experimental Study of Software Dependability

Abstract

In this study, a distributed fault injection and monitoring environment (DEFINE) has been developed. It consists of a target system, a fault injector, a software monitor, a workload generator, a controller, and several analysis utilities. DEFINE can inject software faults as well as hardware faults, can trace fault propagation in software systems and among machines, can monitor whether faults are activated and when the faults are activated, and has accurate time control. The fault models used are extracted from the results of field error data analyses and fault simulations. Fault injection experiments show that the majority of no-impact faults are latent. Memory faults and software faults usually have a very long latency, while bus faults and CPU faults tend to crash the system immediately. About half of the detected errors are data faults, and they are detected while the system is trying to access a memory location it has no privilege to access. Only about 8 of faults propagate to other UNIX subsystems. Fault propagation from servers to clients occurs more frequently than from clients to servers. The fault impact depends on the workload. Transient Markov reward analysis shows that the performance losses incurred by bus faults and CPU faults are much higher than those incurred by software and memory faults. Among software faults, the impact of pointer faults is higher than that of non-pointer faults.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 1994
Accession Number
ADA284098

Entities

People

  • Wei-lun Kao

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Application Software
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Data Analysis
  • Debugging
  • Electrical Engineering
  • Engineering
  • Fault Tolerant Computing
  • High Performance Computing
  • Instruction Set Architecture
  • Operating Systems
  • Reliability
  • Software Development
  • Software Testing
  • Test And Evaluation

Fields of Study

  • Computer science
  • Engineering

Readers

  • Fault Tolerant Diagnosis of Black and White Balloon Isolation Tests Using ¥.
  • Parallel and Distributed Computing.
  • Software Engineering.