Software Dependability in the Operational Phase.

Abstract

Software quality should be built-in and maintained throughout the software life cycle, which requires understanding of software dependability in actual environments. This thesis discusses how to develop analysis techniques for evaluating the dependability of operational software using real measurements while taking design issues into account. The issues addressed include fault categorization and characterization of error propagation, symptom-based diagnosis of recurrent software failures, identification of software fault tolerance, evaluation of the impact of software faults on the overall system, and the development of techniques for analyzing multiway failure dependencies among software and hardware modules. The process is illustrated using a case study of the Tandem GUARDIAN operating system. Using process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of reported faults in the system software that cause processor failures. The loose coupling between processors, which results in the backup execution (the processor state and the sequence of events) being different from the original execution, is a major reason for the measured software fault tolerance. About 72% of reported field software failures in Tandem systems are recurrences of previously reported faults. In addition to the conventional approach of reducing the number of faults in software, software dependability in Tandem systems can be enhanced by reducing the recurrence rate and by improving the robustness of process pairs and the system configuration. An approach for automatically diagnosing recurrences based on their symptoms is developed.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
May 01, 1995
Accession Number
ADA294431

Entities

People

  • In‐Hwan Lee

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Computer Programming
  • Computer Programs
  • Computers
  • Data Transmission
  • Engineering
  • Factor Analysis
  • Failure Mode And Effect Analysis
  • Fault Tolerance
  • Markov Models
  • Measurement
  • Operating Systems
  • Probabilistic Models
  • Probability
  • Software Development
  • System Software
  • Test And Evaluation
  • Time Intervals

Fields of Study

  • Computer science
  • Engineering

Readers

  • Oncology
  • Parallel and Distributed Computing.
  • Software Engineering.