Automatic Detection of Error Patterns in Computer Systems.
Abstract
The objective of the thesis is to develop a methodology for automatically detecting frequently occurring error patterns in large computer systems. The technique is shown to work on several non-obvious cases of real failure data from two CYBER systems at the University of Illinois. The errors can be anywhere in the machine (e.g., the CPU, channel, and memory). The proposed technique captures the error patterns and is particularly valuable in capturing the symptoms of severe faults, including those that propagate across coupled machines. These error patterns are validated by their recurrence. The method is general and can be applied to any system with automatic error detection and logging. Keywords: fault-tolerant computing; validation algorithms.
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 1986
- Accession Number: ADA176312
Entities
People
- Douglas E. Sanders
Organizations
- University of Illinois Urbana–Champaign