Hard CPU Related Failures and System Activity: Measurement and Modelling.

Abstract

This paper describes the measurement and analysis of hard CPU and memory errors, and of system activity, at the Stanford Linear Accelerator Center (SLAC) computational facility. Nearly 25 percent of the errors were estimated to be permanent. The occurrence of a failure was found to be strongly correlated with the level and type of workload preceding it. For example, it is shown that the risk of a permanent error increases non-linearly with the amount of interactive processing. The observed tendency is present in three years of load data. This observation is significant because a load-failure relationship found at the CPU level must, in our view, be considered fundamental. In addition, the fact that most of the errors are permanent provides new information on these error types, viz. their load-dependent behavior. Our analysis procedure, used on the SLAC data, has been validated on an artificially created database seeded with failures. (Author)

Document Details

Document Type
Technical Report
Publication Date
May 01, 1983
Accession Number
ADA130821

Entities

People

  • David J. Rossetti
  • Ravishankar K. Iyer

Organizations

  • Stanford University

Tags

Communities of Interest

  • Advanced Electronics
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computer Science
  • Data Analysis
  • Databases
  • Determinants (Mathematics)
  • Fault Tolerant Computing
  • Information Science
  • Linear Accelerators
  • Measurement
  • Navigational Equipment
  • Observation
  • Operating Systems
  • Probability
  • Random Variables
  • Reliability
  • Semiconductor Devices
  • Statistical Analysis

Readers

  • Brain and Cognitive Science; Experimental Psychology; Cognitive Neuroscience
  • Computational Modeling and Simulation
  • Mechanical Engineering/Mechanics of Materials