An Efficient Technique for Tracking Nondeterministic Execution and Its Applications

Abstract

This report describes a technique for using instruction counters to track nondeterminism in the execution of operating system kernels and user programs. During normal operation, the operating system records the number of instructions executed between consecutive nondeterministic events, along with information about the nature of each event. During an analysis phase, the execution is repeated under the control of a monitor, and each nondeterministic event is applied at the same instruction at which it occurred during the recorded execution. We describe the application of this technique to four areas:

  • Performance monitoring: The technique can be used to instrument an operating system to capture long traces of memory references. Unlike current techniques, it gathers the trace in a postmortem phase and therefore has negligible effect on the computation itself during the monitoring phase. We expect to capture trace periods orders of magnitude longer than existing techniques can, with little or no noticeable perturbation to the monitored system.
  • Kernel debugging: The technique can be used to replay the execution that precedes an operating system crash caused by a Heisenbug. This gives developers a systematic approach to eliminating such bugs during testing.
  • Support for rollback-recovery: Systems that use checkpointing and execution replay can adopt this technique to ensure that the replayed execution during recovery is identical to the execution before the failure, despite nondeterministic events that could not otherwise be captured efficiently.
  • Software-based TMR systems: Using this technique, a triple modular redundancy (TMR) system based on active replication can be built from off-the-shelf workstations connected by a general-purpose network.
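The record-and-replay idea in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the report's implementation: the `Machine` class, its toy state-update rule, and the `record`/`replay` functions are invented for this sketch, and the instruction counter is modeled as a software variable rather than a hardware counter. Recording logs only pairs of (instructions since the previous event, event payload); replay re-executes the same deterministic instructions and applies each logged event once the same number of instructions has elapsed.

```python
import random
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical toy processor: each instruction updates the state
    deterministically; only externally delivered events are nondeterministic."""
    state: int = 0
    executed: int = 0  # software stand-in for a hardware instruction counter

    def step(self) -> None:
        # One deterministic "instruction".
        self.state = (self.state * 31 + 7) % 1_000_003
        self.executed += 1

    def deliver(self, payload: int) -> None:
        # A nondeterministic event (e.g. an interrupt carrying device data).
        self.state ^= payload


def record(n_instructions: int, rng: random.Random, p_event: float = 0.01):
    """Normal operation: run n_instructions, logging each event as
    (instructions since the previous event, payload)."""
    m = Machine()
    log = []
    last = 0  # instruction count at the previous event
    for _ in range(n_instructions):
        if rng.random() < p_event:  # the environment decides, not the program
            payload = rng.randrange(1 << 16)
            m.deliver(payload)
            log.append((m.executed - last, payload))
            last = m.executed
        m.step()
    return m, log


def replay(n_instructions: int, log):
    """Analysis phase: re-execute, applying each logged event after the same
    number of instructions as during the recorded run."""
    m = Machine()
    events = iter(log)
    pending = next(events, None)
    last = 0
    for _ in range(n_instructions):
        # Apply every event whose instruction distance has been reached.
        while pending is not None and m.executed - last == pending[0]:
            m.deliver(pending[1])
            last = m.executed
            pending = next(events, None)
        m.step()
    return m


if __name__ == "__main__":
    original, log = record(10_000, random.Random(42))
    replayed = replay(10_000, log)
    print(len(log), "events logged for", original.executed, "instructions")
    print("identical final state:", original.state == replayed.state)
```

Because only the events are logged, not the instructions between them, the log in this sketch is small relative to the execution it describes. A real kernel would additionally need a way to regain control at exactly the counted instruction during replay; the sketch avoids that problem by checking the count in software before every instruction.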


Document Details

Document Type
Technical Report
Publication Date
May 01, 1995
Accession Number
ADA296392

Entities

People

  • E. N. Elnozahy

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Computations
  • Debugging
  • Instructions
  • Mathematical Analysis
  • Mathematics
  • Monitoring
  • Operating Systems
  • Perturbations
  • Recovery

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing
  • Systems Analysis and Design