Load Latency Tolerance in Dynamically Scheduled Processors

Abstract

This paper provides quantitative measurements of load latency tolerance in a dynamically scheduled processor. To determine the latency tolerance of each memory load operation, our simulations use flexible load completion policies instead of a fixed memory hierarchy that dictates the latency. Although our policies delay load completion as long as possible, they produce performance, measured in instructions committed per cycle (IPC), comparable to that of an ideal memory system where all loads complete in one cycle. Our measurements reveal that, to produce IPC values within 8% of the ideal memory system, between 1% and 62% of loads need to be satisfied within a single cycle, and that up to 84% can be satisfied in as many as 32 cycles, depending on the benchmark and processor configuration. Load latency tolerance is largely determined by two factors: whether an unpredictable branch is in the load's data dependence graph, and the depth of that dependence graph. Our results also show that up to 36% of all loads miss in the level one cache yet have latency demands lower than second level cache access times. We also show that up to 37% of loads hit in the level one cache even though they possess enough latency tolerance to be satisfied by lower levels of the memory hierarchy.
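The two mismatch cases at the end of the abstract can be illustrated with a minimal sketch. It is not taken from the report: the `Load` record, the `classify` function, the category names, and the 12-cycle second level access time are all assumptions chosen for illustration, and the per-load tolerance is assumed to have been measured already.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed value for illustration only; the abstract does not state
# the second level cache access time.
L2_ACCESS_CYCLES = 12


@dataclass
class Load:
    tolerance: int  # cycles the load can be delayed (hypothetical measurement)
    l1_hit: bool    # whether the load hit in the level one cache


def classify(load: Load) -> str:
    """Compare where a load was serviced with the latency it could tolerate."""
    if not load.l1_hit and load.tolerance < L2_ACCESS_CYCLES:
        # Missed in L1 but needs its data sooner than L2 can deliver it.
        return "miss_but_urgent"
    if load.l1_hit and load.tolerance >= L2_ACCESS_CYCLES:
        # Hit in L1 although a lower level would have been fast enough.
        return "hit_but_tolerant"
    return "matched"


def summarize(loads: list[Load]) -> dict[str, float]:
    """Return the fraction of loads in each category."""
    counts = Counter(classify(load) for load in loads)
    return {name: count / len(loads) for name, count in counts.items()}
```

For example, `classify(Load(tolerance=3, l1_hit=False))` falls in the first category, the kind of load the abstract reports as up to 36% of all loads.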

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2005
Accession Number
ADA440304

Entities

People

  • Alvin R. Lebeck
  • Srikanth T. Srinivasan

Organizations

  • Duke University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Access Time
  • Computer Science
  • Contrast
  • Cost Models
  • Costs
  • Degradation
  • Education
  • Hierarchies
  • Information Operations
  • Instructions
  • Measurement
  • North Carolina
  • Observation
  • Pipelines
  • Sampling
  • Simulations
  • Simulators

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing