# Techniques for Mitigating Memory Latency Effects in the PA-8500 Processor



David Johnson, Systems Technology Division, Hewlett-Packard Company

# **Presentation Overview**

- PA-8500 Overview
- Instruction Fetch Capabilities
- Reorder Buffers ("The Queue")
- Data Cache
- System Bus







### **PA-8500 Processor Core**





## **Memory Latency**



Latency Problems

- Instruction fetches
- Loads

Techniques for Hiding Latency

- High hit-rate caches
- Prefetching
- Overlapping cache misses
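To illustrate why overlapping cache misses helps, here is a minimal sketch comparing the stall time of a burst of independent misses serviced one at a time against the same burst with several misses in flight. The function name, the 100-cycle latency, the miss count, and the `max_outstanding` parameter are all illustrative assumptions, not PA-8500 figures.

```python
import math


def stall_cycles(num_misses: int, miss_latency: int, max_outstanding: int) -> int:
    """Idealized stall time for a burst of independent cache misses.

    Assumption: the misses are independent and issued back to back, and up
    to `max_outstanding` of them can be in flight at once, so each group of
    overlapped misses costs roughly one miss latency.
    """
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    groups = math.ceil(num_misses / max_outstanding)
    return groups * miss_latency


# Hypothetical numbers: 8 independent misses, 100-cycle memory latency.
serialized = stall_cycles(8, 100, max_outstanding=1)  # one miss at a time
overlapped = stall_cycles(8, 100, max_outstanding=4)  # four misses in flight
print(serialized, overlapped)
```

In this idealized model the stall time shrinks in proportion to the number of misses that can be outstanding; real gains depend on how many independent loads the program exposes.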



# **Instruction Fetch Features**

- Instruction Cache
  - 0.5 MB on-chip cache
  - 4-way set associative
  - Pipelined 2-cycle access
  - Provides 4 instructions per cycle to CPU core
  - Supports 32-byte and 64-byte line sizes
- Instruction Prefetching
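As a worked example of what the cache parameters above imply, the sketch below derives the number of sets for a 0.5 MB, 4-way set-associative cache at each supported line size, and splits an address into tag, index, and offset. It assumes the textbook set-associative mapping with power-of-two sizes; the slides do not describe the PA-8500's actual indexing scheme, and the function names are hypothetical.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Return set count and bit-field widths for a set-associative cache.

    Assumption: textbook mapping, with power-of-two line size and set count.
    """
    sets = size_bytes // (ways * line_bytes)
    if sets * ways * line_bytes != size_bytes:
        raise ValueError("size must be a multiple of ways * line size")
    if line_bytes & (line_bytes - 1) or sets & (sets - 1):
        raise ValueError("line size and set count must be powers of two")
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


def split_address(addr: int, geometry: dict) -> tuple:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset_bits = geometry["offset_bits"]
    index_bits = geometry["index_bits"]
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


HALF_MB = 512 * 1024
geo64 = cache_geometry(HALF_MB, ways=4, line_bytes=64)  # 2048 sets
geo32 = cache_geometry(HALF_MB, ways=4, line_bytes=32)  # 4096 sets
print(geo64, geo32)
print(split_address(0x12345678, geo64))
```

Halving the line size doubles the number of sets, so one more address bit moves from the offset field into the index field.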



## **PA-8500 I-Cache Composition**



## **PA-8500 Instruction Prefetching**





### **Reorder Buffers**



Cycle-by-cycle progression of a load instruction

| Insert | Launch | Address | Cache | Cache | RR | Retire |
|--------|--------|---------|-------|-------|----|--------|



# **LOAD-MISS Overlapping**





PA-8500 Solution



#### **Address Reorder Buffer: High-Speed Custom Circuitry**

### **Data Prefetching**





## **Data Cache Features**

- 1.0 MB on-chip cache
- 4-way set associative
- 2-cycle pipelined access
- Two accesses per cycle
- Supports 32-byte and 64-byte line sizes
- Sophisticated Store Queue



## **Data Cache**



# **Single-Level vs. Multi-Level Cache Designs**

PA-8500 single-level design: 1.5 MB of on-chip cache (0.5 MB instruction + 1.0 MB data) with 2-cycle access
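One common way to frame the single-level versus multi-level trade-off is the standard average memory access time (AMAT) formula. The sketch below is a minimal model under stated assumptions: only the 2-cycle hit time comes from the slides, while every miss rate and the other latencies are hypothetical placeholders chosen for illustration.

```python
def amat_single_level(hit_cycles: float, miss_rate: float,
                      memory_cycles: float) -> float:
    """AMAT for one cache level backed directly by main memory."""
    return hit_cycles + miss_rate * memory_cycles


def amat_two_level(l1_hit_cycles: float, l1_miss_rate: float,
                   l2_hit_cycles: float, l2_local_miss_rate: float,
                   memory_cycles: float) -> float:
    """AMAT for a small L1 backed by a larger L2, then main memory.

    `l2_local_miss_rate` is the fraction of L1 misses that also miss in L2.
    """
    l1_miss_penalty = l2_hit_cycles + l2_local_miss_rate * memory_cycles
    return l1_hit_cycles + l1_miss_rate * l1_miss_penalty


# Hypothetical workload: only the 2-cycle hit time is taken from the slides.
single = amat_single_level(hit_cycles=2, miss_rate=0.02, memory_cycles=100)
multi = amat_two_level(l1_hit_cycles=1, l1_miss_rate=0.10,
                       l2_hit_cycles=10, l2_local_miss_rate=0.20,
                       memory_cycles=100)
print(single, multi)
```

The comparison can go either way depending on the miss rates plugged in, so the numbers here show how to use the formula rather than a measured result.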





# **System Bus Interface**

- Split-transaction bus with out-of-order returns
- Multiple transactions in flight simultaneously
- Priority given to latency-sensitive transactions
- Asynchronous Interface
- Turbo Mode
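The first three bus features above can be sketched with a toy model: each request receives a transaction ID, several transactions may be in flight at once, latency-sensitive requests are issued first, and replies are matched by ID so they may return in any order. The class, its method names, and the choice of a demand load as "latency-sensitive" versus a prefetch or writeback are assumptions made for illustration; the slides do not describe the actual bus protocol.

```python
import heapq
from itertools import count


class SplitTransactionBus:
    """Toy model of a split-transaction bus (hypothetical, not the real protocol)."""

    def __init__(self):
        self._pending = []    # heap of (priority, arrival order, txn_id, kind)
        self._in_flight = {}  # txn_id -> kind, awaiting a reply
        self._arrival = count()
        self._ids = count()

    def request(self, kind: str, latency_sensitive: bool) -> int:
        """Queue a request and return its transaction ID."""
        txn_id = next(self._ids)
        priority = 0 if latency_sensitive else 1  # lower value issues first
        heapq.heappush(self._pending,
                       (priority, next(self._arrival), txn_id, kind))
        return txn_id

    def issue(self):
        """Put the highest-priority pending request on the bus, if any."""
        if not self._pending:
            return None
        _, _, txn_id, kind = heapq.heappop(self._pending)
        self._in_flight[txn_id] = kind
        return txn_id

    def complete(self, txn_id: int) -> str:
        """Accept a reply for any in-flight transaction, in any order."""
        return self._in_flight.pop(txn_id)


bus = SplitTransactionBus()
prefetch = bus.request("prefetch", latency_sensitive=False)
load = bus.request("demand load", latency_sensitive=True)
writeback = bus.request("writeback", latency_sensitive=False)

issue_order = [bus.issue(), bus.issue(), bus.issue()]
# Replies arrive out of order relative to issue; the ID identifies each one.
replies = [bus.complete(prefetch), bus.complete(load), bus.complete(writeback)]
print(issue_order, replies)
```

Tagging each request with an ID is what frees the bus between request and reply, which in turn lets multiple misses overlap.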



### **Turbo Mode**



#### High-Speed Data Transfer between Memory and CPU



### **Mitigating Memory Latency Effects**

- Large Caches
- Out-of-Order Queue
- Flexible System Interface
- Custom Circuit Design

The PA-8500 Achieves Superb Performance!

