

## David Roberts<sup>1</sup>, Amin Farmahini-Farahani<sup>1</sup>, Kevin Cheng<sup>1</sup>, Nathan Hu<sup>1</sup>, David Mayhew<sup>1,2</sup>, Michael Ignatowski<sup>1</sup>

## Introduction – Problems Faced in Memory Systems

The memory subsystem impacts performance, energy, and cost.



Key challenges: **Unpredictable Latency** and **Asymmetric Bandwidth**

Other challenges: interface compatibility between different memory types, workload adaptation to different memory types, and reliability concerns arising from process technology scaling, multi-level cells, and capacity increases.

## **New Solutions**

**Processing in Memory (PIM)\*:** Processing data near where it resides



**PIM** lowers energy and bandwidth requirements by moving compute to the data, which reduces communication.
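As a rough illustration of the bandwidth claim, the sketch below counts the bytes that cross the host-memory link for a sum reduction. The function name and the traffic model are assumptions made for this example (command and header overheads are ignored); they are not taken from the poster.

```python
def bytes_over_link(num_elements: int, element_bytes: int, use_pim: bool) -> int:
    """Bytes crossing the host-memory link for a sum reduction (illustrative model)."""
    if use_pim:
        # Compute runs next to the data, so only the single result crosses the link.
        return element_bytes
    # Host-side reduction: every element has to be read across the link.
    return num_elements * element_bytes


host_traffic = bytes_over_link(1_000_000, 8, use_pim=False)
pim_traffic = bytes_over_link(1_000_000, 8, use_pim=True)
print(host_traffic, pim_traffic)  # 8000000 8
```

Under this simplified model the link traffic for a reduction shrinks by a factor equal to the number of elements; real savings depend on the operation and on protocol overheads.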

Recent developments in 3D die stacking such as High Bandwidth Memory and the Hybrid Memory Cube are key enablers for PIM.

Standards for heterogeneous computation, such as the Heterogeneous System Architecture (HSA), allow host and PIM processors to share memory and work.

**Abstracted memory interfaces** are becoming more important due to the emergence of diverse non-volatile memory technologies

Heterogeneous memory technologies can be used together to reduce cost while providing performance, capacity, and non-volatility

### **A New Memory Interface To Enable Innovation\*\***

- Existing memory interface protocols present a barrier to overcoming key problems and providing scalable, compatible 'smart' memory components from multiple vendors, from cellphones to supercomputers
- This work is a step in overcoming these key problems

\* D. Zhang, et al., "TOP-PIM: throughput-oriented programmable processing in memory," HPDC, 2014.

\*\* D. Resnick and M. Ignatowski, "Proposing an abstracted interface and protocol for computer systems," White Paper, Sandia National Lab., 2014.

AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc.

# **NMI: A New Memory Interface to Enable Innovation**

#### <sup>1</sup>AMD Research, Advanced Micro Devices, Inc.

#### **Novel Features**

- Point-to-point networks of NMI Nodes (any combination of ports, switches, memories, and processing elements), with Master nodes for local control and subnets for massive scalability
- Abstracted, flexible timing interface supporting diverse technologies
- HSA-compatible virtual memory, cache coherency, task dispatch
- Optional feature set profiles to scale cost, area and complexity from embedded to supercomputer systems
- Scalable ECC tunable for application, including memory RAID
- Multiple physical-to-device mappings for custom interleaving
- Abstracted power modes for global management with mixed devices
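To make the idea of multiple physical-to-device mappings concrete, here is a minimal sketch of block interleaving across devices. The function `make_interleaved_mapping`, its parameters, and the round-robin scheme are assumptions chosen for illustration; the poster does not specify how NMI encodes a mapping.

```python
from typing import Callable, Tuple

# A mapping takes a physical address and returns (device index, offset within device).
Mapping = Callable[[int], Tuple[int, int]]


def make_interleaved_mapping(num_devices: int, granularity: int) -> Mapping:
    """Build a round-robin interleaving over num_devices with the given block size."""
    if num_devices <= 0 or granularity <= 0:
        raise ValueError("num_devices and granularity must be positive")

    def mapping(addr: int) -> Tuple[int, int]:
        block, within = divmod(addr, granularity)
        device = block % num_devices          # blocks rotate across devices
        device_block = block // num_devices   # index of the block on that device
        return device, device_block * granularity + within

    return mapping


# Two hypothetical mappings over the same four devices.
fine = make_interleaved_mapping(num_devices=4, granularity=64)
coarse = make_interleaved_mapping(num_devices=4, granularity=4096)

print(fine(70))    # (1, 6)
print(coarse(70))  # (0, 70)
```

A fine granularity spreads consecutive cache lines over all devices for bandwidth, while a coarse granularity keeps a page on one device, which may suit locality-sensitive or PIM workloads.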

#### **NMI Power Management**

High-level (abstract) power mode and low-level (direct) power mode

NMI Commands to the element

<sup>2</sup>University of San Diego

#### **NMI Protocol Layers**

(1) Physical Layer: Flexible (electrical/optical/other), under development

(2) Link Layer: Packetized, reliable, scalable header overheads, virtual channels for deadlock avoidance, and low latency

(3) Transaction Layer: Classes of optional functionality as follows:

| #     | Class                    | Description                                                                         |
|-------|--------------------------|-------------------------------------------------------------------------------------|
| 0     | Foundation & Computation | Non-coherent R/W, capability query, logical memory region management, task dispatch |
| 1     | Atomics                  | Atomic operations                                                                   |
| 2     | Virtual Memory           | Address translation, TLB invalidation                                               |
| 3     | Coherence                | Cache coherence                                                                     |
| 4     | Fixed-Function Units     | Gather/Scatter, reduction, initialization, etc.                                     |
| 5     | Advanced ECC             | Memory RAID support                                                                 |
| 6     | Persistent Memory        | Fence, flush, freeing address ranges                                                |
| 7-12  | &lt;reserved&gt;         | Reserved for future standard features                                               |
| 13-15 | &lt;vendor-defined&gt;   | Vendor-defined                                                                      |
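One way to picture the optional transaction-layer classes (0 through 6, with 7-12 reserved and 13-15 vendor-defined) is as bits in a 16-bit capability mask that a host checks before using a feature. This is a sketch under that assumption: the `TxnClass` names, the bitmask encoding, and the two example profiles are hypothetical and are not defined by the poster.

```python
from enum import IntFlag


class TxnClass(IntFlag):
    """Standard transaction classes 0-6, one bit per class number (assumed encoding)."""
    FOUNDATION = 1 << 0         # class 0: Foundation & Computation
    ATOMICS = 1 << 1            # class 1
    VIRTUAL_MEMORY = 1 << 2     # class 2
    COHERENCE = 1 << 3          # class 3
    FIXED_FUNCTION = 1 << 4     # class 4
    ADVANCED_ECC = 1 << 5       # class 5
    PERSISTENT_MEMORY = 1 << 6  # class 6


def supports(profile: TxnClass, required: TxnClass) -> bool:
    """Return True if every class in `required` is present in `profile`."""
    return (profile & required) == required


# Hypothetical profiles at two ends of the cost/complexity range.
EMBEDDED = TxnClass.FOUNDATION | TxnClass.ATOMICS
FULL = (
    TxnClass.FOUNDATION | TxnClass.ATOMICS | TxnClass.VIRTUAL_MEMORY
    | TxnClass.COHERENCE | TxnClass.FIXED_FUNCTION | TxnClass.ADVANCED_ECC
    | TxnClass.PERSISTENT_MEMORY
)

print(supports(EMBEDDED, TxnClass.ATOMICS))    # True
print(supports(EMBEDDED, TxnClass.COHERENCE))  # False
```

A mask of this kind lets a small device advertise only the classes it implements, which matches the goal of scaling cost, area, and complexity by profile.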

# **Unmanaged vs Managed NMI Networks**





**Managed example:** Scalable to many nodes, divided into subnets, each managed by a Master node

## **Heterogeneous System Architecture (HSA)**

- Designed with HSA in mind
- Capability & configuration register trees available on each node
- Support for cache-coherent shared virtual memory
- Support for task dispatch via Architected Queueing Language

# **Logical Memory Regions**

Logical division of physical memory address space into non-overlapping, contiguous regions

Each memory region has its own:

- Physical-to-device address mapping
- Size and address range
- Multi-level cell configuration
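The region properties listed above can be sketched as a small lookup table that rejects overlapping regions and resolves a physical address to its region. The `Region` and `RegionTable` types, the field names, the mapping labels, and the bits-per-cell encoding are all assumptions made for this example, not NMI definitions.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    base: int           # first physical address of the region
    size: int           # length of the region in bytes
    mapping: str        # label of the physical-to-device mapping (hypothetical)
    bits_per_cell: int  # multi-level cell configuration (hypothetical encoding)

    @property
    def end(self) -> int:
        return self.base + self.size


class RegionTable:
    """Holds non-overlapping, contiguous regions sorted by base address."""

    def __init__(self, regions: List[Region]) -> None:
        ordered = sorted(regions, key=lambda r: r.base)
        for prev, cur in zip(ordered, ordered[1:]):
            if cur.base < prev.end:
                raise ValueError(f"regions overlap at address {cur.base:#x}")
        self._regions = ordered
        self._bases = [r.base for r in ordered]

    def lookup(self, addr: int) -> Optional[Region]:
        """Return the region containing addr, or None if addr is unmapped."""
        i = bisect_right(self._bases, addr) - 1
        if i >= 0 and addr < self._regions[i].end:
            return self._regions[i]
        return None


MiB = 1 << 20
table = RegionTable([
    Region(base=0, size=MiB, mapping="fine-interleave", bits_per_cell=1),
    Region(base=MiB, size=MiB, mapping="coarse-interleave", bits_per_cell=2),
])
print(table.lookup(MiB + 5).mapping)  # coarse-interleave
```

Keeping the per-region settings in one record means a single address lookup yields both the interleaving to apply and the cell configuration in effect for that range.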

# Conclusions

NMI is an abstracted, unified memory interface that supports future scale-out memory capacity, processing-in-memory, I/O devices, emerging non-volatile memories, and cache-coherent shared virtual memory.

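Figure labels, by section:

- Unmanaged vs Managed NMI Networks. **Unmanaged example:** Small-scale, low-latency, low-overhead
- Heterogeneous System Architecture (HSA). NMI Configuration/Capability Space Tree
- Logical Memory Regions. **Region 0** with mapping A; **Region 1** with mapping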
