# Adaptable Intelligence: The Next Computing Era

Hot Chips, August 21, 2018

Victor Peng, CEO, Xilinx





### **Pervasive Intelligence from Cloud to Edge to Endpoints**





### **Exponential Growth and Opportunities**




## **Challenges: The End of Moore's Law and Scaling**



Source: John Hennessy and David Patterson, *Computer Architecture: A Quantitative Approach*, 6th ed., 2018

### **Challenge: Exponential Power Density Growth**



Source: John Hennessy and David Patterson, "A New Golden Age for Computer Architecture: Domain-Specific Hardware/Software Co-Design, Enhanced Security, Open Instruction Sets, and Agile Chip Development"

Power consumption based on models in "Dark Silicon and the End of Multicore Scaling," Hadi Esmaeilzadeh et al., *ISCA*, 2011



### The Third Wave: Domain Specific Architectures on Adaptable HW



### Massive Scale-Out Requires DSAs and Adaptable Platforms




### **The Innovation to Deployment Acceleration Imperative**



Source: Scopus

## **Application Acceleration with DSAs on FPGA Platforms**

### **FPGAs Accelerate Entire Application**

### Hyperscale Data Centers with FPGA Accelerators

Figure labels: FPGA Accelerator; FPGA Plane; CPU Plane.

Workloads:

- Search (Database + ML)
- Speech (ML)
- Security



### **Development for DC Compute, Storage, Network Apps**



### **Stack for Application Acceleration including ML**



## **Cloud: Latency-sensitive High Resolution Imaging**

#### **CPU/GPU**



#### **2.7X Faster and Lower Power**

### **Cloud: Security / Anomaly Detection**



#### **5X Lower Latency with High Security**

### **Cloud: Smart City / Security**

#### **CPU/GPU**

- H.265 Decode: 4x 1080p Streams (P4)
- OpenCV: Pre-Processing/Motion Analysis on CPU
- CNN: Object Detection on GPU
- Data Sharing Between CPU and GPU

#### **CPU/Xilinx FPGA**

Diagram labels: CPU, PCIe, FPGA, CNN, Motion Analysis, H.264 Decode, Custom Operation.

- H.265 Decode: 4x 1080p Streams (P4)
- OpenCV: Up to 20x Higher Performance

#### **CPU/GPU** Results



#### **10x Lower Latency and Lower Power**

- Lower Power



### **Xilinx Programmable Architecture Milestones**



### From FPGA to Adaptive Compute Acceleration Platform

#### New Device Category for Adaptive Workload-specific Acceleration

- HW/SW Programmable Engines
- IP Subsystems and a Network-on-Chip
- Platform Offerings for Compute / Storage / Networking



### **Adaptive Compute Acceleration Platform**

- Dynamically Adaptable to Workloads
- Exponential Increase in Acceleration
- Software Programmable



### **Adaptive Compute Acceleration Platform**

- 20x ML Inference Performance
- 4x 5G Communications Bandwidth
- 112G Transceivers



### **New HW/SW Programmable Architecture**

#### Application-level Performance Enabled by SW Programmable Engine



#### **Compute Efficiency**

- Domain Specific Engine
- Greater Compute Density
- Xilinx 7nm Everest

#### **Multiple Applications**

- ML Inference for Cloud DC
- Wireless 5G: Radio, Baseband
- ADAS/AD Embedded Vision
- Wired: DOCSIS Cable Access



#### **Array Architecture**

#### **Heterogeneous Architecture**

- PL Flexibility
- PE Throughput and Efficiency
- Customized Memory Hierarchy

#### **SW Programmable**

- SW Programmable (e.g., C/C++)
- Compile, Execute, Debug
- Increased Productivity

## **xDNN: Adaptable Overlay DNN Processor for Xilinx FPGA**



Low Latency & High Throughput (Batch = 1)

### **No FPGA Expertise Needed**

### **Soft Overlay Architectures for ML**



#### **DeePhi Aristotle Architecture**





# Building the Adaptable, Intelligent World