# Design of a High-Density SoC FPGA at 20nm

Brad Vest, Sean Atsatt, Mike Hutton (Altera, San Jose)



### **Arria 10 Device Outline**

- Device Goals and Overview
- Routing and Logic Architecture
- Transceiver and I/O Architecture
- DSP Block and Floating Point
- Hard Processor System (HPS)
- Power
- Summary



### **Device Goals**

# Mid-Range FPGA: balance of performance/power/cost targeting Key Market Applications

Target applications by market:

| Wireless Infrastructure | Access, Metro & Core | Transmission | Cloud Servers and Storage | Broadcast |
|-------------------------|----------------------|--------------|---------------------------|-----------|
| Remote Radio Head<br>Mobile Backhaul<br>Active Antenna<br>Basestation (BTS)<br>4G/LTE Macro eNB<br>4G/LTE Micro eNB | 40G GPON, EPON, FTTH, Switch<br>100G / 200G NGPON<br>100G Traffic Management | N x 100G OTU4<br>2 x OTU4<br>4 x OTU4 | Flash Cache<br>Cloud<br>Server<br>Acceleration | Pro A/V Equipment<br>Switcher<br>Server<br>Transport<br>Head End<br>VoD Mux |

### Key Targets and Metrics:

- 491 MHz fixed-point DSP datapath for Wireless RRU
- 1M+ LEs at 350 MHz for 4xOTU4 (400G) OTN networks, with Partial Reconfiguration
- Hardened floating point for cloud server acceleration
- 28G transceivers to support 200G to 400G networking/routing
- Dramatic die-size reduction



### **Overview and Floorplan**



# TSMC 20SOC Process

- 5.3B transistors, 11 metal layers

# Resources

- 1.15M LEs, 1.7M FFs
- 64Mb embedded SRAM
- 32 fPLL, 16 PLLs, 32 GCLK
- 1.5 TFlops IEEE754 DSP
- Dual-Core ARM A9
- Row-based redundancy

# I/O

- 28G SERDES, >1.7 Tb/s bandwidth
- x72 2.667Gbps DDR4 w/ Hard memory Controller
- Hardened PCIe/ILKN/10GE
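As a rough sanity check on the x72 DDR4 figure, the peak bandwidth of one interface can be worked out from the per-pin rate. This is a sketch only: the split of 64 data bits plus 8 ECC bits is an assumption, not something the slide states.

```python
# Back-of-envelope peak bandwidth for one x72 DDR4 interface.
# Assumption (not from the slides): 64 of the 72 bits carry data, 8 carry ECC.
PIN_RATE_GBPS = 2.667   # per-pin data rate, from the slide
TOTAL_BITS = 72         # interface width, from the slide
DATA_BITS = 64          # assumed payload width (72 minus 8 ECC bits)

raw_gbps = PIN_RATE_GBPS * TOTAL_BITS       # all 72 pins
data_gbps = PIN_RATE_GBPS * DATA_BITS       # payload pins only
data_gbytes_per_s = data_gbps / 8           # bits to bytes

print(f"raw: {raw_gbps:.1f} Gb/s")
print(f"data: {data_gbps:.1f} Gb/s ({data_gbytes_per_s:.1f} GB/s)")
```

These are theoretical peaks; sustained bandwidth depends on access pattern and controller efficiency.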



# **Design Challenges**

### Process

- More restrictive design rules: double patterning for lowest metal layers
  - Took significant advantage of structured custom layout to scale
- Increasing variation
  - Significantly more statistical analysis used on critical analog, memory, and sensing circuits to ensure robustness and manufacturability
  - Digitally assisted analog design
- Metal parasitics not scaling with transistors
  - Strategic metal planning to ensure critical signals get the lowest RC paths through the power mesh

### Clock latency and insertion

- Increased clock network flexibility to give software place-and-route (P&R) more options for critical transfers through the FPGA fabric

# IP Integration

- Moved to a more modular tile based floorplan including embedded IOs
  - Provides significant area reduction, but requires more upfront planning of interfaces, metal grid, and feed-thru
  - Extended row based redundancy to work with embedded IOs



# **ALM registers and dedicated logic**



- Many high-performance designs have FF:LUT > 1
- Providing 4 FFs and 8 inputs to an ALM complex allows for more efficient packing
- ALM retains most Stratix V features:
  - Ternary add, shared LUT-mask, 20b carry-skip



# **Architectural improvements on Arria 10 Core**



Column-based CRAM CRC on top of row-based

Up to 100x faster error detection and correction
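Why checking along both rows and columns helps can be illustrated with a toy model. The sketch below uses single-bit parity in place of CRC and a 4x4 grid in place of configuration RAM; it is an illustration of the two-dimensional idea only, not the device's actual scheme, and all names are hypothetical.

```python
from functools import reduce
from operator import xor

def parities(grid):
    """Return (row_parities, column_parities) for a grid of 0/1 bits."""
    rows = [reduce(xor, row) for row in grid]
    cols = [reduce(xor, col) for col in zip(*grid)]
    return rows, cols

def locate_single_flip(grid, ref_rows, ref_cols):
    """Return (row, col) of a single flipped bit, or None if nothing changed."""
    rows, cols = parities(grid)
    bad_rows = [i for i, (a, b) in enumerate(zip(rows, ref_rows)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(cols, ref_cols)) if a != b]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit flipped; parity alone cannot locate it")

# Toy 4x4 "configuration RAM" and its reference check bits.
cram = [[0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1]]
ref_rows, ref_cols = parities(cram)

cram[2][1] ^= 1                                    # inject a single-bit upset
hit = locate_single_flip(cram, ref_rows, ref_cols)
cram[hit[0]][hit[1]] ^= 1                          # correct it in place
print("located and corrected upset at", hit)
```

A row check alone says which row is bad; the column check narrows it to a single cell, so the bit can be flipped back directly.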



Tri-state long-lines (V27, H32)

Maintain long wire performance despite poor metal process scaling



### **Arria 10 Transceiver Overview**



Support for a wide range of protocols, data rates, and applications



# **Arria 10 Transceiver Overview**

### Wide Range of Data Rates

- 611 Mbps to 28.1 Gbps (Native)
- Down to 125 Mbps (Oversampling)

### High Transceiver Density

### Notable improvements

- 5 tap Transmitter pre-emphasis
- Adaptive CTLE (Continuous Time Linear Equalizer)
  - High Gain & High Data Rate Modes
- Adaptive DFE (Decision Feedback Equalizer)
  - 7 tap fixed, 4 tap floating
- Hard Forward Error Correction (FEC)
- Total equalization capability > 30 dB

| Feature                   | Arria 10  |
|---------------------------|-----------|
| Transceiver Count         | Up to 96  |
| Max Data Rate (Select Ch) | 28.1 Gbps |
| Max 28G Channels          | Up to 16  |
| Max Data Rate (All Ch)    | 17.4 Gbps |
| Max Backplane Data Rate   | 17.4 Gbps |
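The table figures can be combined into a rough aggregate line rate. The slides do not say how the ">1.7 Tb" bandwidth figure was derived, so the sketch below simply computes two one-directional upper bounds under stated assumptions: all 96 channels at the all-channel rate, and a mix of 16 channels at 28.1 Gbps with the remaining 80 at 17.4 Gbps (assuming both limits can be reached at once).

```python
# One-directional aggregate line rate from the table above.
TOTAL_CHANNELS = 96       # "Up to 96"
MAX_28G_CHANNELS = 16     # "Up to 16"
RATE_28G_GBPS = 28.1      # max data rate, select channels
RATE_ALL_GBPS = 17.4      # max data rate, all channels

# Assumption: every channel runs at the all-channel maximum.
uniform_gbps = TOTAL_CHANNELS * RATE_ALL_GBPS

# Assumption: 16 channels at 28.1 Gbps plus the other 80 at 17.4 Gbps.
mixed_gbps = (MAX_28G_CHANNELS * RATE_28G_GBPS
              + (TOTAL_CHANNELS - MAX_28G_CHANNELS) * RATE_ALL_GBPS)

print(f"uniform: {uniform_gbps / 1000:.2f} Tb/s")
print(f"mixed:   {mixed_gbps / 1000:.2f} Tb/s")
```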

[Figure: transceiver bank block diagram. fPLLs and ATX PLLs drive a clock network that feeds each channel's Transceiver PMA and PCS.]



# Hardened Calibration – Digitally Assisted Analog

Two hard microcontrollers (uC) handle all transceiver calibration on the device

# Key Advantages

- Enable calibration before FPGA core is programmed
  - Critical for Configuration via Protocol (CVP)
- Earlier generations required customers to instantiate an increasing number of soft IPs
  - Converges prior-generation soft controllers and hard state machines into one highly flexible system
- Ability to access firmware from application layers
- Enables advanced calibration techniques and on-die instrumentation capabilities





# **PMA Tx Jitter Compensation**

#### Replicates clock pattern to act as noise compensation

- Avoid hitting PDN resonance (<100MHz)
- Add switching current using duplicate pre-driver path during every non-transition bit to eliminate mid frequency noise
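The compensation idea can be sketched with an idealized model: the main pre-driver draws a unit of switching current on each data transition, and the duplicate path draws one on each non-transition bit, so the supply sees the same activity every bit, as it would with a clock pattern. This is a simplified digital illustration under that unit-current assumption, not a model of the analog circuit.

```python
def switching_activity(bits):
    """For each bit after the first, return (main, comp) switching events.

    main = 1 when the data bit differs from the previous one.
    comp = 1 when it does not, modeling the duplicate pre-driver path.
    """
    events = []
    for prev, cur in zip(bits, bits[1:]):
        main = 1 if cur != prev else 0
        comp = 1 - main
        events.append((main, comp))
    return events

data = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0]   # arbitrary example pattern
events = switching_activity(data)

main_only = [m for m, _ in events]        # data-dependent supply current
combined = [m + c for m, c in events]     # with compensation: constant

print("main only:", main_only)
print("combined: ", combined)
```

Without compensation the current profile follows the data, so long runs put energy at low and mid frequencies; with it the profile is flat per bit, which is consistent with average power rising slightly while maximum power does not.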

#### Reduces PDN induced jitter by 80%

- Achieves same result as adding capacitance that would increase XCVR area by <u>50%</u>
- Average power increases slightly, maximum power does not



### **GPIO and EMIF**

# Hardened memory controller (HMC) for DDR

- Programmably ganged to multiple memory interfaces

# IOAUX per column – managed by hard-uC





# Hardened Floating Point DSP

# Hardened IEEE 754 Floating Point adder & Multiplier

- 12% DSP Area increase (<<1% die area)

# 100% Fixed Point backwards compatible

- No performance or power penalty
- 'Have your cake and eat it too'
- How is this possible?
  - Overlaid FP algorithms on Fixed point circuits




# Major Innovation – Hard Floating Point on a Commercial FPGA

# DSP Block – 1000s of blocks at very low latency

#### 1.5 TFLOPS of aggregate computation; 50 GFLOPS/W

- 1678 blocks × 2 FLOPS/clock × 450 MHz ≈ 1,510 GFLOPS (about 1.5 TFLOPS)
- Can run individually or as large integrated DSP system
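The aggregate figure follows directly from the block count, operations per clock, and clock rate. The sketch below reproduces that arithmetic; the implied power figure is a derived estimate that assumes the 50 GFLOPS/W number applies at peak throughput, which the slide does not state.

```python
# Peak floating-point throughput from the figures on this slide.
DSP_BLOCKS = 1678
FLOPS_PER_CLOCK = 2             # per block
CLOCK_HZ = 450e6
EFFICIENCY_GFLOPS_PER_W = 50    # from the slide heading

peak_gflops = DSP_BLOCKS * FLOPS_PER_CLOCK * CLOCK_HZ / 1e9

# Assumption: the efficiency figure holds at peak throughput.
implied_watts = peak_gflops / EFFICIENCY_GFLOPS_PER_W

print(f"peak: {peak_gflops:.1f} GFLOPS ({peak_gflops / 1000:.2f} TFLOPS)")
print(f"implied DSP power at peak: {implied_watts:.1f} W")
```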

#### Hardware recursive structure support (Vector Mode)

- 10s/100s of DSP blocks can be seamlessly integrated
- Internal/External pipelining of individual DSP elements

### Very small latency

- Iterative floating-point algorithms require low latency
- Arria 10 Floating Point 256 length dot products ~ 25 clocks
- Standard FPGA Technology 256 length systolic FIR filter ~750 clocks
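To put the clock counts in time units, the sketch below converts both latencies at a common clock. Using 450 MHz for both cases is an assumption borrowed from the throughput slide; the slides do not give the clock rate for the systolic FIR comparison.

```python
# Latency comparison for a 256-length operation, in clocks and nanoseconds.
CLOCK_HZ = 450e6          # assumed common clock for both implementations
HARD_FP_CLOCKS = 25       # Arria 10 hard floating-point dot product
SYSTOLIC_CLOCKS = 750     # standard FPGA systolic FIR filter

hard_ns = HARD_FP_CLOCKS / CLOCK_HZ * 1e9
systolic_ns = SYSTOLIC_CLOCKS / CLOCK_HZ * 1e9
speedup = SYSTOLIC_CLOCKS / HARD_FP_CLOCKS

print(f"hard FP:  {hard_ns:.1f} ns")
print(f"systolic: {systolic_ns:.1f} ns")
print(f"latency ratio: {speedup:.0f}x")
```

In an iterative algorithm each step waits on the previous result, so this per-operation latency ratio carries over to the whole loop.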





# Arria 10 HPS: Faster, Secure and SW Compatible

- Arria 10 SoCs feature a Dual Core ARM Processor for
  - Communications processing, acceleration, host offload, deeply embedded processing, and FPGA management
- Faster
  - Up to 1.5 GHz per core, total 7500 MIPS
- More Secure
  - Secure Boot with ECDSA Authentication
  - Root of Trust Support (Certification Authority)
  - Hierarchical Public Key Infrastructure
- Software Compatible
  - Extensive reuse of software, OS/BSP, and tools from 28nm SoCs
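The 7500 MIPS total is consistent with two cores at 1.5 GHz if each delivers 2.5 DMIPS/MHz. That per-MHz figure is ARM's published Dhrystone number for the Cortex-A9; that the slide uses it is an assumption.

```python
# Aggregate processor throughput for the dual-core HPS.
CORES = 2
CLOCK_MHZ = 1500
DMIPS_PER_MHZ = 2.5   # assumed: ARM's published Cortex-A9 Dhrystone figure

total_mips = CORES * CLOCK_MHZ * DMIPS_PER_MHZ
print(f"total: {total_mips:.0f} MIPS")
```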





## **Arria 10 HPS Peripheral Feature Summary**

### EXTERNAL MEMORY CONTROLLERS

- Hard Memory Controller (HMC)
  - DDR4/3, LPDDR2/3



Larger

- Up to 72-bit DDR4
- Multiport Front End (MPFE) Scheduler interface to HMC sharable with core logic
- QSPI flash controller with SIO, DIO, QIO SPI Flash support
- NAND flash controller (ONFI 1.0 or later) with DMA and ECC support (updated)
  - Added 16-bit Flash device support for higher throughput
- SD/SDIO/MMC controller 4.5 with DMA and CE-ATA digital command support (updated)
  - Updated to eMMC for additional flexibility
- 256KB of Scratch RAM

### COMMUNICATION CONTROLLERS

- <u>3x 10/100/1000 Ethernet media access control</u> (MAC) with DMA
  - Enables simultaneous ingress, egress and control
- 2x USB On-The-Go (OTG) controller with DMA
- 2x UART 16550 controller
- 5x I<sup>2</sup>C controller
  - 3 can be used by EMAC for MIO to external PHY
- 4x serial peripheral interface (SPI)
  - 2 Master, 2 Slaves

### **SECURITY & PERIPHERALS**

- Anti-tamper, Secure Boot, POF Encryption (AES) and Authentication (SHA), and Root of Trust Support
  - Secure boot will only execute code that is provably from a known source and unmodified
- 54 Programmable general-purpose I/O
- 7x general-purpose timers
- 4x watchdog timers



# HPS + FPGA enables Hardware & Software Co-processing

### High throughput bridge to FPGA

- Can access header/packet data an order of magnitude faster than the typical PCIe latency

### Non blocking low latency bridge to FPGA

- Simple accesses to the fabric

### Shared FPGA/HPS bridge with smart scheduler to DDR interface



© 2014 Altera Corporation (Public)

# **Arria 10 Device Security Feature Types**

#### **Prevention**

- AES-256 Bitstream Encryption
- Key masked prior to storing
- DPA Resistance
- JTAG readback not allowed
- JTAG disable
- Factory Test-mode disable
- Tamper-Protection mode
- On-chip Oscillator

#### **Detection**

- JTAG monitoring
- Built-in SEU detection
- On-chip Temperature sensor
- On-chip Oscillator
- Unique Chip ID
- Secure Boot (Code Authentication)
- V<sub>BAT</sub> Under-voltage detection

#### Response

- JTAG disable
- Built-in SEU correction
- Chip-core zeroize
- Volatile Key zeroize
- And more!



# Arria 10 SoC Secure Key Storage

#### Every Certification results in 3 keys

A private key, a public key and a code signing key

#### Private Key Storage

This key remains on the server or laptop and is invisible

#### Public Key Storage

- The hash value of the public key is stored in the eFUSE memory of the Arria 10 device
- In manufacturing the eFUSE is blown by the Quartus II programmer so all devices in the field have to run authenticated software
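The fused-hash check can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the hash and using hypothetical function names; the slides do not specify the hash algorithm or fuse layout, and the signature verification that follows in a real boot flow is omitted.

```python
import hashlib
import hmac

def fuse_value_for(public_key: bytes) -> bytes:
    """Hash that would be programmed into eFUSE at manufacturing (SHA-256 assumed)."""
    return hashlib.sha256(public_key).digest()

def public_key_matches_fuse(public_key: bytes, fused_hash: bytes) -> bool:
    """Boot-time check: hash the presented public key and compare to the fused value."""
    candidate = hashlib.sha256(public_key).digest()
    return hmac.compare_digest(candidate, fused_hash)   # constant-time comparison

# Placeholder byte strings stand in for real key material.
genuine_key = b"example-public-key-bytes"
fused_hash = fuse_value_for(genuine_key)

ok = public_key_matches_fuse(genuine_key, fused_hash)
rejected = public_key_matches_fuse(b"attacker-public-key", fused_hash)
print("genuine key accepted:", ok)
print("substituted key accepted:", rejected)
```

Storing only the hash keeps the fuse footprint small while still binding the device to one public key: a substituted key produces a different hash and is refused.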

#### Code signing Key

Stored in the flash from which the processor boots

#### Private Key remains on a secure server or laptop



#### Public Key is stored in device eFuse memory






### **Arria 10 Power Saving Features**



- SmartVoltage ID: enables the device to run at lower than nominal Vcc while retaining the same performance level, reducing static and dynamic power
- Programmable Power Technology: enables lower-power transistors for non-performance-critical paths to reduce static power
- Vcc PowerManager: lowers operating Vcc, trading off performance to achieve lower total power



### **SmartVoltage ID Power Reduction**

- Allows FPGA to be operated at lower core Vcc while retaining same performance
- Reduces worst case static power
- Reduces average dynamic power consumption across distribution of devices
  - Lower OpEx
- Requires power system controller that can support tuned voltage

Reduce Static Power by Up to 35%



### **Programmable Power Technology**



- Accelerate speed-critical paths while reducing power on non-speed-critical paths
- Quartus II optimizes your design automatically, enabling high-speed logic only where needed
- Get performance where you need it, and reduced power everywhere else

### **Reduces Core Static Power by Up to 20%**



### Summary

- Arria 10 engineering focus was maintaining the right balance for a mid range device
  - Development cost, performance, power, unit cost
  - Focus on design efficiency through new methodologies
- Arria 10 supports key hard features attractive to the target markets
  - HMC, processor, security
- Hardened Floating point DSP feature is opening new markets for FPGAs





# **Thank You**



MEASURABLE ADVANTAGE


ALTERA, ARRIA, CYCLONE, ENPIRION, MAX, MEGACORE, NIOS, QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark Office and in other countries. All other words and logos identified as trademarks or service marks are the property of their respective holders as described at www.altera.com/legal.