# Introducing 28-nm Stratix V FPGAs: Built for Bandwidth

Dan Mansur and Sergey Shumarayev, August 2010



# **Market Dynamics for High-End Systems**

Communications



Mobile Internet driving bandwidth at 50% annualized growth rate

- Fixed footprints
- Existing power ceilings
- 40G/100G system deployment with 400G on the horizon

Broadcast



Worldwide proliferation of HD/1080p

- Move to digital cinema and 4K2K
- Fixed power budget

Military



Heightened intelligence and defense needs

- More sensors, higher precision driven to decision points faster
- Power and uptime critical

Computer and Storage



Higher bandwidth and performance, lower latency

- Power consumption affects total cost of ownership
- Cloud computing driving up bandwidth



# Stratix V FPGA Family on 28-nm Process

- Stratix V FPGAs are built on TSMC's high-performance 28-nm HKMG process
  - Optimized for low power
  - ABB with core voltage 0.85V
- Ideal choice for devices used in next-generation, high-bandwidth systems
  - 35% higher performance than alternative process options
  - 30% lower total power versus previous generations
  - Enables fastest and most power-efficient transceivers





#### Stratix V FPGAs - Built for Bandwidth

#### Bandwidth

- 66 transceivers capable of 12.5 Gbps and 6 x72 800-MHz DDR3 interfaces
- Devices with 28-Gbps transceivers
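The headline figures above can be turned into rough peak-bandwidth numbers. The sketch below assumes the 800 MHz figure is the DDR3 interface clock (two transfers per clock) and counts all 72 bits as raw width; neither assumption is stated on the slide, and the function names are illustrative only.

```python
def ddr3_peak_gbps(clock_mhz: float, width_bits: int, interfaces: int = 1) -> float:
    """Peak raw DDR bandwidth in Gbps, assuming two transfers per clock."""
    transfers_per_sec = clock_mhz * 1e6 * 2
    return transfers_per_sec * width_bits * interfaces / 1e9


def serial_aggregate_gbps(channels: int, rate_gbps: float) -> float:
    """Aggregate line rate in one direction across all transceiver channels."""
    return channels * rate_gbps


# Six x72 interfaces at an assumed 800 MHz clock, and 66 channels at 12.5 Gbps.
ddr3_total = ddr3_peak_gbps(800, 72, interfaces=6)
xcvr_total = serial_aggregate_gbps(66, 12.5)
print(f"DDR3 peak: {ddr3_total:.1f} Gbps, transceivers: {xcvr_total:.1f} Gbps per direction")
```

These are upper bounds on raw bit rate; usable throughput is lower once protocol overhead, ECC bits, and memory controller efficiency are taken into account.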

#### Integration

- Embedded HardCopy Blocks supporting PCI Express Gen3 and 40G/100G Ethernet
- High-performance, high-precision DSP
- Enhanced logic fabric with 1,100K LEs,
  50 Mb RAM, and 3,510 18x18 multipliers

#### Flexibility

- Fine-grain and easy-to-use partial reconfiguration
- Configuration via PCI Express
- 50% higher system performance and 30% lower total power





# **Stratix V Family Plan**

Columns are grouped on the original slide as Interconnect, Hard IP, and Core Fabric.

| Family            | Device | Transceivers (12.5G, 28G) | GPIO | 72-bit DDR3 | x8 PCIe Gen3 | 40G/100G Ethernet | LEs   | M20K Memory (Mb / # Blocks) | 18x18 Multipliers | fPLLs |
|-------------------|--------|---------------------------|------|-------------|--------------|-------------------|-------|-----------------------------|-------------------|-------|
| Stratix V GT FPGA | 5SGTB5 | 32, 4                     | 597  | 4           | 1            | Yes               | 425K  | 45 / 2304                   | 512               | 24    |
|                   | 5SGTB7 | 32, 4                     | 597  | 4           | 1            | Yes               | 622K  | 50 / 2560                   | 512               | 24    |
| Stratix V GX FPGA | 5SGXA3 | 36, 0                     | 624  | 4           | 1 or 2       | Yes               | 200K  | 20 / 1034                   | 376               | 24    |
|                   | 5SGXA4 | 36, 0                     | 624  | 4           | 1 or 2       | Yes               | 300K  | 26 / 1316                   | 376               | 24    |
|                   | 5SGXA5 | 48, 0                     | 840  | 6           | 1 or 4       | Yes               | 425K  | 45 / 2304                   | 512               | 28    |
|                   | 5SGXA7 | 48, 0                     | 840  | 6           | 1 or 4       | Yes               | 622K  | 50 / 2560                   | 512               | 28    |
|                   | 5SGXB5 | 66, 0                     | 648  | 4           | 1 or 4       | Yes               | 404K  | 36 / 1836                   | 612               | 24    |
|                   | 5SGXB6 | 66, 0                     | 648  | 4           | 1 or 4       | Yes               | 534K  | 39 / 1989                   | 612               | 24    |
| Stratix V GS FPGA | 5SGSB7 | 27, 0                     | 1032 | 7           | 1 or 2       | No                | 563K  | 32 / 1620                   | 3,240             | 22    |
|                   | 5SGSB8 | 27, 0                     | 1032 | 7           | 1 or 2       | No                | 706K  | 34 / 1755                   | 3,510             | 22    |
| Stratix V E FPGA  | 5SEB9  | -                         | 900  | 7           | -            | No                | 968K  | 33 / 1596                   | 1,064             | 32    |
|                   | 5SEBA  | -                         | 900  | 7           | -            | No                | 1087K | 43 / 2100                   | 1,100             | 32    |
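One way to read the family plan is as a filter over device resources. The sketch below encodes a subset of the rows (the largest device in each group) and picks candidates by minimum requirements. The `Device` class, `candidates` helper, and the idea of treating "-" as zero transceivers are illustrative assumptions; the aggregate rate also assumes every channel runs at its maximum listed speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    xcvr_12g: int      # 12.5G-capable transceiver channels
    xcvr_28g: int      # 28G-capable transceiver channels
    les_k: int         # logic elements, in thousands
    mult_18x18: int    # 18x18 multipliers

    def serial_gbps(self) -> float:
        """Aggregate one-direction line rate with every channel at full speed."""
        return self.xcvr_12g * 12.5 + self.xcvr_28g * 28.0


# Subset of the family plan table; values copied from the rows above.
DEVICES = [
    Device("5SGTB7", 32, 4, 622, 512),
    Device("5SGXA7", 48, 0, 622, 512),
    Device("5SGXB6", 66, 0, 534, 612),
    Device("5SGSB8", 27, 0, 706, 3510),
    Device("5SEBA", 0, 0, 1087, 1100),
]


def candidates(min_12g=0, min_28g=0, min_les_k=0, min_mult=0):
    """Return names of devices meeting every minimum requirement."""
    return [
        d.name
        for d in DEVICES
        if d.xcvr_12g >= min_12g
        and d.xcvr_28g >= min_28g
        and d.les_k >= min_les_k
        and d.mult_18x18 >= min_mult
    ]


print(candidates(min_28g=1))                    # needs 28G channels
print(candidates(min_12g=40, min_les_k=600))    # many channels plus large fabric
print(candidates(min_mult=3000))                # DSP-heavy design
```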



#### Increased Efficiency and System Performance



#### New ALM architecture

- Higher logic efficiency and performance
- 800K additional registers on largest device
- Ideal for heavily pipelined and register-rich designs

#### New M20K block and MLAB

- Improved area efficiency and higher system performance
- Up to 53 Mbits embedded RAM
- New fPLLs for high-resolution clock synthesis
  - Replaces board-level clock frequency sources (VCXOs) and reduces clock pins

#### Enhanced routing

- Easier timing closure and higher utilization



#### **Power Techniques**

| Power Reduction Methods                             | Lower static power | Lower dynamic power |
|-----------------------------------------------------|--------------------|---------------------|
| 28-nm process changes                               | ✓                  | ✓                   |
| Low power transceivers (200mW @ 28 Gbps)            | ✓                  | ✓                   |
| Programmable Power / Adaptive Body Bias             | ✓                  |                     |
| Lower core voltage (0.85V)                          | ✓                  | ✓                   |
| Extensive hardening of IP, Embedded HardCopy Blocks | ✓                  | ✓                   |
| Hard power down of functional blocks                | ✓                  | ✓                   |
| Clock gating                                        |                    | ✓                   |
| Customized extra-low leakage devices                | ✓                  |                     |
| Partial Reconfiguration                             | ✓                  | ✓                   |
| DDR3 and dynamic on-chip termination                | ✓                  | ✓                   |



### New Embedded HardCopy Block





#### Flexible Transceiver Architecture

- Scalability and flexibility through a continuous bank of transceivers
- Complete PMA+PCS per channel
- Flexible clocking options with abundant transmit clock sources enabling up to 44 independent data rates

| Transmit Clock Source | Number | Data Range (Gbps) |
|-----------------------|--------|-------------------|
| 28G LC PLL            | 4      | 20 - 28           |
| 12G LC PLL            | 22     | 3.25 - 12.5       |
| CMU PLL               | 22     | 0.6 - 12.5        |
| Core PLL (fPLL)       | 22     | 0.6 - 3.75        |
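The data ranges in the table overlap, so a given line rate can often be sourced from more than one PLL type. A minimal lookup sketch, using only the ranges listed above (the `sources_for` helper is hypothetical and not part of any Altera tool):

```python
# Data ranges in Gbps, copied from the transmit clock source table.
PLL_SOURCES = {
    "28G LC PLL": (20.0, 28.0),
    "12G LC PLL": (3.25, 12.5),
    "CMU PLL": (0.6, 12.5),
    "Core PLL (fPLL)": (0.6, 3.75),
}


def sources_for(rate_gbps: float) -> list:
    """Return the PLL types whose listed range covers the requested data rate."""
    return [
        name
        for name, (low, high) in PLL_SOURCES.items()
        if low <= rate_gbps <= high
    ]


for rate in (1.25, 10.3125, 25.0, 15.0):
    print(rate, sources_for(rate))
```

A rate such as 15 Gbps returns an empty list, because the table lists no source between 12.5 and 20 Gbps.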





# **Stratix V Integrated Hard IP**

| Embedded HardCopy Block Hard IP | Included Functions                                                                                                                            |
|---------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| x8 PCIe Gen3                    | PCS, PHY/MAC, data link, transaction layer                                                                                                    |
| 40GE/100GE                      | MLD/PCS: gearbox, block sync, alignment marker, reorder virtual channel, async buffer/deskew, block striper/destriper, scrambler/descrambler |
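For a sense of scale, the raw throughput of one x8 PCIe Gen3 block can be estimated from the standard Gen3 parameters of 8 GT/s per lane and 128b/130b encoding. Those two figures come from the PCI Express specification rather than from these slides, and the helper name is illustrative.

```python
def pcie_gen3_gbps(lanes: int) -> float:
    """Per-direction bit rate after 128b/130b line coding, before packet overhead."""
    gt_per_sec_per_lane = 8.0        # PCIe Gen3 signalling rate
    coding_efficiency = 128 / 130    # 128b/130b encoding
    return lanes * gt_per_sec_per_lane * coding_efficiency


x8 = pcie_gen3_gbps(8)
print(f"x8 Gen3: {x8:.2f} Gbps ({x8 / 8:.2f} GB/s) per direction")
```

Transaction-layer and data-link-layer packet overhead reduce the usable payload rate further.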

| Transceiver PCS Hard IP | Included Functions                                                                           |
|-------------------------|----------------------------------------------------------------------------------------------|
| Interlaken              | Gearbox, block sync, 64b/67b, frame sync, scrambler/descrambler, CRC-32, async buffer/deskew |
| 10GE (10GBASE-R)        | Gearbox, block sync, scrambler/descrambler, 64b/66b, rate matcher                            |
| SRIO 2.0                | Word aligner, lane sync state machine, deskew, rate matcher                                  |
| CPRI/OBSAI              | Word aligner, bit slip (deterministic latency)                                               |
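To illustrate what a scrambler/descrambler pair in the 10GBASE-R row does, here is a bit-level software sketch of the self-synchronous scrambler that IEEE 802.3 Clause 49 defines for 10GBASE-R, with polynomial 1 + x^39 + x^58. It is a teaching model only: it ignores 66-bit block framing and bit ordering, and it says nothing about how the hard IP is actually built.

```python
import random

MASK58 = (1 << 58) - 1  # 58-bit shift register


def scramble(bits, state=0):
    """Scramble a bit list; state holds the last 58 transmitted bits."""
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)  # taps at delays 39 and 58
        out.append(s)
        state = ((state << 1) | s) & MASK58
    return out, state


def descramble(bits, state=0):
    """Descramble a bit list; state holds the last 58 received bits."""
    out = []
    for s in bits:
        b = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(b)
        state = ((state << 1) | s) & MASK58
    return out, state


rng = random.Random(0)
payload = [rng.randint(0, 1) for _ in range(256)]

scrambled, _ = scramble(payload, state=MASK58)
recovered, _ = descramble(scrambled, state=MASK58)
print("round trip ok:", recovered == payload)

# Self-synchronisation: a receiver with the wrong initial state still
# recovers the data once 58 scrambled bits have shifted through it.
resynced, _ = descramble(scrambled, state=0)
print("resynchronised after 58 bits:", resynced[58:] == payload[58:])
```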



### **External Memory Interface**

- New UniPHY enables half the latency of ALTMEMPHY
- High system reliability
  - Duty cycle correction
  - Calibration algorithms
  - VT compensated deskew delays
  - PVT tracking mechanisms
- Sharing of PLLs and DLLs across multiple interfaces
- Hard I/O FIFOs and read/write paths
- Ease of use
  - UniPHY available as cleartext
  - Nios processor-based calibration sequencer for easier debug and customization
  - Easy-to-use application of timing and pin constraints
  - Improved documentation

# Stratix V FPGA PHY Architecture (UniPHY)





# **Stratix V Transceivers**




### **High-Bandwidth Transceivers**

#### 28-Gbps transceivers

- 20 Gbps to 28 Gbps
- Up to 4 full-duplex transceiver channels
- CEI-28G compliant

#### 12.5-Gbps transceivers

- 150 Mbps to 12.5 Gbps
- Up to 66 full-duplex transceiver channels
- SFP+ and 10GBASE-KR compliant

#### Independent transceivers

- Change transceiver settings (PMA or PCS) without interrupting other transceiver channels

#### Overcome channel losses

- Ultra-low transmit jitter (LC PLL) and excellent jitter tolerance (analog CDR)
- Four signal-conditioning techniques to compensate for losses



#### **Backplanes and Optical Modules**

- Drive 40" backplanes at 12.5 Gbps
  - 10GBASE-KR compliant (IEEE 802.3ap Clause 72)
- Interface to optical modules directly
  - Built-in electronic dispersion compensation (EDC)
  - XFP, SFP+, QSFP, and CFP compliance
- Signal conditioning
  - Pre-emphasis and de-emphasis
  - Four-stage continuous time linear equalizer (CTLE)
  - 5-tap decision feedback equalizer (DFE)
  - Adaptive dispersion compensation engine (ADCE)
- On-die instrumentation
  - Monitor eye margin within the receiver
  - Evaluate effectiveness of signal-conditioning techniques



### **Stratix V FPGA EyeQ Eye Viewer**

- Complete vertical and horizontal reconstruction of eye opening
- Uninterrupted data path for live debug capability
- Serial and parallel data verification for live in-system eye reconstruction
- Known pattern not necessary
- Evaluate effectiveness of signal-conditioning techniques
  - Select optimal pre-emphasis, CTLE, and DFE settings for largest eye opening
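Selecting the settings that give the largest eye opening amounts to a search over the equalization parameters. The sketch below shows an exhaustive sweep. The slides do not describe a software interface for EyeQ, so `measure_eye_opening` is a hypothetical callback standing in for whatever the on-die monitor reports, and the synthetic model exists only to make the example runnable.

```python
from itertools import product


def best_setting(pre_emphasis_values, ctle_values, dfe_values, measure_eye_opening):
    """Try every combination and return the one with the largest measured eye."""
    return max(
        product(pre_emphasis_values, ctle_values, dfe_values),
        key=lambda setting: measure_eye_opening(*setting),
    )


def synthetic_eye(pre_emphasis, ctle, dfe):
    """Stand-in for a real measurement: peaks at (2, 3, 1) by construction."""
    return -((pre_emphasis - 2) ** 2 + (ctle - 3) ** 2 + (dfe - 1) ** 2)


chosen = best_setting(range(4), range(5), range(3), synthetic_eye)
print("best (pre-emphasis, CTLE, DFE):", chosen)
```

An exhaustive sweep is the simplest choice; with many setting combinations, a coarse-to-fine or per-parameter search would reduce the number of measurements.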







### **High Bandwidth at Low Power**

- Lower power: 50% reduction at 11.3 Gbps
- A fraction of the power (< 10%) of external transceivers
  - 28 Gbps: ~200 mW per channel
  - 12.5 Gbps: ~170 mW per channel
  - 6.5 Gbps: ~80 mW per channel
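The per-channel figures above can be compared on an energy-per-bit basis, since 1 mW/Gbps equals 1 pJ/bit. A short sketch using the approximate values from the list (the helper name is illustrative):

```python
# Approximate per-channel power in mW, keyed by data rate in Gbps.
CHANNEL_POWER_MW = {28.0: 200.0, 12.5: 170.0, 6.5: 80.0}


def pj_per_bit(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ: mW divided by Gbps gives pJ/bit directly."""
    return power_mw / rate_gbps


for rate, power in CHANNEL_POWER_MW.items():
    print(f"{rate:>5} Gbps: {pj_per_bit(power, rate):.2f} pJ/bit")
```

On these numbers the 28-Gbps channel is the most efficient per bit, at roughly 7 pJ/bit versus 12 to 14 pJ/bit for the lower rates.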





### Transceiver Power at 28 Gbps/28 nm

| 28-Gbps PMA Power (mW) | Post-LY: Base features | Post-LY: + Optional features |
|------------------------|------------------------|------------------------------|
| RX-CTLE                | 25                     | 25                           |
| RX-DFE (est)           |                        | 35                           |
| CDR                    | 98                     | 98                           |
| Deserializer           | 27.8                   | 27.8                         |
| TX driver              | 30                     | 30                           |
| TX-FFE (6 dB)          |                        | 11                           |
| Serializer             | 20                     | 20                           |
| TOTAL                  | 201                    | 247                          |
| FOM (mW/Gbps)          | 7.18                   | 8.82                         |

Low-power design enables the 28-nm transceiver to achieve a power FOM of <= 8.82 mW/Gbps (pJ/bit) at 28 Gbps.
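The totals and figure of merit in the table can be reproduced from the per-block numbers. The sketch below sums the blocks, rounds the totals to whole milliwatts as the table does, and divides by the 28 Gbps line rate; the variable names are illustrative.

```python
RATE_GBPS = 28.0

# Per-block PMA power in mW, copied from the table.
BASE_BLOCKS = {
    "RX-CTLE": 25.0,
    "CDR": 98.0,
    "Deserializer": 27.8,
    "TX driver": 30.0,
    "Serializer": 20.0,
}
OPTIONAL_BLOCKS = {"RX-DFE (est)": 35.0, "TX-FFE (6 dB)": 11.0}

# The table reports totals rounded to the nearest mW.
base_total = round(sum(BASE_BLOCKS.values()))
full_total = round(sum(BASE_BLOCKS.values()) + sum(OPTIONAL_BLOCKS.values()))

base_fom = round(base_total / RATE_GBPS, 2)
full_fom = round(full_total / RATE_GBPS, 2)

print(f"base: {base_total} mW, {base_fom} mW/Gbps")
print(f"with optional features: {full_total} mW, {full_fom} mW/Gbps")
```

The CDR accounts for nearly half of the base total, so it dominates the figure of merit in either configuration.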



### World's First 28nm Transceiver at 28 Gbps



Tx eye diagram measured from a 28nm chip



# **28Gbps Transceiver Physical View**





### RF Die-Package Design





#### **Signal Conditioning Working**

- Supports data rates up to 12.5 Gbps
  - For chip-to-chip (c2c), chip-to-module (c2m), and backplane links
- RX path (CTLE):
  - 4 EQ stages: up to 20 dB programmable AC gain
  - Peaking is independently controlled to suit 6G and 12G backplanes
  - Programmable DC gain of 3 dB/6 dB/9 dB/12 dB at 3 dB per stage
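The CTLE gain figures are easier to picture as linear ratios. The sketch below assumes the dB values are voltage gains (20·log10 convention) and that the DC gain steps correspond to one through four stages at 3 dB each; both are readings of the bullets above rather than stated facts, and the function names are illustrative.

```python
def db_to_voltage_ratio(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio (20*log10 convention)."""
    return 10 ** (gain_db / 20)


def ctle_dc_gain_db(stages: int, db_per_stage: float = 3.0, max_stages: int = 4) -> float:
    """DC gain for a given number of stages, assuming equal gain per stage."""
    if not 1 <= stages <= max_stages:
        raise ValueError(f"stages must be between 1 and {max_stages}")
    return stages * db_per_stage


for n in range(1, 5):
    gain = ctle_dc_gain_db(n)
    print(f"{n} stage(s): {gain:.0f} dB DC gain = x{db_to_voltage_ratio(gain):.2f}")

print(f"20 dB AC gain = x{db_to_voltage_ratio(20):.1f}")
```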






#### 28-nm Transceiver Demonstration Board



© 2010 Altera Corporation—Hot Chips 22 2010

