# HotChips 2009 Xeon Socket Filler FPGA Accelerators

www.nallatech.com

## Intel Xeon Accelerator Modules

## Intel Front Side Bus FPGA Accelerators

The industry's only Xilinx Virtex-5 FSB Accelerator Module

- 64-bit 1066MHz FSB interface
- 8GB/s peak bandwidth
- 105ns host latency
- 256GB direct system memory access
- Intel MP platform compatible
- Modular product optimization for different applications
- Xilinx Virtex-5 FPGA technology
- Supported by Intel QuickAssist AAL
- C → FPGA compiler support

## **Intel QuickAssist Initiative**

- What is Intel QuickAssist?
  - Comprehensive initiative that enables optimized use and deployment of accelerators (primarily FPGAs) on Intel platforms
- QuickAssist Accelerator Abstraction Layer (AAL)
  - Standard C/C++ API for inclusion with user application
  - Device discovery support
  - CPU initiated data transfer to FPGA (send data to accelerator)
  - CPU initiated data transfer from FPGA (receive data back from accelerator)

#### Benefits

- Common software interface supporting multiple processing technologies
- Easy migration between different technologies and form factors
- Transparent open source approach

## Intel's QuickAssist Accelerator Model

Common function library over FSB, QPI, PCIe or traditional IA-based algorithms

## Integrated Development Platform

- Pre-installed FPGA hardware
- Linux operating system
- Intel QuickAssist AAL
- VHDL IP cores
- Reference designs
- Documentation
- 1 year warranty
- 1 year technical support
- Optional design service assistance

## Inside the 4U server

# **FSB Configuration Options**

## **FSB-Base Module**

- Intel Xeon mPGA604 socket
- Fits within Xeon heat sink footprint

#### Front Side Bus interface

- 64-bit 1066MHz
- 8GB/s peak bandwidth\*
- 105ns latency
- Direct access to system memory
- Encrypted FSB core

#### Virtex-5 Interface/User FPGA

- FF1738 package, 42.5mm<sup>2</sup>
- LX110-3

Diagram labels: FSB-Compute or FSB-Expansion; 64-bit/1,066MHz Front Side Bus

\* 8GB/sec for 2 cache-line bursts
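The 8GB/s peak figure quoted for the FSB interface follows from the bus width and transfer rate. The sketch below reproduces that arithmetic, assuming that "1066MHz" denotes 1066 million transfers per second and that 1 GB = 10^9 bytes. The function name is illustrative and does not come from any Nallatech or Intel tool.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Raw peak bandwidth of a parallel bus in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Figures from the slides: 64-bit bus, 1066MHz.
# Assumption: 1066MHz is the effective transfer rate of the bus.
fsb_peak = peak_bandwidth_gb_per_s(64, 1066)
print(f"FSB raw peak: {fsb_peak:.3f} GB/s")
```

The raw result is about 8.5 GB/s, which the slides round to 8GB/s and qualify as applying to 2 cache-line bursts.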
## **FSB-Compute Module**

#### Virtex-5 User FPGAs

- Supports largest LX or SX or FX FPGAs
- Up to 207,360 LUT6s
- Up to 384 DSP48s
- Up to 1032 18Kbit Block RAMs

#### 4 independent banks of DDR-II SRAM

- 2 banks per FPGA
- Up to 8MBytes per bank
- 2x 32-bit data buses
- 8GB/sec total bandwidth
- Total off-module B/W = 25.6GB/s
- Scalability
  - Ability to stack multiple FSB-Compute modules

## **FSB-Expansion Module**

#### Virtex-5 User FPGA

- Supports largest LX or SX or FX FPGA
- >> 1TOP Fixed Precision/Bit Manipulation
- Up to 100GF Single Precision FP
- Up to 40GF Double Precision FP

#### 4 banks of QDR-II SRAM

- Up to 16 MByte per bank
- 16GB/sec total bandwidth

#### 2 off-module GTP connectors

- 10 lanes @ 3.125Gbps per connector
- 20 lanes total = 62.5 Gbps total

#### 2 off-module digital connectors

- 40 pins per connector
- Single-ended or LVDS I/O
  - E.g. for high speed video capture
  - E.g. ultra low latency, <20ns, point-to-point comms

Diagram label: 128 LVDS pairs = 12.8GB/sec

- Intel Xeon Server Socket
  - 73XX MP Xeon Series compatible
  - mPGA604 socket
  - Zero Insertion Force (ZIF) socket
- An interposer is required to be fitted to the ZIF socket
  - This provides the primary mating interface for the FPGA module stack.
- FSB-BASE module plugs directly into Intel Xeon socket
  - Deals with low level FSB interface
  - Referred to as the "Bridge" from the host to the user logic

Heatsink fitted to FSB interface FPGA

Copyright ©2009, Nallatech.
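The link and memory figures quoted for the FSB-Compute and FSB-Expansion modules can be cross-checked with simple arithmetic. The sketch below uses only numbers from the slides, with one labeled assumption: the GTP payload estimate supposes 8b/10b line coding, which the slides do not state.

```python
# Figures taken from the slides.
GTP_CONNECTORS = 2
LANES_PER_CONNECTOR = 10
LANE_RATE_GBPS = 3.125
LVDS_PAIRS = 128
LVDS_TOTAL_GB_PER_S = 12.8
SRAM_BANKS = 4
COMPUTE_SRAM_GB_PER_S = 8.0     # FSB-Compute, DDR-II SRAM total
EXPANSION_SRAM_GB_PER_S = 16.0  # FSB-Expansion, QDR-II SRAM total

# Aggregate raw GTP line rate across both connectors.
total_lanes = GTP_CONNECTORS * LANES_PER_CONNECTOR
gtp_raw_gbps = total_lanes * LANE_RATE_GBPS

# Assumption (not in the slides): 8b/10b coding, so 8 payload bits per 10 line bits.
gtp_payload_gbps = gtp_raw_gbps * 8 / 10

# Implied rate of a single LVDS pair on the inter-module link, in Mbps.
lvds_per_pair_mbps = LVDS_TOTAL_GB_PER_S * 8 * 1000 / LVDS_PAIRS

# Implied bandwidth of one SRAM bank on each module type, in GB/s.
compute_per_bank = COMPUTE_SRAM_GB_PER_S / SRAM_BANKS
expansion_per_bank = EXPANSION_SRAM_GB_PER_S / SRAM_BANKS

print(f"GTP: {total_lanes} lanes, {gtp_raw_gbps} Gbps raw")
print(f"GTP payload if 8b/10b coded: {gtp_payload_gbps} Gbps")
print(f"LVDS per pair: {lvds_per_pair_mbps:.0f} Mbps")
print(f"SRAM per bank: {compute_per_bank} GB/s (Compute), {expansion_per_bank} GB/s (Expansion)")
```

The raw GTP total matches the 62.5 Gbps on the slide, and the 12.8GB/sec LVDS figure implies roughly 800 Mbps per pair.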
- ISI high density custom interconnect
  - 1526 pin HILO connector
  - 0.8mm pitch
  - Provides LVDS links to upper module(s)

Fits onto connectors of FSB-BASE module

- FSB-COMPUTE module #1 mates with connector
- Heatsinks applied to user FPGAs

Another ISI high density connector mates with FSB-COMPUTE #1, providing LVDS links to another upper module

- FSB-COMPUTE #2 mates with the connector
- Heatsinks applied to user FPGAs

The final ISI high density connector provides another LVDS link

The FSB-EXPANSION module mates with the connector, completing the stack of 5 Xilinx user FPGAs + FSB

Heatsinks are fitted to user FPGA of FSB-Expansion module

## The complete stack...

- Raw Compute Performance
  - > 500GF SPFP
  - > 200GF DPFP
  - >> 5TOPs Integer / Bit Manipulation
- Power Consumption
  - Up to 130 Watts maximum
  - 24 Watts max per FPGA/Memory
- Stack is currently factory configured
- Updating for Customer Configuration
  - Insertion / Extraction Tool
  - Calibration Software

# Stack Level Functional Block Diagram

# Thank You
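The complete-stack performance and power figures are consistent with scaling the per-FPGA numbers by the five user FPGAs in the stack. The sketch below checks this, assuming that each user FPGA contributes the per-FPGA figures quoted for the FSB-Expansion module (100GF single precision, 40GF double precision, 1 TOP); the slides do not state that breakdown explicitly.

```python
# Figures taken from the slides.
USER_FPGAS = 5              # 2 per FSB-Compute module x 2 modules, plus 1 on FSB-Expansion
WATTS_PER_FPGA_MEMORY = 24  # "24 Watts Max per FPGA/Memory"
STACK_MAX_WATTS = 130       # "Up to 130 Watts Maximum"

# Assumption: every user FPGA matches the per-FPGA figures quoted for FSB-Expansion.
PER_FPGA_SP_GFLOPS = 100
PER_FPGA_DP_GFLOPS = 40
PER_FPGA_TOPS = 1

stack_sp_gflops = USER_FPGAS * PER_FPGA_SP_GFLOPS
stack_dp_gflops = USER_FPGAS * PER_FPGA_DP_GFLOPS
stack_tops = USER_FPGAS * PER_FPGA_TOPS

# Power drawn by the user FPGAs and their memory at the per-device maximum.
user_fpga_watts = USER_FPGAS * WATTS_PER_FPGA_MEMORY
remaining_watts = STACK_MAX_WATTS - user_fpga_watts

print(f"Stack SPFP: {stack_sp_gflops} GF, DPFP: {stack_dp_gflops} GF, integer: {stack_tops} TOPs")
print(f"User FPGAs: {user_fpga_watts} W of {STACK_MAX_WATTS} W, leaving {remaining_watts} W")
```

Five user FPGAs at 24 W each account for 120 W of the 130 W maximum; how the remaining 10 W is allocated is not stated in the slides.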