# **PipeRench**: Power and Performance Evaluation of a Programmable Pipelined Datapath

Benjamin A. Levine and Herman H. Schmit

Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA

blevine@cmu.edu, herman@ece.cmu.edu



# **PipeRench** = Reconfigurable Computing Device

Reconfigurable computing device: a computing device that can be reconfigured for each application it runs, by changing the functionality of its hardware and the way that hardware is connected.

PipeRench was developed by students and faculty at Carnegie Mellon.



# Why Reconfigurable Computing?





We want performance **and** adaptability:

- Performance of an ASIC: implement the application as a custom datapath to:
  - Increase parallelism.
  - Decrease memory traffic (through locality).
  - Increase performance.
  - Use less power.
- Adaptability of a CPU: completely reprogram as needed for new applications.



# Why NOT Reconfigurable Computing?

- FPGA design is more like HW than SW
  - No real C to FPGA yet, so must use HDL
- FPGA configuration is fixed to one FPGA
  - Must redesign to gain performance on larger FPGAs
  - Can't use design on FPGA with fewer resources
  - Compares poorly to SW for microprocessors:
    - No portability
    - No scalability







# Virtual Architecture

- Compile to virtual machine
  - Makes compilation easier
  - Compile from high-level language (DIL)
  - Binaries decoupled from specific hardware
  - Scalable / Re-usable
- Restrict the model of computation to pipelined datapaths
  - Makes virtual architecture possible
  - Simplifies compilation and programming



# **Pipelined Datapaths**





# **PipeRench** Fabric

A programmable, pipelined data path containing:

- Processing elements
- Local interconnect
- Pass registers
- Unbounded depth





#### **Pipeline Virtualization**

Virtual Pipeline










Since stripes are connected in a ring, data can always pass between adjacent virtual stripes in the physical fabric.
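The ring argument can be checked with a small sketch. It assumes a simple round-robin placement in which virtual stripe `v` occupies physical stripe `v mod P`; that placement rule, the function names, and the stripe counts in the example are illustrative assumptions, not details taken from the slides.

```python
def physical_stripe(virtual_index: int, num_physical: int) -> int:
    """Assumed placement: wrap virtual stripes around the physical ring."""
    return virtual_index % num_physical


def ring_successor(physical_index: int, num_physical: int) -> int:
    """Physical stripe that receives this stripe's output in the ring."""
    return (physical_index + 1) % num_physical


def adjacent_virtual_stripes_connected(num_virtual: int, num_physical: int) -> bool:
    """Check that virtual stripe v always feeds virtual stripe v + 1 over a ring link."""
    return all(
        ring_successor(physical_stripe(v, num_physical), num_physical)
        == physical_stripe(v + 1, num_physical)
        for v in range(num_virtual - 1)
    )


if __name__ == "__main__":
    # Hypothetical sizes: a 5-stripe virtual pipeline on a 3-stripe fabric.
    print([physical_stripe(v, 3) for v in range(5)])   # [0, 1, 2, 0, 1]
    print(adjacent_virtual_stripes_connected(5, 3))    # True
```

Under this placement, the wrap-around link from the last physical stripe back to the first is what lets a virtual pipeline be deeper than the physical fabric.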



### **Performance Scaling**



Figure panels: 2 outputs / 6 cycles, 4 outputs / 6 cycles, 6 outputs / 6 cycles.



# **Chip Structure**













# **Functional Unit Architecture**



Functional Unit Output (8 bits)





# **Pass Register File**

- Two values can be read in each stripe.
- PE can write one new value to a single register.

From Previous Stripe





To Next Stripe
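A minimal behavioral sketch of the two rules above for a single PE: register values arrive from the previous stripe, the PE reads two of them, overwrites one register with its result, and passes the whole set on. The register count `NUM_REGS`, the function `stripe_step`, and its parameters are hypothetical names chosen for illustration; the 8-bit mask follows the functional unit output width shown earlier.

```python
from typing import Callable, List

NUM_REGS = 4  # hypothetical register count per PE; the slides do not state it


def stripe_step(
    incoming: List[int],
    read_a: int,
    read_b: int,
    write_reg: int,
    op: Callable[[int, int], int],
) -> List[int]:
    """Model one PE's pass register file for one stripe.

    incoming  : register values arriving from the previous stripe
    read_a/b  : indices of the two registers the PE reads
    write_reg : index of the single register the PE overwrites
    op        : the PE's function applied to the two values read
    Returns the register values handed to the next stripe.
    """
    outgoing = list(incoming)  # unwritten registers pass through unchanged
    outgoing[write_reg] = op(incoming[read_a], incoming[read_b]) & 0xFF  # 8-bit result
    return outgoing


if __name__ == "__main__":
    regs = [3, 5] + [0] * (NUM_REGS - 2)
    after_first = stripe_step(regs, 0, 1, 2, lambda a, b: a + b)
    after_second = stripe_step(after_first, 2, 0, 3, lambda a, b: a * b)
    print(after_first)   # [3, 5, 8, 0]
    print(after_second)  # [3, 5, 8, 24]
```

In this sketch the inputs `3` and `5` are still being passed along after the second stripe, even though nothing downstream reads them.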

# **Pass Register File Operation**





# **Pass Register Problem**





Old register values cycle through fabric endlessly. Extra switching consumes power.









# **Implemented Hardware Design**

- Industrial partner: STMicroelectronics
- Process technology: 0.18 micron, 6 metal layers
- 3.65 million transistors
- 49 sq. mm die
- 120 MHz fabric operation
- 60 MHz I/O frequency
- < 3 W power
- Reconfigure entire fabric in 133 ns
- Switch applications in 8 ns
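To put the two timing figures in terms of the fabric clock, the short calculation below converts them to cycles at 120 MHz. It is plain arithmetic on the numbers listed above; the slides themselves do not state cycle counts, and the helper name `ns_to_fabric_cycles` is an illustrative choice.

```python
FABRIC_CLOCK_HZ = 120e6  # fabric frequency from the list above


def ns_to_fabric_cycles(duration_ns: float, clock_hz: float = FABRIC_CLOCK_HZ) -> float:
    """Convert a duration in nanoseconds to (fractional) clock cycles."""
    return duration_ns * 1e-9 * clock_hz


if __name__ == "__main__":
    print(round(ns_to_fabric_cycles(133), 2))  # 15.96
    print(round(ns_to_fabric_cycles(8), 2))    # 0.96
```

So a full-fabric reconfiguration corresponds to roughly 16 fabric cycles and an application switch to roughly one cycle, assuming the quoted times are rounded.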



# **Tile Layout**



# Fabric Layout

Figure: fabric layout, with a horizontal scale running from 0 to 5443.5.

# **Chip Die Shot**



# **Performance on Filtering**

- 40-tap, 16-bit FIR filter: 41.8 MSPS
- Comparable to high-end DSPs
  - Much lower clock rate
  - Without a full multiplier (taps are compiled into hardware)
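One generic way a known, fixed tap can avoid a general multiplier is to expand the multiplication into shifts and adds, one per set bit of the coefficient. The sketch below illustrates that idea only; it is not claimed to be the decomposition the PipeRench compiler actually uses, and all function names are hypothetical.

```python
from typing import List


def shift_add_terms(coefficient: int) -> List[int]:
    """Return the bit positions set in a non-negative constant coefficient."""
    if coefficient < 0:
        raise ValueError("this sketch handles non-negative coefficients only")
    return [bit for bit in range(coefficient.bit_length()) if (coefficient >> bit) & 1]


def multiply_by_constant(sample: int, coefficient: int) -> int:
    """Multiply using only shifts and adds, one add per set bit of the coefficient."""
    return sum(sample << bit for bit in shift_add_terms(coefficient))


def fir_output(window: List[int], taps: List[int]) -> int:
    """One FIR output sample: sum of window[i] * taps[i] with constant taps."""
    return sum(multiply_by_constant(x, c) for x, c in zip(window, taps))


if __name__ == "__main__":
    print(shift_add_terms(10))               # [1, 3]
    print(multiply_by_constant(7, 10))       # 70
    print(fir_output([1, 2, 3], [4, 5, 6]))  # 32
```

Because the taps are fixed at compile time, the set of shifts for each tap is known in advance, so only the adders for the set bits need to exist in the datapath.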



# **Performance on Encryption**

- IDEA Encryption: 450 Mbps
  - Key is compiled into hardware
  - Compilation (including P&R) takes less than one minute
- Comparison:
  - 800 MHz Pentium III Xeon: 75.4 Mbps



# **Power Consumption – FIR Filter**



- Below 14 taps, power is nearly constant
- At 14 taps, virtualization causes a step in power consumption



# Conclusions

- A practical virtual machine for pipelined programmable datapaths is possible.
- Virtual hardware $\Rightarrow$ physical hardware:
  - Completely self-managed on chip at run-time.
  - Enabled by fast incremental reconfiguration.
- Virtual architecture allows:
  - Easier compilation
  - Forward compatibility / Scalability
- Implemented chip has high performance and low power requirements

