

## **Hybrid On-chip Data Networks**

#### **Gilbert Hendry**

Keren Bergman



Lightwave Research Lab, Columbia University

# **Chip-Scale Interconnection Networks**



- Chip multi-processors create need for high performance interconnects
- Performance bottleneck of on-chip networks and I/O
- Power dissipation constraints of the chip package
  - More than 50% of total power comes from interconnects\*



Intel Polaris





AMD Opteron

\* N. Magen *et al.*, "Interconnect-power dissipation in a microprocessor," SLIP 2004.

#### **Motivation**



- CMPs of the future = 3D stacking
- Lots of data on chip
- Photonics offers key advantages





#### Photonics changes the rules for Bandwidth, Energy, and Distance.

#### **ELECTRONICS:**

- Buffer, receive and re-transmit at every router.
- Each bus lane routed independently.  $(P \propto N_{LANES})$
- Off-chip BW is pin-limited and power hungry.

#### **OPTICS:**

- Modulate/receive high bandwidth data stream once per communication event.
- Broadband switch routes entire multiwavelength stream.
- Off-chip BW = On-chip BW for nearly same power.











#### Step 1: Path SETUP request





#### Step 2: Path ACK





#### Step 3: Transmit Data





#### Meanwhile: Path Contention





#### Step 4: Path TEARDOWN
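The four steps, together with the contention case, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator or router logic used in this work: the `PhotonicMesh` class, the dimension-order (X-then-Y) routing in `xy_route`, and the policy of dropping a blocked setup request are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

Node = tuple[int, int]    # (x, y) grid coordinate of a network node
Link = tuple[Node, Node]  # directed photonic link between neighbouring nodes


def xy_route(src: Node, dst: Node) -> list[Link]:
    """Dimension-order route (an assumption): move along X first, then Y."""
    links: list[Link] = []
    x, y = src
    while x != dst[0]:
        nxt = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nxt, y)))
        x = nxt
    while y != dst[1]:
        nxt = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, nxt)))
        y = nxt
    return links


@dataclass
class PhotonicMesh:
    """Hypothetical bookkeeping for circuit-switched photonic paths."""
    reserved: dict[Link, int] = field(default_factory=dict)        # link -> circuit id
    circuits: dict[int, list[Link]] = field(default_factory=dict)  # circuit id -> links
    next_id: int = 0

    def setup(self, src: Node, dst: Node) -> Optional[int]:
        """Steps 1-2: the electronic SETUP request reserves each link in turn;
        reaching the destination corresponds to the ACK returning to the source."""
        cid = self.next_id
        claimed: list[Link] = []
        for link in xy_route(src, dst):
            if link in self.reserved:      # contention: another circuit holds this link
                for held in claimed:       # release the partial reservation
                    del self.reserved[held]
                return None                # setup fails; the sender may retry later
            self.reserved[link] = cid
            claimed.append(link)
        self.circuits[cid] = claimed
        self.next_id += 1
        return cid

    def transmit(self, cid: int, payload_bits: int) -> int:
        """Step 3: data crosses the reserved path end to end, with no per-hop buffering."""
        if cid not in self.circuits:
            raise ValueError("no such circuit")
        return payload_bits

    def teardown(self, cid: int) -> None:
        """Step 4: free every link held by the circuit."""
        for link in self.circuits.pop(cid):
            del self.reserved[link]
```

With this sketch, a circuit from `(0, 0)` to `(2, 0)` blocks a later request from `(1, 0)` to `(2, 1)` because both need the link from `(1, 0)` to `(2, 0)`; the second request succeeds only after the first circuit is torn down.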





#### Pros:

- Energy-efficient end-to-end transmission
- High bandwidth through WDM
- Electronic network still available for small control messages\*
- Network-level support for secure regions

#### Cons:

- Path setup latency
- Path setup contention (no fairness)



# **Programming and Communication**

## **Shared Memory**





"... [OpenMP on large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation" ~ Exascale Report





| Access       | Method                              |
|--------------|-------------------------------------|
| Local Read   | Optical Receive                     |
| Local Write  | Optical send                        |
| Remote Read  | Electronic request, optical receive |
| Remote Write | Optical send                        |
| Shared R/W   | ?                                   |



[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

# **Message Passing**



- Complex, dynamic access patterns
- Relatively larger blocks of data
- Scientific computing





\* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]

## Streaming



- Embedded / specialized systems (Graphics, Image + Signal Proc.)
- Execution mode of general-purpose systems (Cell Processor)





#### **Electronic Plane**

### **Electronic Router**





- Narrow Channels (8-32)

#### **Network Gateway**





#### **External Concentration**

[P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]



#### **The Photonic Plane**

## **Wavelength Division Multiplexing**







#### Silicon Photonic Waveguide Technology





## **Ring Resonator Operation**





#### Silicon Photonic Modulator and Detector Technology







[M Watts, Group Four Photonics (2008)]

[S Koester, J. Lightw. Technol. (2007)]

Receive circuit

### **Higher Order Switch Designs**









[A. Biberman, IEEE Phot. Tech. Letters (2010)]





# **On-Chip Topology Exploration**



- Photonic Torus
- Nonblocking Photonic Torus



[M. Petracca et al. IEEE Micro, 2008]



[A. Shacham et al., Trans. on Comput., 2008]

# **On-Chip Topology Exploration**



- TorusNX
- Square Root



[J. Chan et al. JLT, May 2010]

#### **Photonic Plane Characteristics**



- Insertion Loss
- Noise
- Power

#### **Insertion Loss and Optical Power Budget**





#### **Insertion Loss vs. Bandwidth**
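The link between insertion loss and bandwidth can be written as a budget inequality. The form below is the usual link-budget argument, stated with assumed symbols that do not appear on the slide: $P_{\max}$ is the largest total optical power a waveguide can carry (in dBm), $P_{\text{sens}}$ is the detector sensitivity (in dBm), $IL_{\max}$ is the worst-case path insertion loss (in dB), and $N_\lambda$ is the number of wavelength channels.

$$
P_{\max} - P_{\text{sens}} \;\ge\; IL_{\max} + 10\log_{10} N_\lambda
\quad\Longrightarrow\quad
N_\lambda \;\le\; 10^{\left(P_{\max} - P_{\text{sens}} - IL_{\max}\right)/10}
$$

Under these assumptions, every decibel of insertion loss removed from the worst-case path can be spent on additional wavelengths, and so on aggregate bandwidth.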





#### **Simulation Results**









Insertion loss components (figure legend): Propagation, Crossing, Dropping Into a Ring




*Original* is based on the insertion loss (IL) results from the previous slide; *Improved* is based on a hypothetical improvement in crossing loss from 0.15 dB to 0.05 dB.
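To see why a 0.10 dB change per crossing matters, the sketch below totals the loss along one path and converts the remaining budget into a wavelength count. Only the two crossing-loss values (0.15 dB and 0.05 dB) come from the slide. The crossing count, the lumped "other" loss, the 30 dB budget, and the function names are placeholder assumptions for illustration, not measured or simulated results.

```python
import math


def path_loss_db(n_crossings: int, crossing_db: float, other_db: float) -> float:
    """Total insertion loss of one path: crossings plus all other loss, in dB."""
    return n_crossings * crossing_db + other_db


def max_wavelengths(budget_db: float, loss_db: float) -> int:
    """Largest channel count N with loss + 10*log10(N) <= budget (all in dB)."""
    return math.floor(10 ** ((budget_db - loss_db) / 10))


# Placeholder inputs (assumptions, not values from the slides).
N_CROSSINGS = 60   # waveguide crossings on the worst-case path
OTHER_DB = 5.0     # propagation, ring drop, and remaining loss, lumped together
BUDGET_DB = 30.0   # optical power budget

original = path_loss_db(N_CROSSINGS, 0.15, OTHER_DB)  # crossing loss from the slide
improved = path_loss_db(N_CROSSINGS, 0.05, OTHER_DB)  # hypothetical improved crossing

print(f"original: {original:.1f} dB, {max_wavelengths(BUDGET_DB, original)} wavelengths")
print(f"improved: {improved:.1f} dB, {max_wavelengths(BUDGET_DB, improved)} wavelengths")
```

Because the crossing loss is multiplied by the number of crossings, the saving grows with path length, which is why topologies with many crossings benefit most from the improved device.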



## **Photonic Plane Characteristics**



- Insertion Loss
- Noise
- Power

#### **Noise and Crosstalk**





#### **Effects of Noise**





#### **Simulation Results**

#### Results

- Results are plotted for a network size of $8 \times 8$ at saturation, measured at the detectors.
  - Maximum OSNR ≈ 45 dB (due to laser noise)
  - Minimum OSNR < 17 dB (due to message-to-message crosstalk)
- Variations between networks are due to the varying likelihood of two messages intersecting on each network topology.

#### System Performance

- SNR measures the likelihood of error-free transmission.
- Lower-SNR designs will require additional retransmissions, resulting in lower throughput.



The line at OSNR=16.9 dB is where a bit-error-rate of  $10^{-12}$  can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.
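The 16.9 dB threshold can be checked numerically. The sketch below assumes the textbook model for binary orthogonal signaling in Gaussian noise, $\text{BER} = Q(\sqrt{\text{SNR}})$, and treats the OSNR at the detector as that SNR; both are simplifying assumptions consistent with the "ideal binary receiver" wording, not a description of the receiver model used in the simulations. The message-success helper further assumes independent bit errors, and its function name is hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def ber_orthogonal(osnr_db: float) -> float:
    """Bit-error rate for binary orthogonal signaling: Q(sqrt(SNR))."""
    snr_linear = 10 ** (osnr_db / 10)
    return q_function(math.sqrt(snr_linear))


def message_success_probability(osnr_db: float, message_bits: int) -> float:
    """Probability that every bit of a message arrives intact,
    assuming independent bit errors (an idealization)."""
    ber = ber_orthogonal(osnr_db)
    # log1p keeps precision when ber is tiny.
    return math.exp(message_bits * math.log1p(-ber))


for osnr in (12.0, 14.0, 16.9, 20.0):
    print(f"OSNR {osnr:5.1f} dB  BER {ber_orthogonal(osnr):.2e}  "
          f"P(success, 100 kbit) {message_success_probability(osnr, 100_000):.6f}")
```

Under this model the bit-error rate at 16.9 dB is on the order of $10^{-12}$, and it rises steeply below that point, so a message of the 100 kbit size used elsewhere in these results becomes increasingly likely to need retransmission as OSNR drops.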



## **Photonic Plane Characteristics**



- Insertion Loss
- Noise
- Power

#### **Power Usage**



- Laser Power
- Active Power
  - Modulating
  - Detecting
  - Broadband
- Static Power
  - Thermal tuning
- Tx/Rx Power
  - Drivers
  - TIAs











- Results are based on randomly generated traffic with message sizes of 100 kbit, with the network in saturation.
- Data was collected on 64-node topologies constrained to a total surface area of $2 \text{ cm} \times 2 \text{ cm}$.







#### Performance




- Uniform random traffic
- 256 cores, 64-node network



#### **Scientific Applications**









# **Other Interesting Issues**

#### **Memory Access**





[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]

#### **Other Arbitration Means - TDM**





[G. Hendry et al. Silicon Nanophotonic Network-On-Chip Using TDM Arbitration. In HOTI, Aug. 2010]

### Wavelength Granularity





- Scalable number of WDM channels

## Conclusion



- Some applications and programming models are well suited to a circuit-switched photonic network
- Interesting tradeoffs and design space
  - Photonic physical layout / design
  - System-level benefits from device improvement
  - Network-level improvements