# A CMOS Non-Blocking 16x16 Gigabit Ethernet Switch Chip

**Jonas Alowersson**, Anders Edman, Henrik O. Johansson, Tomas Johansson, Anders Lloyd, Bertil Roslund, Lars-Olof Svensson, Patrik Sundström, Peter Tufvesson, Kenny Ranerup, Per Andersson, Christer Svensson

SwitchCore

# Outline

- Background
- Architecture
- Combining ASIC and full custom design
- Memory design
- The high speed Serial-to-Parallel converter

- Coping with a multitude of clock regions
- External memory interfaces
- On chip classifier
- Facts and figures

# Background

#### **Research Project**

- Cooperation between Linköping and Lund Universities
- Led by Christer Svensson and Per Andersson
- 80 Gbit/s single-chip ATM switch in BiCMOS using a shared-memory architecture

The Research Chip

#### **Challenge for Commercial Product**

Integrate high speed switching from research project with full support for switching and routing functions required in a full-blown Gigabit Ethernet switch.

#### Architecture



# ASIC and Full Custom

- Full custom design is used for buffer memories and other regular blocks, to enhance utilization and increase performance.
- Irregular blocks are designed with synthesized logic to decrease design time and increase reusability.
- Full custom blocks account for 30% of the chip area.





Full custom blocks




External clock



#### Enable ternary circuit

Example of circuit level techniques in full custom blocks for reduced power consumption.



#### Memory design

- 1 Mbit of on-chip buffer memory.
- Total data transmission rate of 76 Gbit/s, achieved by combining a high access frequency (150 MHz) with a wide access width (512 bits).
- Single-ported memory core for minimum area.
- Dual-ported peripheral circuitry for simple interface to control logic.
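
As a quick check of the figures above, the peak rate is simply the access frequency multiplied by the access width. The sketch below assumes one full-width access per cycle; the function and constant names are illustrative and not taken from the design.

```python
# Figures quoted on the slide.
ACCESS_FREQUENCY_HZ = 150e6  # 150 MHz access frequency
ACCESS_WIDTH_BITS = 512      # 512-bit access width


def memory_bandwidth_gbit_per_s(freq_hz, width_bits):
    """Peak bandwidth in Gbit/s, assuming one full-width word moves per access."""
    return freq_hz * width_bits / 1e9


peak = memory_bandwidth_gbit_per_s(ACCESS_FREQUENCY_HZ, ACCESS_WIDTH_BITS)
print(f"Peak memory bandwidth: {peak:.1f} Gbit/s")
```

The product comes to 76.8 Gbit/s, which matches the 76 Gbit/s quoted above.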

# Memory block diagram



# Serial to Parallel Converter

SRAM-based structure performs three tasks:

- Parallelize data to a width of 512 bits
- Multiplex incoming data streams
- Provide synchronization between link clocks and local system clock
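
The first two tasks can be illustrated with a small behavioral model. This is only a software sketch under stated assumptions, not the circuit itself: data is assumed to arrive one byte at a time per port, each port is given its own row of storage, and the class and method names are hypothetical. Clock-domain synchronization is not modeled; in the sketch, `write_byte` stands for the link-clock side and `read_words` for the system-clock side.

```python
WORD_BITS = 512
BYTES_PER_WORD = WORD_BITS // 8  # 64 bytes per parallel word


class SerialToParallel:
    """Behavioral model: collect per-port bytes into 512-bit words."""

    def __init__(self, num_ports=16):
        # One row per port, standing in for a row of the SRAM structure.
        self.rows = [bytearray() for _ in range(num_ports)]
        self.ready = []  # completed (port, word) pairs, in completion order

    def write_byte(self, port, byte):
        """Link side: store one incoming byte in the row of its port."""
        row = self.rows[port]
        row.append(byte)
        if len(row) == BYTES_PER_WORD:
            # Row is full: hand over a copy as one 512-bit word.
            self.ready.append((port, bytes(row)))
            row.clear()

    def read_words(self):
        """System side: drain all completed words from all ports."""
        words, self.ready = self.ready, []
        return words


# Usage: two ports deliver bytes interleaved in time.
sp = SerialToParallel(num_ports=2)
for i in range(BYTES_PER_WORD):
    sp.write_byte(0, i)
    sp.write_byte(1, 255 - i)
words = sp.read_words()
for port, word in words:
    print(port, len(word) * 8)
```

Because each completed word is tagged with its port, the interleaved streams come out as a single sequence of wide words, which is the multiplexing role described above.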



# Coping with several clock regions

There are 19 clock regions on the chip

- One for each of the 16 MACs
- Rambus clock
- Processor clock
- System clock

Transitions between regions are made where the data rate of each physical wire is low, e.g., in the S/P converter.

## External interfaces

The chip has four types of external interfaces:

- Link interfaces: support GMII/MII and TBI.
- Processor interface: a generic interface that connects gluelessly to a Motorola PowerPC processor.
- Rambus interface: Direct RDRAM, up to 512 Mbytes supported.
- External CAM interface: a proprietary high-bandwidth bus.

# On chip Classifier

- Pipelined structure with a series of CAM table lookups, based on packet content up to and including certain layer 4 information.
- Provides a means for wire-speed filtering.
- Provides classification of packets to be used by a bandwidth allocation algorithm implemented in hardware.
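
A series of table lookups of this kind can be sketched in software as follows. Everything specific here is an assumption made for illustration: the field names, the table contents, the result fields, and the use of a Python dict as a stand-in for an exact-match CAM. The stages run one after another in this sketch, whereas in a hardware pipeline each stage would work on a different packet at the same time.

```python
def make_stage(field, table):
    """Build one lookup stage: exact match on a single packet field.

    The dict `table` stands in for a CAM; both arguments are hypothetical.
    """
    def stage(packet, result):
        hit = table.get(packet.get(field))
        if hit is not None:
            result.update(hit)  # a hit refines the classification so far
        return result
    return stage


# Hypothetical three-stage pipeline covering layer 2, 3 and 4 fields.
PIPELINE = [
    make_stage("dst_mac", {"00:11:22:33:44:55": {"out_port": 3}}),
    make_stage("dst_ip", {"10.0.0.7": {"flow_class": "gold"}}),
    make_stage("l4_dst_port", {23: {"drop": True}}),  # filtering rule
]


def classify(packet):
    """Pass a packet through every stage and return the final result."""
    result = {"out_port": None, "flow_class": "best_effort", "drop": False}
    for stage in PIPELINE:
        result = stage(packet, result)
    return result


print(classify({"dst_mac": "00:11:22:33:44:55",
                "dst_ip": "10.0.0.7",
                "l4_dst_port": 80}))
print(classify({"l4_dst_port": 23}))
```

The `drop` flag corresponds to filtering, and `flow_class` is the kind of label a bandwidth allocation step could consume.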



#### Wire-speed Dual scheduler

- Bandwidth and latency totally independent
- BW-scheduler in front of output queues
- Per-flow guaranteed bandwidth
- Per-flow maximum allowed bandwidth
- Per-flow weights
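
To show how the three per-flow parameters can interact, here is a minimal sketch of one possible allocation rule: every flow first receives its guaranteed bandwidth, and the spare capacity is then shared in proportion to the weights without exceeding any flow's maximum. This rule, the function name, and the example numbers are assumptions for illustration; the slides do not describe the algorithm used in the chip.

```python
def allocate(capacity, flows):
    """Share `capacity` among flows with a floor, a cap and a weight each.

    Assumes the guaranteed rates sum to no more than `capacity`.
    """
    alloc = {name: f["guaranteed"] for name, f in flows.items()}
    spare = capacity - sum(alloc.values())
    active = {n for n, f in flows.items() if alloc[n] < f["maximum"]}
    while spare > 1e-9 and active:
        total_weight = sum(flows[n]["weight"] for n in active)
        share = {n: spare * flows[n]["weight"] / total_weight for n in active}
        spare = 0.0
        for n in list(active):
            room = flows[n]["maximum"] - alloc[n]
            given = min(share[n], room)
            alloc[n] += given
            spare += share[n] - given  # capped flows return their excess
            if alloc[n] >= flows[n]["maximum"] - 1e-9:
                active.discard(n)
    return alloc


# Hypothetical flows on a 1000 Mbit/s output port (rates in Mbit/s).
flows = {
    "A": {"guaranteed": 100, "maximum": 200, "weight": 1},
    "B": {"guaranteed": 200, "maximum": 1000, "weight": 1},
    "C": {"guaranteed": 100, "maximum": 1000, "weight": 2},
}
alloc = allocate(1000, flows)
for name in sorted(alloc):
    print(name, round(alloc[name], 2))
```

In this example flow A reaches its maximum in the first round, and the bandwidth it cannot use is redistributed between B and C according to their weights.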

#### **INTEGRATED IN HARDWARE**

## Facts and figures

- 15 x 15 mm<sup>2</sup> die in 0.25 $\mu$m CMOS
- 836 pin EBGA package
- Power consumption, 13.5 W
- 2.5 V core supply voltage
- 3.3 V I/O, 5 V tolerant
- System clock frequency, 75 MHz
- Throughput, 16 Gbit/s
- 20 M transistors

#### **Chip Status**

This chip is a product, and we are sampling to Core Alliance Partners now.

