

# Telairity-1: A Real Time H.264 High Definition Video Architecture

Richard Dickson, August 15, 2005


## **Agenda**

- Applications
- Chip architecture
- I/O architecture
- **Processor architecture**
- Performance
- Technology
- Silicon status

## **Telairity-1 Target Application**

### Targeted to demanding video applications

- H.264 real-time, main profile, high-definition encoding
  - Video servers
  - Broadcast encoders & Transcoders
  - Video editing & authoring
  - Video conferencing
  - Security & Surveillance
- H.264 HD is the replacement for MPEG-2 HD
  - Potential to halve the bit rate at the same quality, from 20 Mbps to 10 Mbps
  - The bit-rate reduction demands significantly more compute power in the software modules:
    - More motion estimation options than MPEG-2
    - Context-adaptive binary arithmetic coding (CABAC) entropy encoder, for a 15% bit-rate reduction over MPEG-2



## **Telairity-1 Single Chip Architecture**

- Programmable, loosely coupled multiprocessor (MP) on a single chip
  - 5 independent vector/scalar processors
  - 1 video controller
  - 1 DRAM controller, supporting 5.3 GB/s of I/O bandwidth
- Telairity-1 offers the smallest footprint & lowest cost for broadcast quality H.264 video compression

| P0 | P1                  | P2 |
|----|---------------------|----|
| P3 | Video<br>Controller | P4 |
|    | DRAM<br>Controller  |    |



## **Telairity-1 I/O Architecture**



### Single Vector/Scalar Processor Features

- 4 vector pipes with independent hardware
- Independent Scalar Unit
- 128 KByte on-chip vector SRAM
- 4 KByte vector SRAM data cache
- 8 KByte scalar scratchpad memory
- 32 KByte instruction cache



## **Telairity-1 Instructions**

### Scalar Instructions

- Three-address scalar load and store instructions
- Memory addressing
  - Register, indexed, offset
  - Byte, doublet (2 bytes), quadlet (4 bytes)
- Arithmetic
  - Signed, unsigned, saturating
- Data types
  - 8/16/32-bit integer

### Vector Instructions

- Three-address vector load and store instructions
  - Vector length, vector starting address, chaining
- Memory addressing
  - Register, indexed, offset
  - Stride, skip
  - Byte, doublet
- Arithmetic
  - Signed, unsigned, saturating, carry in, mask
- Data types
  - 8/16-bit vector
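
The strided addressing and saturating arithmetic listed above can be illustrated with a small software model. This is only a sketch in plain Python, not Telairity-1 code: the function names `strided_load` and `saturating_add_i16` are hypothetical, the stride is assumed to be counted in elements, and the "skip" mode is not modeled because the slides do not define its semantics.

```python
# Illustrative software model only; names and semantics are assumptions,
# not the Telairity-1 instruction set.

I16_MIN, I16_MAX = -(1 << 15), (1 << 15) - 1


def strided_load(memory, start, stride, length):
    """Gather `length` elements from `memory`, beginning at index `start`
    and stepping `stride` elements each time (stride assumed in elements)."""
    return [memory[start + i * stride] for i in range(length)]


def saturating_add_i16(a, b):
    """Element-wise signed 16-bit add that clamps to the 16-bit range
    instead of wrapping around on overflow."""
    return [max(I16_MIN, min(I16_MAX, x + y)) for x, y in zip(a, b)]


# Stand-in for a block of samples stored row by row, 8 samples per row.
memory = list(range(64))

# A stride of 8 walks down one column of that 8-wide block.
column = strided_load(memory, start=3, stride=8, length=4)

# The first two sums exceed the signed 16-bit range and are clamped.
summed = saturating_add_i16([32000, -32000, 100, 5], [1000, -1000, 23, -5])
```

Saturation matters for pixel data because a wrapped result turns a slightly too bright sample into a dark one, while a clamped result stays at the end of the range.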



### **One of Five Identical Processors on Telairity-1**





## **Scalar Unit Details**




## **One Vector Pipe of a Four-Pipe Unit**



- 4 vector pipes per processor
- Data paths per pipe
  - 4 reads, 2 writes
  - 2 loads, 1 store
  - Issue in order
  - Out-of-order completion
- 16 vector registers per pipe
  - Each vector register has 32 elements of 16 bits each
- 11 functional units per pipe
- Adders
  - 8 accumulators, 24 bits each
- MAC
  - 8 accumulators, 40 bits each
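
The register and accumulator sizes above imply some simple capacity figures. The sketch below works them out from the slide's numbers; treating the MAC accumulator as a signed two's-complement value fed by signed 16x16-bit products is an assumption, since the slides give only the widths.

```python
# Back-of-the-envelope figures derived from the slide's numbers.
# The signed two's-complement reading of the accumulator is an assumption.

VECTOR_REGS_PER_PIPE = 16
ELEMENTS_PER_REG = 32
ELEMENT_BITS = 16
PIPES_PER_PROCESSOR = 4
MAC_ACC_BITS = 40

# Vector register storage per pipe and per processor, in bytes.
reg_bytes_per_pipe = VECTOR_REGS_PER_PIPE * ELEMENTS_PER_REG * ELEMENT_BITS // 8
reg_bytes_per_processor = reg_bytes_per_pipe * PIPES_PER_PROCESSOR

# Largest magnitude of a signed 16x16-bit product: (-32768) * (-32768) = 2**30.
max_product = (1 << (ELEMENT_BITS - 1)) ** 2

# Largest value a signed 40-bit accumulator can hold.
acc_max = (1 << (MAC_ACC_BITS - 1)) - 1

# Worst-case number of such products that can be summed without overflow.
safe_accumulations = acc_max // max_product

print(reg_bytes_per_pipe, reg_bytes_per_processor, safe_accumulations)
```

Under these assumptions each pipe holds 1 KByte of vector register state (4 KBytes per processor), and a 40-bit accumulator can absorb 511 worst-case products before it could overflow.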



## **Processor VSRAM**






## Performance

### Vector & scalar operations

- Four independent vector pipelines per processor
- One single-issue scalar pipeline per processor

### Peak operations per cycle per processor

- 21 operations per cycle (16-bit)
  - 8 vector ops (16-bit) + 8 vector loads + 4 vector stores + 1 scalar op (32-bit)

### Peak operations per cycle per chip

- 105 operations per cycle (16-bit)

### Sustained operations per cycle per processor

- 666 sustained operations issued and completed in 40 cycles
- 16.65 operations per cycle per processor (16-bit)
- Sustained operations per cycle per chip
  - 83 operations per cycle (16-bit)
- 668.25 MHz clock rate
  - 9x multiple of the 74.25 MHz SMPTE 20-bit video standard
- Total sustained chip performance of 55.5 GOPS
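
The performance figures above follow from a few multiplications. The sketch below reproduces them; every input is a number quoted on the slide, and the variable names are chosen here only for illustration.

```python
# Reproduces the slide's arithmetic; all inputs are figures quoted on the slide.

PROCESSORS_PER_CHIP = 5
CLOCK_MHZ = 9 * 74.25  # 9x the 74.25 MHz video clock

# Peak: 8 vector ops + 8 vector loads + 4 vector stores + 1 scalar op.
peak_per_processor = 8 + 8 + 4 + 1
peak_per_chip = peak_per_processor * PROCESSORS_PER_CHIP

# Sustained: 666 operations issued and completed in 40 cycles.
sustained_per_processor = 666 / 40
sustained_per_chip = sustained_per_processor * PROCESSORS_PER_CHIP

# Operations per second = operations per cycle * cycles per second.
sustained_gops = sustained_per_chip * CLOCK_MHZ * 1e6 / 1e9

print(f"clock rate:        {CLOCK_MHZ} MHz")
print(f"peak per chip:     {peak_per_chip} ops/cycle")
print(f"sustained per chip: {sustained_per_chip:.2f} ops/cycle")
print(f"sustained:         {sustained_gops:.2f} GOPS")
```

The unrounded product comes to roughly 55.6 GOPS. The slide's 55.5 GOPS is consistent with rounding the per-chip figure down to 83 operations per cycle before multiplying by the clock rate.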



## H.264 Real Time HD Encoder Application: Comparison with other solutions

- Other processors
  - 16 to 20 DSP chips + 6 FPGAs
  - 24 to 32 multimedia chips + 6 FPGAs
  - 10 to 12 x86 processors + 12 to 24 FPGAs
- Telairity-1
  - 4 to 8 Telairity-1 chips + 1 small FPGA
  - 668 MHz



## Application Program Profile for Real Time HD Encoding

| Telairity-1 H.264 programs      | 4 chips | 8 chips |
|---------------------------------|---------|---------|
| Motion Estimation               | 46%     | 55%     |
| DCT & IDCT                      | 3%      | 2%      |
| Loop Filter                     | 16%     | 8%      |
| Binarization & context modeling | 25%     | 10%     |
| Headroom                        | 10%     | 25%     |
| Total                           | 100%    | 100%    |

8 chips give better quality or a lower bit rate
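
One way to read the profile table is to convert each percentage into "chip-equivalents" of compute. The sketch below does that under a stated assumption: each percentage is taken as a share of the whole system's compute (4 or 8 chips), which the slide does not say explicitly. The dictionary and variable names are hypothetical.

```python
# Figures copied from the profile table. Treating each percentage as a share
# of the whole system's compute is an assumption made for this illustration.

CHIP_COUNTS = (4, 8)

# Program -> (percent in the 4-chip system, percent in the 8-chip system)
profile = {
    "Motion Estimation": (46, 55),
    "DCT & IDCT": (3, 2),
    "Loop Filter": (16, 8),
    "Binarization & context modeling": (25, 10),
    "Headroom": (10, 25),
}

# Each column should account for the full budget.
totals = tuple(sum(row[i] for row in profile.values()) for i in range(2))

# Share of the system expressed as a number of chips' worth of compute.
chip_equivalents = {
    name: tuple(pct / 100 * chips for pct, chips in zip(row, CHIP_COUNTS))
    for name, row in profile.items()
}

for name, (four, eight) in chip_equivalents.items():
    print(f"{name:32s} {four:5.2f} {eight:5.2f}")
```

Under that reading, motion estimation grows from about 1.8 to 4.4 chip-equivalents while the loop filter stays near 0.64 in both configurations, which fits the remark that the larger system spends its extra compute on quality or bit rate.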

## **Telairity-1 Chip Technology**

### Fujitsu 90nm CMOS technology

- 1.25 volt core, 1.8 volt I/O
- Die size
  - 9.5 mm x 14.4 mm
  - 1156 FCBGA package
- Power
  - 15 watts typical
- 88 million transistors
  - 41 million RAM transistors
  - 47 million logic transistors
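
The die and transistor figures above can be combined into a few derived numbers. This is a rough sketch from the slide's values only; the derived area, density, and RAM share are not stated in the slides, and the variable names are illustrative.

```python
# Derived figures from the slide's die size and transistor counts.
# Only the inputs come from the slide; the derived values are computed here.

DIE_WIDTH_MM = 9.5
DIE_HEIGHT_MM = 14.4
RAM_TRANSISTORS_M = 41    # millions
LOGIC_TRANSISTORS_M = 47  # millions

die_area_mm2 = DIE_WIDTH_MM * DIE_HEIGHT_MM
total_transistors_m = RAM_TRANSISTORS_M + LOGIC_TRANSISTORS_M

# Average density in millions of transistors per square millimetre.
density_m_per_mm2 = total_transistors_m / die_area_mm2

# Fraction of the transistor budget spent on RAM.
ram_share = RAM_TRANSISTORS_M / total_transistors_m

print(f"die area: {die_area_mm2:.1f} mm^2")
print(f"total:    {total_transistors_m} M transistors")
print(f"density:  {density_m_per_mm2:.2f} M transistors/mm^2")
print(f"RAM:      {ram_share:.0%} of transistors")
```

The RAM and logic counts add up to the quoted 88 million transistors, with a little under half of them in RAM.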



## **Chip Die Plot**





## **Telairity-1 Silicon Status**

- Chip taped out February '05
- First silicon May '05, fully functional
- Speed
  - 668.25 MHz
- Software development system
  - 4 chips
  - Software tools
- Encoder development system
  - 8 chips
  - Encoder application software
  - Software tools
- Availability
  - Engineering samples now
  - Production Q4 2005

www.telairity.com