



# The VelociTI<sup>TM</sup> Architecture of the TMS320C6xxx

### Hot Chips IX Symposium, August 24-25, 1997

#### Loc Truong, DSP Applications Manager

hotchips.ppt 08/24/97 © Copyright Texas Instruments Inc.





Agenda







### Motivation For Quantum Leap in Programmable DSP Performance

- Emerging demands for mainstream applications of massively parallel, uniform DSP processing in wireless or wireline communications
- Economic and engineering necessity to reduce system size and power consumption
- Rapid improvements in DSP algorithms, quickly evolving standards and shorter time-to-market
- ⇒ VelociTI<sup>TM</sup> Advanced VLIW architecture opens up <u>unlimited possibilities</u> for *high-performance multichannel, multi-function applications* by delivering <u>10x performance</u> over existing Digital Signal Processors















### VelociTI<sup>TM</sup> Advanced VLIW Architecture

#### Why VLIW?

- VLIW lends itself well to DSP algorithms and offers possibilities for very high performance
- VelociTI<sup>TM</sup> capitalizes on VLIW strengths while addressing its shortcomings with:
  - <u>high silicon densities</u> to improve speed paths through functional units
  - <u>architectural innovations</u>: flexible addressing modes, intelligent memory and peripheral support, flexible instruction packing and critical-path pipelining schemes
  - <u>compiler-friendly</u>: orthogonal, deterministic, 100% conditional RISC-like instruction set
  - <u>advanced</u> compiler and optimization <u>technologies</u>






### VelociTI<sup>TM</sup> Advanced VLIW Architecture

#### Core Block Diagram







### VelociTI<sup>TM</sup> Advanced VLIW Architecture

#### First Offering: TMS320C6201




- 1,600 MIPS @ 200 MHz
- 5 ns internal cycle time
- Up to eight 32-bit instructions per cycle
- 3.3 V I/O, 2.5 V internal
- 0.25 micron, 5-layer metal
- One megabit on-chip RAM
- SRAM, SB-SRAM, SDRAM interface
- Four-channel DMA
- Two multichannel T1/E1 serial ports
- 16-bit DMA host port
- 352-pin BGA
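The headline numbers in this list are mutually consistent, which a few lines of arithmetic can confirm. The sketch below is illustrative only: the constant and function names are assumptions, and it reads "MIPS" as the peak issue rate (eight instructions issued on every cycle), which is the reading under which the slide's figures line up.

```python
# Peak-rate arithmetic for the figures quoted on the slide.
# All names here are illustrative; only the numeric inputs come from the slide.
CLOCK_HZ = 200e6        # 200 MHz core clock
ISSUE_WIDTH = 8         # up to eight instructions per cycle
INSTRUCTION_BITS = 32   # each instruction is 32 bits wide


def cycle_time_ns(clock_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def peak_mips(clock_hz: float, issue_width: int) -> float:
    """Peak millions of instructions per second, assuming full issue every cycle."""
    return clock_hz * issue_width / 1e6


# Instruction bits consumed per cycle when all eight slots are filled.
bits_per_cycle = ISSUE_WIDTH * INSTRUCTION_BITS

print(cycle_time_ns(CLOCK_HZ))           # cycle time in ns
print(peak_mips(CLOCK_HZ, ISSUE_WIDTH))  # peak MIPS
print(bits_per_cycle)                    # instruction bits per cycle
```

Sustained throughput on real code depends on how many of the eight slots the scheduler can fill, so the peak figure is an upper bound rather than a typical rate.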



# VelociTI<sup>TM</sup> Pushes New Levels of DSP Performance










#### **VelociTI<sup>TM</sup> - New Levels of DSP Performance**

| Algorithm                           | 'C6x @ 200 MHz | Typical DSP @ 60 MHz | 'C6x vs. Typical Ratio |
|-------------------------------------|----------------|----------------------|------------------------|
| FFT (256-point)                     | 14.0 µs        | 199 µs               | 14:1                   |
| DCT (8x8)                           | 1.14 µs        | 15.3 µs              | 13.4:1                 |
| Viterbi – IS54 (89 terms)           | 29.5 µs        | 315 µs               | 10.7:1                 |
| LMS Filter (24 tap)                 | 0.21 µs        | 1.9 µs               | 9:1                    |
| IIR Filter (8 biquads)              | 0.15 µs        | 1.3 µs               | 8.9:1                  |
| FIR Filter (24-tap, 64 data points) | 3.9 µs         | 31 µs                | 8:1                    |

The new generation of tools uses:

- advanced scheduling strategies to achieve up to eight instructions in parallel every cycle
- *software pipelining techniques* to generate code that can execute multiple iterations of a loop in parallel
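To make the idea of software pipelining concrete, here is a minimal sketch of how overlapping loop iterations fill the machine. It is a simplified model, not the TI compiler's algorithm: the three stage names are hypothetical, every stage is assumed to take one cycle, a new iteration is assumed to start every cycle, and resource conflicts are ignored.

```python
def pipelined_schedule(n_iters, stages):
    """Return, for each cycle, the (iteration, stage) pairs active in that cycle.

    Simplifying assumptions: one cycle per stage, one new iteration started
    per cycle, and no contention for functional units.
    """
    n_cycles = n_iters + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        active = []
        for stage_index, stage_name in enumerate(stages):
            iteration = cycle - stage_index  # iteration occupying this stage now
            if 0 <= iteration < n_iters:
                active.append((iteration, stage_name))
        schedule.append(active)
    return schedule


# Hypothetical three-stage loop body, e.g. one tap of a multiply-accumulate loop.
STAGES = ["load", "multiply", "accumulate"]
sched = pipelined_schedule(4, STAGES)

for cycle, active in enumerate(sched):
    print(cycle, active)

# Without overlap, 4 iterations of 3 stages would need 4 * 3 cycles.
print("pipelined cycles:", len(sched), "sequential cycles:", 4 * len(STAGES))
```

In the steady state (the middle cycles), stages from three different iterations execute in the same cycle, which is the sense in which multiple iterations run in parallel.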




#### **VelociTI<sup>TM</sup> - New Levels of DSP Performance**

#### 'C6201 Multi-channel Application - Data Flow










#### **VelociTI<sup>TM</sup> - New Levels of DSP Performance**

#### 'C6201 Multi-channel Application - Performance

- VLIW signal-processing performance brings up to 40 channels of vocoders (ADPCM and Line Echo Cancellation) or 80 channels of **ADPCM on a single-chip programmable DSP!**

#### **32 kbit/s ADPCM Implementation Statistics:**

- <u>Program Memory</u>: 8328 bytes (G.721 vocoder and LEC)
- <u>Data Memory</u>: 39 KB total; (256-tap + 256-coeff) \* 32 channels \* 2 bytes/word = 32 KB for LEC
- <u>Cycles</u>: Real-time processing between samples provides 25K cycles/125 µs
  - Cycles for vocoder: 9.5K cycles per sample of 32 channels
  - Cycles for LEC: 10.5K cycles per sample of 32 channels
  - Total: 20Kcps, <80% of 200 MHz 'C6201
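The statistics above can be cross-checked with a short calculation. This is a sketch using only the rounded figures from the slide; the variable names are assumptions, and "Kcps" is read here as thousands of cycles per sample period.

```python
# Cycle and memory budget check for the 32-channel vocoder + LEC example.
# Inputs are the rounded figures from the slide; names are illustrative.
CLOCK_MHZ = 200          # 'C6201 clock rate
SAMPLE_PERIOD_US = 125   # time between samples (an 8 kHz sample rate)

# MHz * microseconds = cycles available between consecutive samples.
cycles_per_sample_period = CLOCK_MHZ * SAMPLE_PERIOD_US

VOCODER_CYCLES = 9_500   # per sample, all 32 channels
LEC_CYCLES = 10_500      # per sample, all 32 channels
total_cycles = VOCODER_CYCLES + LEC_CYCLES
utilization = total_cycles / cycles_per_sample_period

# LEC data memory: (256 taps + 256 coefficients) * 32 channels * 2 bytes/word.
lec_data_bytes = (256 + 256) * 32 * 2

print(cycles_per_sample_period)  # cycles available per sample period
print(total_cycles)              # cycles used by vocoder + LEC
print(utilization)               # fraction of the budget used
print(lec_data_bytes / 1024)     # LEC data memory in KB (1 KB = 1024 bytes)
```

With the rounded inputs the utilization comes out at about 80%; the slide's "<80%" presumably reflects unrounded cycle counts that are slightly lower.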






#### **Advanced Development Tools**



TI has shifted the DSP development paradigm from a hardware to software focus by supporting the programmable, high-performance 'C6x DSP with **ultra-efficient**, **new-generation optimization tools**.




#### **Advanced Development Tools**

#### Code Generation Flow



- Automated code generation handles scheduling complexities of traditional VLIW
- Tool suite supports optimization through an iterative programming process:
  - Use the C Compiler to optimize and software-pipeline the code
  - Use the Assembly Optimizer to automatically schedule and optimize serial assembly code
  - Debug code through an intuitive Windows-based source-code (C and Assembly) debugger
- Optionally hand-optimize only the most critical functions




#### **Advanced Development Tools**

#### 'C6201 Compiler Efficiency (ver 1.0)

Cycle Counts for Unmodified C Benchmark Results



Cumulative Cycles of 8 Typical DSP Benchmarks (Data Courtesy EDN)






- 'C6201 will be manufactured with the TI*meline*<sup>TM</sup> 0.18 micron process in 1H98
- Higher-speed version (250+ MHz) and derivatives to be expected in the near future
- Higher levels of integration and peripheral mix using ASIC flow are forthcoming



### Summary

#### 'C6x VelociTI<sup>TM</sup> Advanced VLIW enables:

- Delivering <u>10x performance</u> of any DSP on the market today.
- Shifting development paradigm from a hardware focus to a software focus.
- Establishing VelociTI Advanced VLIW as the <u>architecture of choice</u> for high-performance, low-cost DSP solutions.
- Reducing <u>development time</u> by half with new-generation tools designed for greatest ease of use and maximum optimization.
- Reducing system cost by half for multi-channel/multi-function applications.
- Opening the future to <u>endless possibilities</u> in real-time voice and data communications.


