

Hot Chips 21, Session Six: SoCs + Clocking

## SoC for Car Navigation Systems with a 53.3 GOPS Image Recognition Engine

Aug. 25, 2009

Hideaki Kido<sup>\*</sup>, Shoji Muramatsu<sup>\*</sup>, Yasuhiko Hoshi<sup>\*\*</sup>, Hiroyuki Hamasaki<sup>\*\*</sup>, Atsuhi Nakamura <sup>\*\*</sup>, Akihiro Yamamoto <sup>\*\*</sup>

\*Hitachi Ltd., \*\*Renesas Technology Corporation



© Hitachi, Ltd. 2009. All rights reserved.



## Contents

1. Car Navigation System Requirements and New SoC
2. New Image Recognition Engine (IMP-X)
3. Practical Application with IMP-X
4. Summary



#### **Advanced Car Navigation Systems**

- **Navigation**: driver-friendly user interface and graphics
- **Safety**: object recognition for passive/active safety
- **Amusement**: digital TV display, audio playback

Advanced car navigation systems will provide multiple applications that assist and entertain the driver.

## **Integrated 1-Chip Solution**

#### Required Technologies for Advanced Car Navigation Systems



#### Product Lineup for 1-Chip Embedded Car Navigation Systems



#### Next Generation 1-Chip Solution : SH-Navi3



#### Specification of SH-Navi3

|                          | SH-Navi2V (SH7774)        | SH-Navi3 (SH7776)                               |
|--------------------------|---------------------------|-------------------------------------------------|
| Process technology       | 90nm                      | 65nm                                            |
| CPU                      | SH-4A, 600MHz (1080MIPS)  | SH-4A x 2, 533MHz x 2 (1920MIPS)                |
| FPU                      | 4.2 GFLOPS                | 7.46 GFLOPS                                     |
| Cache                    | I: 32KB, D: 32KB          | I: 32KB, D: 32KB, L2: 128KB                     |
| Image recognition engine | 38.4 GOPS engine          | 53.3 GOPS engine, distortion correction module  |
| Graphics IP              | 2D accelerator            | 2D/3D accelerator (PowerVR SGX \*)              |
| Video input / display    | 2ch / 1ch                 | 3ch / 2ch                                       |
| Temperature range        | -40 to 85 °C              | -40 to 85 °C                                    |
| External memory          | DDR2 (DDR600)             | DDR3 (DDR1066), 2ch                             |
| Transistor count ratio   | 1.0                       | 2.16                                            |

\* Imagination Technologies Ltd.
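The CPU rows of the table scale consistently from one generation to the next. As a quick arithmetic check, assuming each SH-4A core in SH-Navi3 sustains the same per-cycle rates as the SH-Navi2V core (1080 MIPS / 600 MHz = 1.8 instructions per cycle, 4.2 GFLOPS / 600 MHz = 7 floating-point operations per cycle), two cores at 533 MHz give roughly the SH-Navi3 figures. The function name `sh4a_peak` is hypothetical.

```python
# Per-core SH-4A rates derived from the SH-Navi2V column of the table.
NAVI2V_MHZ = 600
NAVI2V_MIPS = 1080
NAVI2V_GFLOPS = 4.2

MIPS_PER_MHZ = NAVI2V_MIPS / NAVI2V_MHZ              # 1.8 instructions per cycle
FLOPS_PER_CYCLE = NAVI2V_GFLOPS * 1000 / NAVI2V_MHZ  # 7 floating-point ops per cycle


def sh4a_peak(cores: int, mhz: float) -> tuple[float, float]:
    """Return (MIPS, GFLOPS) for `cores` SH-4A cores clocked at `mhz`,
    assuming the per-cycle rates above carry over unchanged."""
    mips = cores * mhz * MIPS_PER_MHZ
    gflops = cores * mhz * FLOPS_PER_CYCLE / 1000
    return mips, gflops


mips, gflops = sh4a_peak(cores=2, mhz=533)
print(f"SH-Navi3 estimate: {mips:.0f} MIPS, {gflops:.2f} GFLOPS")
```

This gives about 1919 MIPS and 7.46 GFLOPS; the 1920 MIPS in the table is consistent with rounding (or with a clock of 533.33 MHz rather than exactly 533 MHz).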


#### Automotive Safety System based on Image Recognition

#### **Current demands**

- Object recognition for traffic accident avoidance: vehicles, pedestrians, traffic signs, lanes, etc.
- These applications must be processed in real time.

#### 1-Chip Solution's Problem with Image Recognition

Real-time applications with image recognition consume a lot of computational power.



#### Concepts of Image Recognition Accelerator (IMP-X)

- IMP-X accelerates frequently used, simple functions
- The CPU computes the remaining or complicated functions
- IMP-X and the CPU access the same image memory region at the same distance




#### **Process Flow of Image Recognition**



**For high-speed processing**

- Various image functions at up to 53.3 GOPS, through local parallel processing and pipeline processing
- Distortion correction and viewpoint changing, through a function-specific accelerator (IMR)

**For integration into SoC**

- Compact architecture

**For versatile processing**

- PIPE (Programmable Image Processing Extensions)

#### What functions does IMP-X accelerate?

- Image Affine Transformation
- Pixel Transformation
- Inter Pixel Arithmetic Calculations
- Inter Pixel Logical Calculations
- Binary Image Shape Transformation
- Convolution (0.66 ms @ VGA)
- Minimum/Maximum Filter
- Rank Filter
- Labeling
- Gray Scale Image Characteristics
- Binary Image Characteristic Extraction (1.18 ms @ VGA)
- Memory Access
- Binary Pipeline Filter
- Pipeline Control
- YUV Color Processing
- Binary Matching Filter
- Optical Flow
- Template Matching (SAD, <u>Normalized Correlation</u>, etc.)
- Matrix Operation
- FFT
- etc.

Normalized correlation runs at <u>53.3 GOPS</u>, the maximum performance of IMP-X.
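Convolution and the minimum/maximum filter are the easiest entries in this list to picture. As an illustration only (plain Python, nothing like IMP-X's parallel hardware datapath), a 3x3 convolution and a 3x3 minimum filter are sketched below; the function names and the border handling (clamping for convolution, a cropped window for the minimum filter) are assumptions. For scale, a VGA frame is 640 × 480 = 307,200 pixels, so 0.66 ms per frame corresponds to roughly 465 million pixels per second.

```python
Image = list[list[int]]


def convolve3x3(img: Image, kernel: list[list[float]]) -> list[list[float]]:
    """True 2-D convolution with a 3x3 kernel (kernel is flipped); edge pixels are clamped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[1 - dy][1 - dx] * img[yy][xx]
            out[y][x] = acc
    return out


def min_filter3x3(img: Image) -> Image:
    """3x3 minimum filter; the window is cropped at the image border."""
    h, w = len(img), len(img[0])
    return [
        [
            min(img[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, h))
                for xx in range(max(x - 1, 0), min(x + 2, w)))
            for x in range(w)
        ]
        for y in range(h)
    ]
```

A maximum filter is the same loop with `max`; a rank filter would sort the window and pick a chosen position instead.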

#### For High-speed Processing : Template Matching

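Calculates the similarity between a template pattern $f$ and a part of the image $g$.

**Normalized Correlation**

$$r = \frac{\sum_{x}\sum_{y} (f(x, y) - \overline{f})(g(x, y) - \overline{g})}{\sqrt{\sum_{x}\sum_{y} (f(x, y) - \overline{f})^2} \times \sqrt{\sum_{x}\sum_{y} (g(x, y) - \overline{g})^2}}$$

where $\overline{f}$ and $\overline{g}$ are the mean values of the template and of the image part.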
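The normalized correlation formula translates directly into code. The sketch below evaluates it for one template position and then slides the template over an image to find the best match; it is a plain reference implementation under our own naming (`normalized_correlation`, `match_template` are hypothetical), not a description of how IMP-X schedules the computation. A flat patch (zero variance) makes the formula undefined; the sketch reports 0 in that case by assumption.

```python
import math


def normalized_correlation(f: list[list[float]], g: list[list[float]]) -> float:
    """Normalized correlation between template f and an equally sized image part g."""
    n = sum(len(row) for row in f)
    f_mean = sum(map(sum, f)) / n
    g_mean = sum(map(sum, g)) / n
    num = f_sq = g_sq = 0.0
    for f_row, g_row in zip(f, g):
        for fv, gv in zip(f_row, g_row):
            df, dg = fv - f_mean, gv - g_mean
            num += df * dg
            f_sq += df * df
            g_sq += dg * dg
    denom = math.sqrt(f_sq) * math.sqrt(g_sq)
    return num / denom if denom else 0.0  # flat patch: undefined, reported as 0


def match_template(image: list[list[float]], template: list[list[float]]):
    """Return (best_score, (y, x)) of the template position with the highest correlation."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # correlation is always >= -1, so any position beats this
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = normalized_correlation(template, patch)
            if score > best[0]:
                best = (score, (y, x))
    return best
```

Because both the template and the patch are mean-subtracted and normalized, a patch that is a brightened or contrast-scaled copy of the template still scores 1.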

Total processing: (1 + 2 + 2 + 3) operations × 25 GSEUs × 266 MHz ≈ 53.3 GOPS
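The peak-throughput arithmetic can be reproduced directly. With exactly 266 MHz the product is 53.2 GOPS; the quoted 53.3 GOPS matches if the nominal 266 MHz is really about 266.67 MHz (for example, half of a 533.33 MHz clock). That exact clock value is our assumption; the slide does not state it.

```python
OPS_PER_CYCLE = 1 + 2 + 2 + 3  # operations per unit per cycle, as counted on the slide
UNITS = 25                     # number of GSEUs, as given on the slide


def peak_gops(clock_mhz: float) -> float:
    """Peak throughput in GOPS = ops/cycle x units x clock (MHz) / 1000."""
    return OPS_PER_CYCLE * UNITS * clock_mhz / 1000


for clock in (266.0, 800 / 3):  # nominal 266 MHz vs. an assumed exact 266.67 MHz
    print(f"{clock:.2f} MHz -> {peak_gops(clock):.2f} GOPS")
```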



#### For High-speed Processing : Distortion Correction

- Generates image patterns for distortion correction, rotation, and zoom in/out, controlled by a flexible triangle mesh defined by a vertex data set
- Supports YUV image format (combined/independent) and grayscale format (8 bpp)
- High-speed drawing engine: 90 fps at VGA resolution
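Triangle-mesh control can be pictured as a piecewise-affine warp: every mesh triangle in the output image has a matching triangle in the input image, and each output pixel is mapped back through barycentric coordinates. The sketch below assumes that model plus bilinear sampling; IMR's actual interpolation, fixed-point formats, and mesh data layout are not described on the slide, and all function names are hypothetical. For scale, 90 fps at VGA is 640 × 480 × 90 ≈ 27.6 million output pixels per second.

```python
Point = tuple[float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return l1, l2, 1.0 - l1 - l2


def map_to_source(p: Point, dst_tri, src_tri) -> Point:
    """Map an output pixel position to input coordinates (affine within one triangle)."""
    l1, l2, l3 = barycentric(p, *dst_tri)
    (ax, ay), (bx, by), (cx, cy) = src_tri
    return l1 * ax + l2 * bx + l3 * cx, l1 * ay + l2 * by + l3 * cy


def sample_bilinear(img: list[list[float]], x: float, y: float) -> float:
    """Bilinear interpolation with coordinates clamped to the image."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_triangle(src, dst, dst_tri, src_tri) -> None:
    """Fill the pixels of dst inside dst_tri by sampling src inside src_tri."""
    xs = [p[0] for p in dst_tri]
    ys = [p[1] for p in dst_tri]
    for y in range(max(int(min(ys)), 0), min(int(max(ys)), len(dst) - 1) + 1):
        for x in range(max(int(min(xs)), 0), min(int(max(xs)), len(dst[0]) - 1) + 1):
            if min(barycentric((x, y), *dst_tri)) >= -1e-9:  # inside (tolerant on edges)
                sx, sy = map_to_source((x, y), dst_tri, src_tri)
                dst[y][x] = sample_bilinear(src, sx, sy)
```

Distortion correction, rotation, and zoom then differ only in where the source-triangle vertices are placed; a denser mesh approximates a curved (for example, lens) distortion more closely.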



#### For Integration into SoC and Versatile Processing

- Programmable processing of each line under microcode control
- Reduced data traffic between IMP-X and memory by storing and reusing processed data inside IMP-X

Comparison of the behavior of three image functions with and without PIPE
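The traffic saving from keeping processed data inside the engine can be illustrated with a toy row-level model. In the non-PIPE case each of three functions makes a full pass over the frame through external memory; in the PIPE-style case each row is read once, passed through all three functions on chip, and written once. This is a hypothetical model under our own names: the three per-pixel functions are placeholders, and real neighborhood functions (such as convolution) would additionally need a few lines buffered on chip, which the sketch does not model.

```python
from collections.abc import Callable
from dataclasses import dataclass

Row = list[int]
RowFn = Callable[[Row], Row]


@dataclass
class TrafficCounter:
    """Counts image rows moved between external memory and the accelerator."""
    rows_read: int = 0
    rows_written: int = 0


def run_non_pipe(image: list[Row], funcs: list[RowFn], mem: TrafficCounter) -> list[Row]:
    # Each function makes a full pass: read every row, write every result row back.
    for f in funcs:
        out = []
        for row in image:
            mem.rows_read += 1
            out.append(f(row))
            mem.rows_written += 1
        image = out
    return image


def run_pipe(image: list[Row], funcs: list[RowFn], mem: TrafficCounter) -> list[Row]:
    # Each row is read once, kept on chip through all functions, and written once.
    out = []
    for row in image:
        mem.rows_read += 1
        for f in funcs:
            row = f(row)
        out.append(row)
        mem.rows_written += 1
    return out


# Three placeholder per-pixel functions standing in for the "three image functions".
def invert(r: Row) -> Row:
    return [255 - p for p in r]


def threshold(r: Row) -> Row:
    return [255 if p >= 128 else 0 for p in r]


def halve(r: Row) -> Row:
    return [p // 2 for p in r]


funcs = [invert, threshold, halve]
image = [[(x * 7 + y * 13) % 256 for x in range(640)] for y in range(480)]  # VGA test frame
a, b = TrafficCounter(), TrafficCounter()
same = run_non_pipe(image, funcs, a) == run_pipe(image, funcs, b)
print("identical results:", same)
print("non-PIPE rows moved:", a.rows_read + a.rows_written)
print("PIPE rows moved:    ", b.rows_read + b.rows_written)
```

In this model the results are identical, while external traffic drops from 2 × 3 frame passes to 2, a threefold reduction for three chained functions.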






#### Evaluation of Bus Traffic Reduction

The Harris corner detector requires **14** functions.

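$$H = \begin{bmatrix} \sum_{x} I_{x}^{2} & \sum_{x} I_{x} I_{y} \\ \sum_{x} I_{x} I_{y} & \sum_{x} I_{y}^{2} \end{bmatrix} = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$

where $I_{x} = \frac{\partial I}{\partial x}$ and $I_{y} = \frac{\partial I}{\partial y}$, and the sums run over the pixels of a local window. With an empirical constant $k$:

$$\det(H) = ac - b^{2}, \qquad \operatorname{Trace}(H) = a + c, \qquad R = \det(H) - k \operatorname{Trace}(H)^{2}$$

A pixel is detected as a **corner** where $R$ is large.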
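The Harris response $R$ defined above can be computed in a few passes, which is why it decomposes into many simple image functions. The sketch below is a plain reference version assuming central-difference gradients, a 3x3 summation window, and $k = 0.04$ (a common choice); none of these parameters come from the slide, and it does not reproduce how the 14 IMP-X functions are actually partitioned.

```python
def harris_response(img: list[list[float]], k: float = 0.04) -> list[list[float]]:
    """Harris response R = det(H) - k * Trace(H)^2 over a 3x3 window.
    Gradients use central differences; one-pixel borders are left at zero."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2  # dI/dx
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2  # dI/dy

    r = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0  # entries of H summed over the window
            for yy in range(y - 1, y + 2):
                for xx in range(x - 1, x + 2):
                    a += ix[yy][xx] ** 2
                    b += ix[yy][xx] * iy[yy][xx]
                    c += iy[yy][xx] ** 2
            r[y][x] = (a * c - b * b) - k * (a + c) ** 2
    return r
```

On an image containing one bright square, $R$ is positive at the square's corner, negative along its straight edges, and zero in flat regions, matching the usual interpretation of the response.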





#### Practical Application : Pedestrian Detection with Neural Network

- Pedestrian detection is one of the important applications for safety systems
- A three-layer neural network is used to recognize pedestrian patterns



#### With CPU ...

- Neural network processing time (1600 evaluations): 204 ms
- Total pedestrian detection application processing time: 233 ms

#### Practical Application : Acceleration of Neural Network

The neural network consists of two matrix product stages (1, 3) and mathematical transformation stages (2, 4).
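The four stages map onto a forward pass as sketched below: a matrix product into the hidden layer, an element-wise transformation, a matrix product into the output layer, and a final transformation. Stacking many candidate windows as rows of one input matrix turns stages 1 and 3 into large matrix products, the kind of operation listed among IMP-X's accelerated functions (matrix operation). The sigmoid activation, the absence of bias terms, and the layer sizes in the usage note are assumptions; the slide does not specify them.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid_all(m: Matrix) -> Matrix:
    """Element-wise logistic sigmoid (assumed activation)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]


def forward(windows: Matrix, w_hidden: Matrix, w_out: Matrix) -> Matrix:
    """Three-layer network applied to a batch of flattened windows (one per row)."""
    hidden = matmul(windows, w_hidden)  # stage 1: matrix product
    hidden = sigmoid_all(hidden)        # stage 2: transformation
    out = matmul(hidden, w_out)         # stage 3: matrix product
    return sigmoid_all(out)             # stage 4: transformation
```

For example, 1600 windows of 4 inputs each, a hypothetical 4 × 2 hidden weight matrix, and a 2 × 1 output weight matrix yield a 1600 × 1 matrix of pedestrian scores.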



#### With IMP-X ...

- Neural network processing time (1600 evaluations): 204 ms → 8.9 ms
- Total pedestrian detection application processing time: 233 ms → 29.4 ms
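The reported timings translate into the following speedups and per-evaluation costs. This only restates the numbers from the two slides; the slides do not break down where the reduction in the non-neural-network part comes from.

```python
CPU_NN_MS, IMPX_NN_MS = 204.0, 8.9         # neural network time, 1600 evaluations
CPU_TOTAL_MS, IMPX_TOTAL_MS = 233.0, 29.4  # whole pedestrian detection application
EVALUATIONS = 1600

nn_speedup = CPU_NN_MS / IMPX_NN_MS
total_speedup = CPU_TOTAL_MS / IMPX_TOTAL_MS
per_eval_cpu_us = CPU_NN_MS / EVALUATIONS * 1000
per_eval_impx_us = IMPX_NN_MS / EVALUATIONS * 1000
other_cpu_ms = CPU_TOTAL_MS - CPU_NN_MS
other_impx_ms = IMPX_TOTAL_MS - IMPX_NN_MS

print(f"NN speedup:      {nn_speedup:.1f}x")
print(f"Total speedup:   {total_speedup:.1f}x")
print(f"Per evaluation:  {per_eval_cpu_us:.1f} us -> {per_eval_impx_us:.1f} us")
print(f"Remaining work:  {other_cpu_ms:.1f} ms -> {other_impx_ms:.1f} ms")
```

The neural network portion speeds up by roughly 23x, while the whole application speeds up by roughly 8x because the remaining work is not accelerated to the same degree.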

## Summary

SH-Navi3 embeds:

- High-performance dual RISC processors (1920 MIPS)
- 2D/3D graphics accelerators
- An image recognition engine (IMP-X)
  - High-speed processing (up to 53.3 GOPS): parallel processing + pipeline architecture + function-specific accelerator
  - Bus traffic reduction and line programmability: PIPE architecture

SH-Navi3 achieves a 1-chip solution for next-generation car navigation systems.





