# An Intelligent ADAS Processor with Real-Time Semi-Global Matching and Intention Prediction for 720p Stereo Vision

Kyuho J. Lee, Kyeongryeol Bong, Changhyeon Kim, and Hoi-Jun Yoo, Korea Advanced Institute of Science and Technology (KAIST)

# Motivation and Requirements of Intelligent ADAS<sup>1)</sup>

### < High-Performance ADAS Function >



- Several algorithms are executed simultaneously
- Must meet the real-time constraint (> 30 fps)
- Global/dense stereo vision (SGM) is essential for a high-accuracy depth map
- High-resolution camera for high-accuracy detection (> 720p)
- Thermal design power constraint due to the absence of cooling fans (< 4 W)
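To get a feel for the real-time constraint, the following back-of-the-envelope sketch computes the pixel rate of a 720p stream at 30 fps and the number of clock cycles available per pixel. It assumes a frame of exactly 1280 x 720 pixels, exactly 30 fps, and the 250 MHz nominal clock listed under Chip Implementation and Spec.; the variable names are illustrative only.

```python
# Pixel budget for one 720p camera stream at the minimum real-time rate.
# Assumptions: 1280x720 frame, 30 fps, 250 MHz nominal clock.
WIDTH, HEIGHT = 1280, 720
FPS = 30
CLOCK_HZ = 250_000_000

pixels_per_frame = WIDTH * HEIGHT              # pixels in one frame
pixels_per_second = pixels_per_frame * FPS     # sustained pixel rate
cycles_per_pixel = CLOCK_HZ / pixels_per_second

print(pixels_per_frame)             # 921600
print(pixels_per_second)            # 27648000
print(round(cycles_per_pixel, 2))   # 9.04
```

With only about nine cycles per pixel for a single stream, every per-pixel algorithm has to be heavily parallelized, which is consistent with the emphasis on high performance above.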

### High-Performance & Energy-Efficient

1) ADAS: Advanced Driver Assistance System 2) ACC: Adaptive Cruise Control 3) AEB: Autonomous Emergency Braking 4) ICE: Intelligent Collision Evasion

- Detected objects should be *intelligently selected* & *provided* to the driver
- For advanced functions such as ACC<sup>2)</sup>, AEB<sup>3)</sup>, or ICE<sup>4)</sup>, *Behavior Analysis* is essential

### Objects' Intention-Prediction





### < Intention-Prediction for Selective Information >

#### **Typical Road Scene**





- Numerous objects are on the road, but providing excessive information to the driver disturbs driving
- Most objects are not risky; only some are risky to the driver





**Intention Prediction Processor** (block diagram; labels recovered from the figure)

- **< RNN-FIS for Intention Prediction >**: the input is taken from the <Object Detection Result> (e.g., *x* = 1.2 m, *z* = 6.7 m) as the state $\mathbf{S}_{t} = \{\mathbf{x}_{t}, \, \mathbf{z}_{t}, \, \Delta \mathbf{x}_{t}, \, \Delta \mathbf{z}_{t}\}$
- Algorithm blocks: Recurrent Neural Network (RNN) with Recurrent Output Memory and Recurrent Feedback Input; Fuzzy Inference System (FIS) with Fuzzy Rule Base
- Hardware blocks: Matrix Processing Unit, Fuzzy Accelerator, RISC, I\$ / D\$ Ctrlr., MF1 to MF4, Clock Gating Ctrlr., Fuzzy Decoder, System Bus, Configuration Memory & Shared DAC/ADC Bank
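As a rough illustration of the RNN-FIS input, the sketch below builds the state $\mathbf{S}_{t}$ from two consecutive object positions and evaluates one fuzzy membership degree on it. The poster does not specify the membership function shapes or the rule base, so the triangular function, the "approaching" fuzzy set, and all numeric bounds are hypothetical; the recurrent network itself is not modeled here.

```python
def state_vector(prev, curr):
    """Build S_t = (x_t, z_t, dx_t, dz_t) from two consecutive (x, z) positions in metres."""
    x_prev, z_prev = prev
    x, z = curr
    return (x, z, x - x_prev, z - z_prev)


def triangular_mf(value, left, peak, right):
    """Membership degree in [0, 1] for a triangular fuzzy set (hypothetical shape)."""
    if value <= left or value >= right:
        return 0.0
    if value <= peak:
        return (value - left) / (peak - left)
    return (right - value) / (right - peak)


# Object moved from (x=1.0 m, z=7.0 m) to (x=1.2 m, z=6.7 m) between frames.
s_t = state_vector(prev=(1.0, 7.0), curr=(1.2, 6.7))

# Hypothetical fuzzy set "approaching": dz in (-1.0, 0.0) m/frame, peak at -0.5.
approaching = triangular_mf(s_t[3], left=-1.0, peak=-0.5, right=0.0)
print(s_t, approaching)
```

In a complete fuzzy inference system, several such membership degrees would be combined by rules and defuzzified into a risk or intention score; that part is omitted because the poster gives no details about it.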

Hot Chips: A Symposium on High Performance Chips, August 21-23, 2016



E-mail: kyuho.jsn.lee@kaist.ac.kr

- Stereo Matching (SGM) for depth extraction
- Optical Flow for feature tracking
- *Region-of-Interest* generation to reduce computation
- 3D-world Mapping for ego-motion compensation & unit conversion
- Intention Prediction for behavior analysis
- Many-core SIMD/MIMD architecture
- Different parallelism:
  1. High pixel-parallel processing
  2. Moderate pixel-/task-parallel processing
  3. Complex task-parallel processing
- DRMP for 3-domain DVFS control
- ITS for workload prediction & NoC bandwidth regulation
- *CAFeR* NoC<sup>[1]</sup> for network congestion reduction
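The stereo matching step listed above relies on semi-global matching (SGM). The sketch below shows the textbook SGM cost-aggregation recurrence along a single left-to-right path on one scanline, followed by a winner-takes-all disparity pick. It is a software illustration only and makes no claim about how the SGMP implements it in hardware; the function names, the toy cost values, and the penalties `p1` and `p2` are assumptions. Full SGM sums this aggregation over several path directions.

```python
def aggregate_left_to_right(cost_row, p1, p2):
    """Aggregate matching costs along one scanline, left to right.

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of 1, p2 penalizes larger jumps.
    """
    num_disp = len(cost_row[0])
    aggregated = [list(cost_row[0])]   # first pixel has no predecessor
    for x in range(1, len(cost_row)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < num_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps aggregated values bounded.
            row.append(cost_row[x][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


def winner_takes_all(costs):
    """Pick the disparity with the lowest cost for every pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in costs]


# Toy scanline: 3 pixels, 3 disparity candidates each (made-up costs).
costs = [[0, 5, 5], [4, 3, 5], [0, 5, 5]]
aggregated = aggregate_left_to_right(costs, p1=2, p2=4)

print(winner_takes_all(costs))       # [0, 1, 0]  raw costs: noisy middle pixel
print(winner_takes_all(aggregated))  # [0, 0, 0]  smoothed by the path penalties
```

The middle pixel's raw minimum sits at disparity 1, but after aggregation the smoothness penalties pull it back to disparity 0, which is the behavior that makes SGM output dense and consistent.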

[1] K. Lee, ESSCIRC 2015, "Intelligent Task Scheduler with High Throughput NoC for Real-Time Mobile Object Recognition SoC"

### < Chip Implementation and Spec. >

Chip micrograph block labels: DRM, ITS, RGP, Intention Prediction, Object Detection Processor (ODP), Ego-motion Compensation Processor (ECP), Optical Flow Processor (OFP), Semi-Global Matching Processor (SGMP)

| Item             | Condition | Value                     |
|------------------|-----------|---------------------------|
| Process          |           | 65nm 1P8M Logic CMOS      |
| Chip Size        |           | 4.0mm x 4.0mm             |
| Supply Voltage   | Nominal   | 1.2V                      |
|                  | DVFS      | 0.65 - 1.2V               |
| Clock Frequency  | Nominal   | 250MHz                    |
|                  | DVFS      | 50 - 250MHz               |
| Power            | Average   | 330mW                     |
|                  | Peak      | 582mW                     |
| Peak Performance |           | 502 GOPS                  |
| Power Efficiency |           | 862 GOPS/W                |
| Area Efficiency  |           | 31.4 GOPS/mm<sup>2</sup>  |
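The efficiency figures in the spec can be cross-checked with simple arithmetic. The sketch below assumes that power efficiency is peak performance divided by peak power and that area efficiency is peak performance divided by the full 4.0 mm x 4.0 mm die area; the poster does not state these definitions explicitly, so treat them as an interpretation.

```python
# Cross-check of the quoted efficiency figures (assumed definitions, see text).
peak_performance_gops = 502.0
peak_power_w = 0.582            # 582 mW peak power
die_area_mm2 = 4.0 * 4.0        # 4.0 mm x 4.0 mm chip size

power_efficiency = peak_performance_gops / peak_power_w   # GOPS/W
area_efficiency = peak_performance_gops / die_area_mm2    # GOPS/mm^2

print(round(power_efficiency, 1))   # 862.5
print(round(area_efficiency, 1))    # 31.4
```

Under these assumptions the results agree with the quoted 862 GOPS/W and 31.4 GOPS/mm<sup>2</sup> to within rounding.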

### < Performance Comparison >

|                                       | [2] ISSCC'12     | [3] ISSCC'13     | [4] ISSCC'15     | This Work (Driving-mode)                | This Work (Parking-mode)                           |
|---------------------------------------|------------------|------------------|------------------|-----------------------------------------|----------------------------------------------------|
| Function                              | Object Detection | Object Detection | Object Detection | Object Detection + Intention Prediction | Intention Prediction + Surveillance Record Trigger |
| Process                               | 40nm             | 130nm            | 40nm             | 65nm                                    | 65nm                                               |
| Area (mm<sup>2</sup>)                 | 45               | 25               | 106              | 16                                      | 16                                                 |
| Core Voltage (V)                      | 1.1              | 1.2              | 1.1              | 1.2                                     | 0.8                                                |
| Operating Frequency (MHz)             | 266              | 200              | 266              | 250                                     | 20                                                 |
| Power (mW)                            | 749              | 260              | 3368             | 330                                     | 0.984                                              |
| Performance (GOPS)                    | 464              | 271              | 1900             | 502                                     | 1.80                                               |
| Energy Efficiency (GOPS/W)            | 620              | 646              | 564              | 862                                     | 918                                                |
| Area Efficiency (GOPS/mm<sup>2</sup>) | 10.3             | 10.8             | 17.9             | 31.4                                    | 0.1125                                             |
| Stereo Matching                       | Local + Sparse   | X                | N/A              | Global + Dense                          | Not Used                                           |
| Intelligence                          | X                | X                | X                | O                                       | O                                                  |
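To summarize the comparison numerically, the sketch below computes how the driving-mode efficiency figures relate to the best value among the three prior works. It uses only the table entries as printed; the dictionary layout and variable names are illustrative, and no normalization for the different process nodes is attempted.

```python
# (energy efficiency in GOPS/W, area efficiency in GOPS/mm^2), from the table.
prior_works = {
    "[2] ISSCC'12": (620, 10.3),
    "[3] ISSCC'13": (646, 10.8),
    "[4] ISSCC'15": (564, 17.9),
}
this_work_driving = (862, 31.4)

best_prior_energy = max(energy for energy, _ in prior_works.values())
best_prior_area = max(area for _, area in prior_works.values())

energy_gain = this_work_driving[0] / best_prior_energy
area_gain = this_work_driving[1] / best_prior_area

print(round(energy_gain, 2))   # 1.33
print(round(area_gain, 2))     # 1.75
```

By these raw figures, the driving mode is about 1.33x more energy-efficient and about 1.75x more area-efficient than the best prior entry in each row.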

[2] Y. Tanabe, ISSCC 2012, "A 464GOPS 620GOPS/W heterogeneous multi-core SoC for image-recognition applications"

[4] J. Tanabe, ISSCC 2015, "A 1.9TOPS and 564GOPS/W heterogeneous multicore SoC with color-based object classification accelerator for image-recognition applications"

![](_page_0_Picture_55.jpeg)

![](_page_0_Picture_56.jpeg)

![](_page_0_Picture_57.jpeg)

![](_page_0_Picture_58.jpeg)