

### Tutorial Day Monday, August 27, 2012

| Time | Session | Speaker |
|------|---------|---------|
| 9:30 am | **Tutorial 1: (The Evolution of) Mobile SoC Programming** | |
| 9:00 am – 10:40 | Mobile SOCs: Connecting Hardware to Apps | Neil Trevett, Khronos |
| 10:40 – 11:10 | Break | |
| 11:10 – 11:25 | Camera and Video | Sean Mao, ArcSoft |
| 11:25 – 11:40 | Vision and Gesture Processing | Itay Katz, Eyesight |
| 11:40 – 11:55 | Augmented Reality | Ben Blachnitzky, Metaio |
| 11:55 – 12:10 | Sensor Fusion | Jim Steele, Sensor Platforms |
| 12:10 – 12:25 | 3D Gaming | Daniel Wexler, the11ers |
| 12:25 – 1:00 pm | Panel | |
| 1:00 – 2:00 pm | Lunch | |
| 2:00 pm | **Tutorial 2: Die Stacking** | |
| 2:00 – 2:15 | Introduction | Liam Madden, Xilinx |
| 2:15 – 2:40 | Foundry TSV Enablement for 2.5D/3D Chip Stacking | Remi Yu, UMC |
| 2:40 – 3:05 | Full Processing Interposer Process | Choon Lee, Amkor |
| 3:05 – 3:30 | Roadmap for Design and EDA Infrastructure for 3D Products | Riko Radojcic, Qualcomm |
| 3:30 – 3:55 | Xilinx SSI Technology: Concept to Silicon Development Overview | Shankar Lakka, Xilinx |
| 3:55 – 4:10 | Break | |
| 4:10 – 4:35 | Memory Consideration and Heterogeneous Die | Bryan Black, AMD |
| 4:35 – 5:00 | Optical Backplanes with 3D Integrated Photonics? | Ephrem Wu, Xilinx |
| 5:00 – 5:30 | Panel | |
| 5:30 – 7:00 pm | Reception | |





# SOC Programming Tutorial

Hot Chips 2012, Neil Trevett, Khronos President







# Welcome!

### An exploration of SOC capabilities from the programmer's perspective

- How is mobile silicon connecting to mobile software?

### Mobile ecosystems and the programming APIs they provide

- Mobile OS and open standards for SOC acceleration

### SOC innovation hotspots

- Vision and gesture processing, Augmented Reality, Sensor Fusion, Photography and Video Processing, 3D Graphics

### Illustrate the state of the art in mobile programming

- AND highlight the issues and challenges still to be solved



# **Speakers**

| Session                                | Speaker         | Company          | Title                                             |
|----------------------------------------|-----------------|------------------|---------------------------------------------------|
| Connecting Mobile SOC Hardware to Apps | Neil Trevett    | Khronos          | Khronos President and NVIDIA VP of Mobile Content |
| Break                                  |                 |                  |                                                   |
| Camera and Video                       | Sean Mao        | ArcSoft          | VP Marketing, Advanced Imaging Technologies       |
| Vision and Gesture Processing          | Itay Katz       | Eyesight         | Co-Founder & CTO                                  |
| Augmented Reality                      | Ben Blachnitzky | Metaio           | Director of R&D                                   |
| Sensor Fusion                          | Jim Steele      | Sensor Platforms | VP Engineering                                    |
| 3D Gaming                              | Daniel Wexler   | the11ers         | CXO                                               |
| Panel Session                          |                 | All Speakers     |                                                   |



# **Khronos Connects Software to Silicon**

### Khronos APIs define processor acceleration capabilities

- Graphics, video, audio, compute, vision and sensor processing



# **APIs BY the Industry FOR the Industry**

### Khronos APIs define core device acceleration functionality

- Low-level "Foundation" functionality needed on every platform
- Rigorous conformance tests for cross-vendor consistency

### Khronos standards have strong industry momentum

- 100s of man years invested by industry leading experts
- Shipping on billions of devices and multiple operating systems

### Khronos is OPEN for any company to join and participate

- Standards are truly open: one company, one vote
- Solid legal and Intellectual Property framework for industry cooperation
- Khronos membership fees cover expenses

### Khronos standards are FREE to use

- Members agree to not request royalties








# **API Standards Evolution**








Diverse platforms – mobile, TV, embedded – mean HTML5 will become increasingly important as a universal app platform




# **A New Era in Computing**





# **20 Years Faster to 100M Per Year**



Source: Gartner, Apple, NVIDIA




# **The Largest Market Ever**

### IDC: 1.8 billion mobile phones will ship in 2012

- By the end of 2016, 2.3 billion mobile phones will ship per year






# **Global Smartphone Market Share**





# **ARM is Licensable and Pervasive**




# **Mobile Performance Increases**





# **Power is the New Design Limit**

### The Process Fairy keeps bringing more transistors

- Transistors are getting cheaper
- But the Process Fairy isn't helping as much on power as in the past
  - The End of Voltage Scaling

**In the Good Old Days:** leakage was not important, and voltage scaled with feature size

$$L' = L/2, \quad V' = V/2, \quad E' = C'V'^2 = E/8, \quad f' = 2f, \quad D' = 4D \ (D \propto 1/L^2), \quad P' = P$$

Halve L and get 4x the transistors and 8x the capability for the same power.

**The New Reality:** leakage has limited threshold voltage, largely ending voltage scaling

$$L' = L/2, \quad V' \approx V, \quad E' = C'V'^2 = E/2, \quad f' \approx 2f, \quad D' = 4D \ (D \propto 1/L^2), \quad P' = 4P$$

Halve L and get 4x the transistors and 8x the capability for 4x the power.
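The two scaling regimes can be checked with a few lines of arithmetic. This is a minimal sketch under the usual first-order assumptions (capacitance C proportional to L, frequency f proportional to 1/L, density D proportional to 1/L^2, switching energy E = CV^2, power density P = E·f·D); the class name `ProcessScaling` is hypothetical.

```java
// First-order process-scaling model (assumed, not from the slide):
// C ~ L, f ~ 1/L, D ~ 1/L^2, E = C V^2, power per unit area P = E * f * D.
final class ProcessScaling {
    /**
     * Returns ratios {E'/E, f'/f, D'/D, capability'/capability, P'/P}
     * when feature size shrinks by s (L' = L / s) and voltage is scaled by v (V' = v * V).
     */
    static double[] scale(double s, double v) {
        double capacitance = 1.0 / s;                  // C' = C / s
        double energy = capacitance * v * v;           // E' = C' V'^2
        double frequency = s;                          // f' = s * f
        double density = s * s;                        // D' = s^2 * D
        double capability = frequency * density;      // operations per second per unit area
        double power = energy * frequency * density;   // P' = E' * f' * D'
        return new double[] {energy, frequency, density, capability, power};
    }

    public static void main(String[] args) {
        double[] goodOldDays = scale(2.0, 0.5); // V' = V/2
        double[] newReality = scale(2.0, 1.0);  // V' ~ V
        System.out.printf("Good old days: E'=%.3f capability=%.0fx P'=%.0fx%n",
                goodOldDays[0], goodOldDays[3], goodOldDays[4]);
        System.out.printf("New reality:   E'=%.3f capability=%.0fx P'=%.0fx%n",
                newReality[0], newReality[3], newReality[4]);
    }
}
```

Both cases deliver 8x the capability; only the voltage term differs, and with it the power density moves from 1x to 4x.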




# **Mobile Thermal Design Point**





Resolution makes a difference! The iPad3 screen takes up to 8W

10" Screen takes 1-2W

30-90W

Thermal design point: the maximum power the system can use without breaking down. Even as battery technology improves, these thermal limits remain.



# **Apps and Power**

- Much more expensive to **MOVE data than COMPUTE data**
- Process improvements WIDEN the gap
  - 10nm process will increase ratio another 4X
- Energy efficiency must be key metric during silicon AND app design
  - Awareness of where data lives, where computation happens, and how it is scheduled

(Figure label: 0.5pJ)






# **Energy Optimization Opportunities**

### Dark Silicon

- Lots of space for transistors just can't turn them all on at same time
- Multiple specialized hardware units that are only turned on when needed
- Increase locality and parallelism of computation to save power
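One common software technique for increasing locality is loop blocking (tiling): work on a small block that stays cache-resident instead of streaming across the whole array. The sketch below contrasts a naive and a tiled matrix transpose; the class name and tile size are hypothetical, and whether tiling saves energy on a particular SoC has to be measured.

```java
// Hypothetical illustration of improving data locality by blocking (tiling).
final class TiledTranspose {
    /** Naive transpose: writes to dst stride through memory by `rows` elements. */
    static float[] naive(float[] src, int rows, int cols) {
        float[] dst = new float[rows * cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dst[c * rows + r] = src[r * cols + c];
        return dst;
    }

    /** Tiled transpose: each tile x tile block is read and written while still in cache. */
    static float[] tiled(float[] src, int rows, int cols, int tile) {
        float[] dst = new float[rows * cols];
        for (int r0 = 0; r0 < rows; r0 += tile)
            for (int c0 = 0; c0 < cols; c0 += tile) {
                int rEnd = Math.min(r0 + tile, rows); // clamp partial tiles at the edges
                int cEnd = Math.min(c0 + tile, cols);
                for (int r = r0; r < rEnd; r++)
                    for (int c = c0; c < cEnd; c++)
                        dst[c * rows + r] = src[r * cols + c];
            }
        return dst;
    }
}
```

Both versions produce identical results; only the memory access order, and therefore the amount of off-chip traffic, changes.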

### Dynamic and feedback-driven software power optimization

- Instrumentation for energy-aware compilers and profilers
- Most compilers look at just one thread; take a more global view
- Power-optimizing compiler back-ends / installers

### Smart, holistic use of sensors and peripherals

- Motion sensors, cameras, networking, GPS





# **Camera Sensor Processing**

### CPU

- Single processor or NEON SIMD
- Makes heavy use of general memory

### GPU

- Many-way parallelism
- Efficient image caching into general memory
- Programmable and flexible
- Still significant use of cache/memory

### Camera ISP = Image Signal Processor

- Scan-line-based
- Data flows through compact hardware pipe
- No global memory used to minimize power
- Little or no programmability
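The scan-line style of the ISP can be illustrated in software: a 3x3 filter needs only the three most recent rows, so each output row is emitted as soon as its inputs arrive and the full frame is never stored. This is a minimal sketch assuming a simple box filter with no border handling; the class and interface names are hypothetical, and a real ISP pipe is fixed-function hardware rather than Java.

```java
// Hypothetical sketch of scan-line (streaming) processing: only three rows are buffered.
final class LineBufferFilter {
    /** Receives each filtered output row (valid region only) as soon as it is ready. */
    interface RowSink { void accept(int y, int[] row); }

    private final int width;
    private final int[][] lines = new int[3][]; // ring buffer of the three most recent input rows
    private int rowsSeen = 0;

    LineBufferFilter(int width) { this.width = width; }

    /** Pushes one input row; once three rows are buffered, emits the 3x3 box-filtered centre row. */
    void push(int[] row, RowSink sink) {
        lines[rowsSeen % 3] = row.clone();
        rowsSeen++;
        if (rowsSeen < 3) return;
        int[] out = new int[width - 2];
        for (int x = 1; x < width - 1; x++) {
            int sum = 0;
            for (int[] line : lines)                // row order does not matter for a box filter
                sum += line[x - 1] + line[x] + line[x + 1];
            out[x - 1] = sum / 9;
        }
        sink.accept(rowsSeen - 2, out);             // index of the centre input row
    }
}
```

Memory use is three rows regardless of image height, which is the property that lets an ISP avoid global memory.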





# **Typical Camera ISP**

- ~760 math Ops
- ~42K vals = 670Kb
- 300MHz → ~250Gops






# **Programmers View of a Typical SOC**






# **HSA Feature Roadmap**

Heterogeneous System Architecture Foundation: AMD, ARM, Imagination, TI, MediaTek






# **Mobile Innovation Hot Spots**

### New platform capabilities being driven by SILICON and APIs




# **OpenMAX - Media Acceleration**

### Royalty-free, cross-platform open standards





# **OpenMAX AL Streaming Media Framework**

### Enables key video, image stream and camera use cases

- Enables optimal hardware acceleration with app portability

### Create Media Objects to play and process images and video with AV sync

- Connect to a variety of input and output objects to PLAY and RECORD media

### Full range of video effects and controls

- Including playback rate, post processing, and image manipulation






# **Accelerating Streaming Media**

Use cases: HD video (movies), video teleconferencing, video editing, augmented reality, computational photography

Requirements shown on the slide:

- Inject encrypted elementary streams into decrypt/decode/render; support for Widevine DRM
- High frame-rate, low-latency camera capture
- Video decoding with extended controls (fine-grained codec query/config, force key frames) and dynamic format changes
- Inject video into decode/render to texture; extract video from capture/render; texture-to-video encoding
- Fast, low-latency camera capture to app and GPU texture; advanced camera control over format, ROI
- Computational photography: bracketed burst mode with sequenced key/value pairs

Current limitations noted on the slide: 30 FPS frame rate; no extraction, so no receiver support and no dynamic format change; video to texture via SurfaceTexture; no extended controls; no texture-to-video encoding; unencrypted streams only.

Proprietary extensions for advanced camera access





# **OpenCL – Heterogeneous Computing**

- Native framework for programming diverse parallel computing resources
  - CPU, GPU, DSP as well as hardware blocks(!)

### Powerful, low-level flexibility

- Foundational access to compute resources for higher-level engines, frameworks and languages

### Embedded profile

- No need for a separate "ES" spec
- Reduces precision requirements





One code tree can be executed on CPUs or GPUs


# **OpenCL Overview**

### C Platform Layer API

- Query, select and initialize compute devices

### Kernel Language Specification

- Subset of ISO C99 with language extensions
- Well-defined numerical accuracy: IEEE 754 rounding with specified maximum error
- Rich set of built-in functions: cross, dot, sin, cos, pow, log ...

### C Runtime API

- Run-time or build-time compilation of kernels
- Execute compute kernels across multiple devices

### Memory management is explicit

- Application must move data from host → global → local and back
- Implementations can optimize data movement in unified memory systems
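To make the host → global → local movement concrete, here is a minimal Java simulation of a work-group sum reduction. It is not OpenCL code: the arrays stand in for global and `__local` memory, the comment marks where an OpenCL C kernel would call `barrier(CLK_LOCAL_MEM_FENCE)`, and the class name is hypothetical.

```java
// Hypothetical Java simulation of the OpenCL memory hierarchy for a sum reduction.
final class WorkGroupReduction {
    /** "Kernel": each work-group of localSize items writes one partial sum to global memory. */
    static float[] kernel(float[] global, int localSize) {
        if (Integer.bitCount(localSize) != 1)
            throw new IllegalArgumentException("localSize must be a power of two in this sketch");
        int numGroups = (global.length + localSize - 1) / localSize;
        float[] partial = new float[numGroups];
        for (int group = 0; group < numGroups; group++) {
            float[] local = new float[localSize];            // stands in for __local scratch memory
            for (int lid = 0; lid < localSize; lid++) {       // each work-item copies one element
                int gid = group * localSize + lid;
                local[lid] = gid < global.length ? global[gid] : 0f;
            }
            // Tree reduction in local memory; in OpenCL C each step would be
            // separated by barrier(CLK_LOCAL_MEM_FENCE).
            for (int stride = localSize / 2; stride > 0; stride /= 2)
                for (int lid = 0; lid < stride; lid++)
                    local[lid] += local[lid + stride];
            partial[group] = local[0];                       // local -> global
        }
        return partial;
    }

    /** "Host": explicit copy in, run the kernel, copy partial sums back and finish on the CPU. */
    static float sum(float[] host, int localSize) {
        float[] global = host.clone();                       // host -> global
        float[] partial = kernel(global, localSize);
        float total = 0f;
        for (float p : partial) total += p;                  // global -> host, final pass
        return total;
    }
}
```

On a unified-memory system the `host.clone()` step is exactly the kind of copy an implementation may be able to avoid.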





# **OpenCL: Execution Model**

### Kernel

- Basic unit of executable code ~ C function
- Data-parallel or task-parallel

### Program

 Collection of kernels and functions ~ dynamic library with run-time linking

### Command Queue

- Applications queue kernels & data transfers
- Performed in-order or out-of-order

### • Work-item

An execution of a kernel by a processing element ~ thread

### • Work-group

 A collection of related work-items that execute on a single compute unit ~ core
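The relationship between work-items and work-groups can be simulated for a 1-D NDRange with zero global offset, where OpenCL guarantees `get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)`. This sketch runs everything sequentially; the class and interface names are hypothetical.

```java
// Hypothetical simulation of a 1-D OpenCL NDRange (global offset of zero).
final class NDRange1D {
    /** Stand-in for a kernel body; receives the IDs the OpenCL built-ins would return. */
    interface Kernel { void run(int globalId, int groupId, int localId); }

    static void enqueue(int globalSize, int localSize, Kernel kernel) {
        if (localSize <= 0 || globalSize % localSize != 0)
            throw new IllegalArgumentException("OpenCL 1.x requires globalSize to be a multiple of localSize");
        int numGroups = globalSize / localSize;
        for (int group = 0; group < numGroups; group++)      // work-groups run on compute units (~ cores)
            for (int lid = 0; lid < localSize; lid++)        // work-items run on processing elements (~ threads)
                kernel.run(group * localSize + lid, group, lid);
    }
}
```

A real device may execute work-groups in any order and in parallel, so kernels must not depend on the sequential order used here.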



### Example of parallelism types


# **Custom Devices and Built-in Kernels**

- Embedded platforms often contain specialized hardware and firmware
  - That cannot support OpenCL C
- Built-in kernels can represent these hardware and firmware capabilities
  - Such as video encode/decode, camera ISP
- Hardware can be integrated and controlled from the OpenCL framework
  - Can enqueue built-in kernels to custom devices alongside OpenCL kernels
- OpenCL becomes a powerful coordinating framework for diverse resources
  - Programmable and non-programmable devices controlled by one run-time



Built-in kernels enable control of specialized processors and hardware from OpenCL run-time




# **OpenCL Milestones**

### Six months from proposal to released OpenCL 1.0 specification

- Due to a strong initial proposal and a shared commercial incentive
- Multiple conformant implementations shipping on desktop
  - For CPUs and GPUs on multiple OS

### 18 month cadence between releases

- Backwards compatibility protects software investment






# Adobe at SIGGRAPH 2012

Adobe V OpenCL

### Adobe

- Compute API supported across vendors
- Programming model familiar to C programmers
- Demonstrated performance
- Same compute kernels on CPU and GPU!
- Adobe is now active member of OpenCL working group
  - Contributing Adobe's experience and minds to continue OpenCL evolution

SIGGRAPH - Khronos OpenCL BOF - August 8, 2012




# **OpenCL Roadmap**

### **OpenCL-HLM** (High Level Model)

Exploring high-level programming model, unifying host and device execution environments through language syntax for increased usability and broader optimization opportunities



### Long-term Core Roadmap

Exploring enhanced memory and execution model flexibility to catalyze and expose emerging hardware capabilities

Considering ways to exploit unified and shared virtual memory systems where available

### **OpenCL-SPIR** (Standard Parallel Intermediate Representation)

Exploring LLVM-based, low-level Intermediate Representation for code obfuscation/security and to provide target back-end for alternative high-level languages




# **OpenCL as Parallel Compute Foundation**





# **Computational Photography**

- Many advanced photo apps today run on a single CPU
  - Suboptimal performance and power
- OpenCL is a platform to harness CPUs/GPUs for advanced imaging
  - Even if code is 'branchy'

"The tablet ... has new multimedia capabilities, including a computational camera, which lets devs tap directly into its computational capability through new application programming interfaces such as OpenCL. That access enables next-generation use cases such as light-field cameras for mobile devices."








Flash / no-flash imaging




# **OpenGL 20th Birthday - Then and Now**

| OpenGL 20th Anniversary 1992-2012 | 1992: SGI RealityEngine (8 Geometry Engines, 4 Raster Manager boards) | 2012 Mobile: NVIDIA Tegra 3 (Nexus 7 Android tablet) | 2012 PC: NVIDIA GeForce GTX 680 (Kepler GK104) |
|-----------------------------------|------|------|------|
| Triangles / sec (millions) | 1 | 103 (x103) | 1800 (x1800) |
| Pixel fragments / sec (millions) | 240 | 1040 (x4.3) | 14,400 (x60) |
| GigaFLOPS | 0.64 | 15.6 (x25) | 3090 (x4830) |
| Power | 1.5 kW | <5 W | |


# **OpenGL 4.3 with Compute Shaders**



© Copyright Khronos Group 2012




# **Compute Shaders**

### Execute algorithmically general-purpose GLSL shaders

- Operate on uniforms, images and textures

### Process graphics data in the context of the graphics pipeline

- Easier than interoperating with a compute API IF processing 'close to the pixel'

### Complementary to OpenCL

- Not a full heterogeneous (CPU/GPU) programming framework using full ANSI C

### Standard part of all OpenGL 4.3 implementations

- Matches DirectX 11 functionality











(Figure labels: Physics, AI Simulation, Ray Tracing, Imaging, Global Illumination)

# **OpenCL and OpenGL Compute Shaders**

### OpenGL compute shaders and OpenCL support distinctly different use cases

- OpenCL provides a significantly more powerful and complete compute solution





# **OpenGL ES**

 Streamlined subset of desktop OpenGL for embedded and mobile devices

ES3 is backward compatible - so new features can be added incrementally






# **OpenGL ES 3.0 Highlights**

### Better looking, faster performing games and apps – at lower power

- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback ...

### Make life better for the programmer

- Tighter requirements for supported features to reduce implementation variability

### Backward compatible with OpenGL ES 2.0

- OpenGL ES 2.0 apps continue to run unmodified

### Standardized Texture Compression

- #1 developer request!





# **Texture Compression is Key**

### Texture compression saves precious resources

- Saves network bandwidth, device memory space AND memory bandwidth
- Developers need the same texture compression EVERYWHERE
  - Otherwise portable apps such as WebGL need multiple copies of same texture





# ASTC – Future Universal Texture Standard?

### Adaptive Scalable Texture Compression (ASTC)

- Quality significantly exceeds S3TC or PVRTC at same bit rate

### Industry-leading orthogonal compression rate and format flexibility

- 1 to 4 color components: R / RG / RGB / RGBA
- Choice of bit rate: from 8bpp to <1bpp in fine steps
- ASTC is royalty-free and so is available to be universally adopted
- Shipping as an extension today for industry feedback
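The ASTC bit-rate range follows from its fixed block size: every block is encoded in 128 bits, and the block footprint (4x4 up to 12x12 texels for 2D textures) sets the bits per pixel. The sketch below does only that size arithmetic; the class name is hypothetical and no actual encoding is performed.

```java
// ASTC size arithmetic: every block is 128 bits; the footprint sets the bit rate.
final class AstcSize {
    static final int BLOCK_BITS = 128;

    /** Bits per pixel for a blockWidth x blockHeight footprint (4x4 -> 8 bpp, 12x12 -> ~0.89 bpp). */
    static double bitsPerPixel(int blockWidth, int blockHeight) {
        return (double) BLOCK_BITS / (blockWidth * blockHeight);
    }

    /** Bytes for one mip level; partial blocks at the right/bottom edges still cost a full block. */
    static long compressedBytes(int width, int height, int blockWidth, int blockHeight) {
        long blocksX = (width + blockWidth - 1) / blockWidth;
        long blocksY = (height + blockHeight - 1) / blockHeight;
        return blocksX * blocksY * (BLOCK_BITS / 8);
    }
}
```

For example, a 1024x1024 texture takes 1 MiB at 4x4 (versus 4 MiB as uncompressed RGBA8), and about 457 KiB at 6x6.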





# **Kishonti GLBenchmark 3.0**



Kishonti "GLBenchmark 3.0" preliminary



# **OpenGL ES Deployment in Mobile**






# **Visual-based Augmented Reality**






# OpenCV

### Widely used Computer Vision open source project

- Not an API definition, not managed by Khronos

### Extensive functionality

- Used in academia, fast prototyping, some products

### Traditionally runs on a single CPU

- MulticoreWare: open source, CPU/GPU-enabled OpenCV over OpenCL



# **OpenVL**

### Vision Hardware Acceleration Layer

- Enable hardware vendors to implement accelerated imaging and vision algorithms

### Diversity of efficient implementations

- From dedicated hardware pipelines to parallel programmable processors

### Can be used by high-level libraries or applications directly

- Primary focus on enabling real-time vision apps on mobile and embedded systems

### OpenCV will leverage OpenVL for acceleration

- OpenVL does not duplicate OpenCV functionality
  - JUST provides essential acceleration




### **Possible Implementation of Vision Stack**

- **StreamInput**: semantics and fusion of camera and positional sensors; implement StreamInput vision sensor modules with OpenCV
- **OpenCV**: high-level computer vision library; accelerate the OpenCV library with OpenVL functions
- **OpenMAX AL**: camera input from OpenMAX AL or other camera subsystems
- **OpenVL**: accelerated computer vision algorithms; use OpenCL to implement OpenVL with parallel execution
- **EGL**: data and event interop with CL / GL / ES for display and compute processing
- **OpenGL ES**: display
- **OpenCL**: parallel computation




# **Requested Camera Extensions**

### Query camera information

- Focal length (fx, fy), principal point (cx, cy), skew (s), image resolution (h, w)
- Spatial information of how cameras and sensors are placed on device
- Calibration and lens distortion

### ROI extraction

- From wide-angle and fish-eye lenses

### FCAM++: extensive exposure parameters in single or burst mode

- Shutter, aperture, ISO, white balance, frame rate, focus modes, resolution
- Synchronization with other system sensors

### Data output format control

- Grayscale, RGB(A), YUV
- Access to raw data, e.g. Bayer pattern
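The intrinsic parameters requested above (fx, fy, cx, cy, s) are what a vision app needs to map 3D points to pixels with the standard pinhole model, K = [[fx, s, cx], [0, fy, cy], [0, 0, 1]]. This minimal sketch ignores lens distortion; the class name is hypothetical.

```java
// Standard pinhole projection using the camera intrinsics; lens distortion is ignored.
final class PinholeCamera {
    final double fx, fy, cx, cy, skew;

    PinholeCamera(double fx, double fy, double cx, double cy, double skew) {
        this.fx = fx; this.fy = fy; this.cx = cx; this.cy = cy; this.skew = skew;
    }

    /** Projects a camera-space point (x, y, z) with z > 0 to pixel coordinates {u, v}. */
    double[] project(double x, double y, double z) {
        if (z <= 0) throw new IllegalArgumentException("point must be in front of the camera");
        double xn = x / z, yn = y / z;                      // normalized image coordinates
        return new double[] {fx * xn + skew * yn + cx, fy * yn + cy};
    }
}
```

Without queryable intrinsics, each app has to calibrate every device model itself before it can perform this mapping reliably.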





# **OpenSL ES – Advanced Audio**

### OpenSL ES does for audio what OpenGL ES does for graphics

- Advanced audio functionality from simple playback to full 3D positional audio

### Object-based native audio API for simplicity and high performance

- Same object framework as OpenMAX AL
- Reduces development time

### Attractive alternative to open source frameworks

- Tightly defined specification with full conformance tests
- Robust application portability across platforms and OS




# EGLStream – Video/Graphics Interop





# **Portable Access to Sensor Fusion**



**Advanced sensors everywhere:** RGB and depth cameras, multi-axis motion/position, touch and gestures, microphones, wireless controllers, haptics, keyboards, mice, track pads

**Apps request semantic sensor information:** StreamInput defines possible requests, e.g. "Provide Skeleton Position", "Am I in an elevator?"

### Processing graph provides sensor data stream

Utilizes optimized, smart sensor middleware; apps can gain 'magical' situational awareness
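As a small example of what sensor-fusion middleware does underneath, the sketch below implements a complementary filter, a classic textbook technique (not StreamInput, and not any particular vendor's algorithm). It fuses a gyroscope, which is accurate short-term but drifts, with an accelerometer's gravity-based tilt, which is noisy but drift-free. The class name and the example weight of 0.98 are assumptions.

```java
// Complementary filter for one tilt angle (radians): gyro for short-term accuracy,
// accelerometer gravity direction to cancel long-term gyro drift.
final class ComplementaryFilter {
    private final double alpha; // weight on the integrated gyro path, e.g. 0.98 (assumed tuning value)
    private double angle;

    ComplementaryFilter(double alpha, double initialAngle) {
        this.alpha = alpha;
        this.angle = initialAngle;
    }

    /** gyroRate in rad/s; accelX and accelZ in any consistent unit; dt in seconds. */
    double update(double gyroRate, double accelX, double accelZ, double dt) {
        double accelAngle = Math.atan2(accelX, accelZ);               // tilt implied by gravity
        angle = alpha * (angle + gyroRate * dt) + (1 - alpha) * accelAngle;
        return angle;
    }
}
```

A semantic request such as "Am I in an elevator?" would sit several layers above filters like this one.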

# **Example use of Khronos APIs in AR**





# **API Adoption**

| Platform | API | Status |
|----------|-----|--------|
| Android | OpenGL ES | OpenGL ES 2.0, shipping in Android 2.2 |
| Android | OpenSL ES | OpenSL ES 1.0, shipping in Android 2.3 |
| Android | OpenMAX AL | OpenMAX AL 1.0, shipping in Android 4.0 |
| Android | EGL | EGL 1.4, shipping under SDK -> NDK |
| Android | WebGL | Chrome will have WebGL; Opera and Firefox have WebGL now |
| Apple | OpenGL | OpenGL 3.2 on MacOS |
| Apple | OpenCL | OpenCL 1.1 on MacOS |
| Apple | OpenGL ES | OpenGL ES 2.0 on iOS |
| Apple | WebGL | Can be enabled in MacOS Safari; iOS5 enables WebGL for iAds |
|  |            |                                                           |  |

### **Mobile Operating Systems**



**Microsoft Windows RT:** 

- Only Microsoft native APIs
- HTML5 but not yet WebGL




# **Extended Native APIs on Android**

### Native APIs can be shipped as NDK extensions before Google adoption

- Do not break/change existing Google APIs

### Khronos open standard APIs have strong momentum in silicon

- Google has choice to adopt into standard platform to eliminate fragmentation
- Exposed directly or wrapped in Java binding

### Extended APIs can be used by:

- Bundled apps, Market apps with API selection
- Multiple APKs behind single multi-APK SKU





# HTML5 – Cross OS App Platform

- Increasing diversity of devices creates a demand for a true cross OS programming platform
- BUT need more than "more HTML"



How can the browser rapidly assimilate such diverse functionality?

(Figure label: Traditional Web content)




# Leveraging Proven Native APIs into HTML5

### Leverage native API investments into the Web

- Faster API development and deployment
- Familiar foundation reduces developer learning curve

### Khronos and W3C creating close liaison

- Multiple potential joint projects






# WebGL – 3D on the Web – No Plug-in!

- Historic opportunity to bring accelerated 3D graphics to web
  - WebGL defines JavaScript binding to OpenGL ES 2.0
- Leverages HTML5 and uses the `<canvas>` element
  - Enables a 3D context for the canvas
- Low-level foundational API for accessing the GPU in HTML5
  - Flexibility and direct GPU access support higher-level frameworks and middleware
- WebGL 1.0 Released at GDC March 2011
  - Mozilla, Apple, Google and Opera working closely with GPU vendors



# **WebGL Implementation Anatomy**



# WebGL – Being Used by Millions Every Day






# **WebGL and Security**

### WebGL is Architecturally Secure

- NO known WebGL security issues
- Impossible to access out-of-bounds or uninitialized memory
- Cross-origin images are blocked unless permission is granted through CORS
- Browsers maintain blacklists, used if unavoidable GPU driver bugs are discovered

### DoS attacks and GPU hardening

- Draw commands can run for a long time -> unresponsive system
  - Even without loops in shaders
- WebGL working closely with GPU vendors to categorically fix this
- Short term: mandate ARB\_robustness and associated GPU watchdog timer
- Longer term: GPUs need robust context switch and pre-emption





# WebCL – Parallel Computing for the Web

### JavaScript bindings to OpenCL APIs

- JavaScript initiates OpenCL C Kernels on heterogeneous multicore CPU/GPU
- Stays close to the OpenCL standard
  - Maximum flexibility to provide a foundation for higher-level middleware
- Minimal language modifications for 100% security and app portability
  - E.g. Mapping of CL memory objects into host memory space is not supported
- Compelling use cases
  - Physics engines for WebGL games, image and video editing in browser
- API definition underway: public draft just released
  - https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.html





# WebCL Demo

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

# WebCL for Hardware-Accelerated Web Applications

Advanced Browser Technology Samsung R&D Center San Jose, CA




# Web Apps versus Native Apps

### Mobile Apps have functional and aesthetic appeal

- Beautiful, responsive, focused

### HTML5 with GPU acceleration can provide the same level of "App Appeal"

- Highly interactive, rich visual design

### Using HTML5 to create 'Web Apps' has many advantages

- Web app is searchable and discoverable through the web
- Portable to any browser-enabled system
- Same code can run as app or as web page
- Not a closed app store: no app store 'tax'

### How soon will we be able to write apps such as AR in HTML5?




# **Expanding Platform Reach for Graphics and Computation**

Production browsers shipping with WebGL:

- Desktop: Chrome, Firefox, Opera, Safari
- Mobile: Opera and Firefox
- Apple iOS Safari uses WebGL for iAds





# Cross-OS Portability

- HTML5 provides cross-platform portability; GPU accessibility through WebGL will be available soon on ~90% of mobile systems
- Preferred development environments are not designed for portability
- Native code is portable, but apps must cope with different available APIs and libraries

(Figure labels: Objective C, GL ES, WebGL; Dalvik (Java), GL ES, WebGL; C#, DirectX, No WebGL; SDK, C/C++)



# Summary

- Advances in SOC silicon processing are enabling significant new use cases
- Architectural shifts, such as unified memory, are creating challenges and opportunities for applications and the APIs that enable them
- Holistic cooperation between hardware and software needed to deliver increasing computational loads in a fixed power budget
- Dynamic tension between platform vendors that want captive apps and developers that benefit from cross platform portability
- Mobile operating systems lag in exposing the latest SOC capabilities, which creates functional differentiation opportunities
- Cooperative API standards eliminate roadblocks to mobile industry growth





# **Thank You!**

# See you back at 11:10AM







# ArcSoft Multi-Frame Technologies

Hot Chips 2012, Sean Mao, VP Marketing, Advanced Imaging Technologies






# **ArcSoft Overview**

- Founded in 1994
- HQ in Fremont, CA
- 900+ employees worldwide
- Photo & Video software
- Markets served:
  - Mobile, tablet, PC, DSC






## **Industry Leaders Choose ArcSoft**



Nikon, Canon, Panasonic, Olympus, Fujifilm, Pentax, Epson, Kodak, Fujitsu, Toshiba




## **ArcSoft Imaging Technologies**

- Adopted by major camera phone vendors and digital camera vendors
- Leverages various forms of hardware capabilities
  - CPU, GPU, DSP, ISP, H/W Codec, Fast RAM, DMA, and other specialized hardware
  - Highly efficient ISP

#### Boosts value to the end product

- Better capture quality and speed performance
- Better user experience
- Differentiation










## ArcSoft Multi-Frame Technology Overview

- Takes advantage of high speed burst capability in latest capture SOC
- Combines multiple image frames to achieve improved image quality and better experience
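The simplest form of multi-frame combination is averaging frames that are already aligned: with independent noise in each frame, averaging N frames reduces the noise standard deviation by about a factor of sqrt(N). This sketch is a generic illustration, not ArcSoft's algorithm; real products also need registration, motion/ghost rejection and tone mapping, which are omitted here. The class name is hypothetical.

```java
// Hypothetical sketch: rounded per-pixel mean of N pre-aligned luma (Y) frames from a burst.
final class FrameAverager {
    static int[] average(int[][] frames) {
        if (frames.length == 0) throw new IllegalArgumentException("need at least one frame");
        int n = frames.length, size = frames[0].length;
        for (int[] f : frames)
            if (f.length != size) throw new IllegalArgumentException("frames must have equal size");
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            int sum = 0;
            for (int[] f : frames) sum += f[i];
            out[i] = (sum + n / 2) / n;   // rounded mean
        }
        return out;
    }
}
```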






## **ArcSoft Multi-Frame Technology Portfolio**

- Multi-Frame Night Shot
- Panorama BurstCapture
- Multi-Frame Anti-Shaking
- High Speed HDR
- PiClear (auto object removal)
- PicBest (optimal portrait composition)
- More...







## **Achieve Better Quality With Multi-Frame**

- Brighter
- Clearer
- Better resolution

**Comparing with Sony DSC-TX10**






## **Hardware Requirements**

- Fast burst capture capability (>15 FPS)
- Continuous burst of more than 300 shots, with the ability to stop at any time

#### For each capture in the burst

- H/W noise filter enabled
- Output in YUV formats
- Capability to do bracketing
  - e.g. `void setEvBracketCapture(float[]);` in NvCamera
- Capability to change the capture parameters (3A, ISO, gain, etc.)
  - e.g. `void setExposureTime(int);` in NvCamera
- Capability to lock the capture parameters
  - e.g. `void setAutoExposureLock(boolean lock);` and `void setAutoWhiteBalanceLock(boolean lock);` in NvCamera
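
Bracketing shifts exposure in EV steps, where +1 EV doubles the exposure time at fixed aperture and gain. The sketch assumes the `float[]` passed to `setEvBracketCapture` holds EV offsets relative to the metered exposure and that exposure times are in microseconds; both are our assumptions, since the slide does not specify NvCamera's semantics or units. The `EvBracket` class is hypothetical.

```java
import java.util.Arrays;

/** Converts EV bracket offsets into exposure times (illustrative; units are an assumption). */
final class EvBracket {
    private EvBracket() {}

    /**
     * Each +1 EV doubles the exposure time when aperture and ISO/gain stay fixed.
     *
     * @param baseExposureUs metered (0 EV) exposure time, assumed to be in microseconds
     * @param evOffsets      bracket offsets, e.g. {-2, 0, +2}
     */
    public static int[] exposureTimesUs(int baseExposureUs, float[] evOffsets) {
        int[] out = new int[evOffsets.length];
        for (int i = 0; i < evOffsets.length; i++) {
            out[i] = (int) Math.round(baseExposureUs * Math.pow(2.0, evOffsets[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] t = exposureTimesUs(10_000, new float[] {-2f, 0f, 2f});
        System.out.println(Arrays.toString(t)); // [2500, 10000, 40000]
    }
}
```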




## **Challenges and Expectations**

- The quality of each frame in the burst is sometimes not well tuned, especially when captured with non-3A parameters
- Image frames are sometimes inconsistent, even with all capture parameters locked
- Need downsized image frames passed from the ISP
- Need Bayer RAW output
- Need hardware-based math functions, especially for matrix operations
  - e.g. add, subtract, multiply, absolute difference, division, max, min, linear gradual blend, etc.
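
For reference, these are the kind of per-pixel operations the slide asks hardware to accelerate, written as plain scalar Java. The helper names are hypothetical, and "linear gradual blend" is interpreted here as a blend whose weight ramps linearly across the image width (e.g. for a panorama seam); the slide does not define it.

```java
import java.util.Arrays;

/** Scalar reference versions of per-pixel ops the slide wants hardware-accelerated (sketch). */
final class PixelOps {
    private PixelOps() {}

    /** |a - b| per element, e.g. to detect motion between two burst frames. */
    public static int[] absDiff(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) out[i] = Math.abs(a[i] - b[i]);
        return out;
    }

    /**
     * Blends two width x height images with a weight that ramps linearly from all-a at
     * column 0 to all-b at the last column (one reading of "linear gradual blend").
     */
    public static int[] linearRampBlend(int[] a, int[] b, int width, int height) {
        int[] out = new int[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double w = (width == 1) ? 0.5 : (double) x / (width - 1); // weight of b
                int i = y * width + x;
                out[i] = (int) Math.round((1.0 - w) * a[i] + w * b[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(absDiff(new int[] {10, 50}, new int[] {30, 20})));
        // [20, 30]
        System.out.println(Arrays.toString(
            linearRampBlend(new int[] {0, 0, 0}, new int[] {100, 100, 100}, 3, 1)));
        // [0, 50, 100]
    }
}
```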













#### **Touch-free technology**

Itay Katz, Founder and CTO









#### **Overview**



eyeSight's groundbreaking Touch Free technology provides an enhanced user experience, allowing users to easily and intuitively control a variety of devices using simple hand gestures.

eyeSight's Natural User Interface solution utilizes the device's standard 2D camera, along with advanced real-time image processing and machine vision algorithms, to track the user's hand gestures and convert them into actions.



## Components of eyeSight's gesture recognition technology:

- Directional gesture detection
- Real-time hand / fingertip tracking
- Face detection and tracking (multiple users)
- Hand signs, e.g. "OK", "Like"
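
A directional gesture can be reduced to the net displacement of the tracked hand. The sketch below is a hypothetical classifier (the names and the `minTravelPx` threshold are ours, not eyeSight's) that labels a track by its dominant axis of motion.

```java
/** Classifies a directional gesture from tracked hand positions (hypothetical sketch). */
final class SwipeClassifier {
    public enum Direction { LEFT, RIGHT, UP, DOWN, NONE }

    private SwipeClassifier() {}

    /**
     * @param xs          hand-centroid x positions over the gesture, in image pixels
     * @param ys          hand-centroid y positions, in image pixels (y grows downward)
     * @param minTravelPx minimum net displacement to count as a swipe (assumed threshold)
     */
    public static Direction classify(float[] xs, float[] ys, float minTravelPx) {
        if (xs.length < 2 || xs.length != ys.length) return Direction.NONE;
        float dx = xs[xs.length - 1] - xs[0];
        float dy = ys[ys.length - 1] - ys[0];
        if (Math.hypot(dx, dy) < minTravelPx) return Direction.NONE;
        if (Math.abs(dx) >= Math.abs(dy)) {
            return dx > 0 ? Direction.RIGHT : Direction.LEFT;
        }
        return dy > 0 ? Direction.DOWN : Direction.UP;
    }

    public static void main(String[] args) {
        float[] xs = {100f, 160f, 240f};
        float[] ys = {200f, 205f, 210f};
        System.out.println(classify(xs, ys, 50f)); // RIGHT
    }
}
```

With a front-facing camera the image is often mirrored, so a real implementation may need to swap LEFT and RIGHT.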





# Implementation Challenges of such technologies:

- Very high processing requirements
- Memory Throughput
- Real-time performance






### **Solutions: Managing performance**

- Algorithm: the key tool for minimizing processing requirements
- Target-specific optimizations
  - Intrinsics using special instructions
  - ISA extensions: x86 SSE, ARM NEON
  - Assembly
  - GPGPU: OpenCL
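
One example of an algorithm-level saving is the summed-area table (integral image): after a single pass over the frame, the sum over any rectangle costs four lookups regardless of its size, which suits the box features common in detection. This is a standard technique shown for illustration, not eyeSight's code.

```java
/** Summed-area table: any rectangular sum in O(1) after one O(W*H) pass. */
final class IntegralImage {
    private final long[] table; // (width+1) x (height+1), first row and column are zero
    private final int stride;

    public IntegralImage(int[] pixels, int width, int height) {
        stride = width + 1;
        table = new long[stride * (height + 1)];
        for (int y = 0; y < height; y++) {
            long rowSum = 0; // running sum of row y up to column x
            for (int x = 0; x < width; x++) {
                rowSum += pixels[y * width + x];
                table[(y + 1) * stride + (x + 1)] = table[y * stride + (x + 1)] + rowSum;
            }
        }
    }

    /** Sum over columns [x0, x1) and rows [y0, y1). */
    public long sum(int x0, int y0, int x1, int y1) {
        return table[y1 * stride + x1] - table[y0 * stride + x1]
             - table[y1 * stride + x0] + table[y0 * stride + x0];
    }

    public static void main(String[] args) {
        int[] img = {
            1, 2, 3,
            4, 5, 6,
            7, 8, 9,
        };
        IntegralImage ii = new IntegralImage(img, 3, 3);
        System.out.println(ii.sum(1, 1, 3, 3)); // 5 + 6 + 8 + 9 = 28
        System.out.println(ii.sum(0, 0, 3, 3)); // 45
    }
}
```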







#### Solutions: Coping with memory throughput

- Image processing algorithms read and write the video many times, placing a heavy load on the memory subsystem. Reading a 720p YUV422 stream just once at 30 fps requires about 55 MB/s.
- DMA is not usable on most systems, since they have no tightly coupled memories.
- It is key to minimize the number of passes over the video data.
- **Designing for cache**
  - Minimize data footprint
  - Locality
  - Automatic pre-fetching
  - Software pre-fetching
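
The 55 MB/s figure follows from 1280 × 720 pixels × 2 bytes per pixel (YUV422) × 30 frames per second; the frame rate is our assumption, chosen because it reproduces the slide's number. Every extra pass over the frame adds the same amount again, which is why minimizing passes matters:

```java
/** Back-of-the-envelope memory traffic for video processing (arithmetic sketch). */
final class VideoBandwidth {
    private VideoBandwidth() {}

    /** Bytes per second needed to touch every pixel of the stream once per pass. */
    public static long bytesPerSecond(int width, int height, double bytesPerPixel, int fps, int passes) {
        return Math.round(width * (double) height * bytesPerPixel * fps * passes);
    }

    public static void main(String[] args) {
        // YUV422 averages 2 bytes per pixel: one Y per pixel, U and V shared by pixel pairs.
        System.out.println(bytesPerSecond(1280, 720, 2.0, 30, 1)); // 55296000, about 55 MB/s
        // Eight passes over each frame (e.g. four reads plus four writes) cost eight times as much.
        System.out.println(bytesPerSecond(1280, 720, 2.0, 30, 8)); // 442368000
    }
}
```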



#### Solutions: Handling real-time performance

- Most systems are non-deterministic: caches, other processes, O/S behavior
- Algorithms require more processing when there is "action" in the video
- This makes the instantaneous performance requirements vary greatly
- As a result, it is not possible to guarantee hard real-time performance
- Instead, our solution is designed to be soft real-time and to handle real-time violations
- One challenge is the lack of standardized high-precision timers that would let software monitor execution time
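
A minimal sketch of soft real-time handling: measure each frame's processing time against the frame budget, and drop a frame to catch up when accumulated lateness exceeds one budget interval. The drop policy and the `SoftDeadline` class are hypothetical; `System.nanoTime()` is Java's elapsed-time clock, whose resolution is platform-dependent, which echoes the timer problem above.

```java
/** Soft real-time frame pacing: tolerate deadline misses instead of assuming none occur. */
final class SoftDeadline {
    private final long budgetNanos; // time available per frame
    private long debtNanos;         // accumulated lateness versus the frame schedule
    private int missedDeadlines;

    public SoftDeadline(double fps) {
        this.budgetNanos = Math.round(1e9 / fps);
    }

    /**
     * Records how long the last processed frame took and returns whether the next frame
     * should be processed (true) or dropped to catch up (false). The policy is hypothetical.
     */
    public boolean afterFrame(long elapsedNanos) {
        if (elapsedNanos > budgetNanos) missedDeadlines++;
        debtNanos = Math.max(0L, debtNanos + elapsedNanos - budgetNanos);
        if (debtNanos >= budgetNanos) {
            debtNanos -= budgetNanos; // skipping one frame frees one budget interval
            return false;
        }
        return true;
    }

    public int missedDeadlines() { return missedDeadlines; }

    public static void main(String[] args) {
        SoftDeadline pacing = new SoftDeadline(30.0);
        long start = System.nanoTime(); // elapsed-time clock; resolution is platform-dependent
        double acc = 0;
        for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i); // stand-in for per-frame work
        boolean processNext = pacing.afterFrame(System.nanoTime() - start);
        System.out.println("work=" + acc + ", process next frame: " + processNext);
    }
}
```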





### **Platform portability - Challenges:**

- Maintaining a single code base across multiple products, OS types and compilers
- Staying up to date with recent OS releases
- Maintaining backward compatibility
- Maintaining pixel portability



### **Platform portability - Solution:**

- Using automated tools on every core code change
- Maintaining strict coding conventions that avoid platform-specific libraries
- Using cross-platform frameworks such as OpenCL
- Working with a configurable device abstraction layer that keeps platform-specific code from migrating into other platforms or devices
- Using a framework developed at eyeSight, we can convert any video stream, of any resolution or format (including IR), into a single representation that serves our algorithms
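
The device abstraction idea can be sketched as one adapter per input format, all producing the same representation. Here that representation is assumed to be a single 8-bit luma plane; the `FrameAdapter` and `LumaFrame` names are hypothetical, and the RGB conversion uses the standard BT.601 luma weights (0.299, 0.587, 0.114).

```java
import java.util.Arrays;

/** Hypothetical device-abstraction sketch: every source is normalized to one 8-bit luma plane. */
final class LumaAdapters {
    /** The single representation the algorithms consume. */
    public static final class LumaFrame {
        public final int width;
        public final int height;
        public final byte[] luma;

        public LumaFrame(int width, int height, byte[] luma) {
            this.width = width;
            this.height = height;
            this.luma = luma;
        }
    }

    /** One implementation per camera format; platform-specific code stays behind this interface. */
    public interface FrameAdapter {
        LumaFrame toLuma(byte[] raw, int width, int height);
    }

    /** Packed YUYV (YUV 4:2:2): Y0 U Y1 V per pixel pair, so luma is every even byte. */
    public static final FrameAdapter YUYV = (raw, w, h) -> {
        byte[] y = new byte[w * h];
        for (int i = 0; i < w * h; i++) y[i] = raw[2 * i];
        return new LumaFrame(w, h, y);
    };

    /** Interleaved RGB888, converted with BT.601 weights in integer arithmetic (rounded). */
    public static final FrameAdapter RGB888 = (raw, w, h) -> {
        byte[] y = new byte[w * h];
        for (int i = 0; i < w * h; i++) {
            int r = raw[3 * i] & 0xFF, g = raw[3 * i + 1] & 0xFF, b = raw[3 * i + 2] & 0xFF;
            y[i] = (byte) ((299 * r + 587 * g + 114 * b + 500) / 1000);
        }
        return new LumaFrame(w, h, y);
    };

    /** A single-channel 8-bit IR sensor already yields a luma-like plane; copy it as-is. */
    public static final FrameAdapter IR8 = (raw, w, h) -> new LumaFrame(w, h, Arrays.copyOf(raw, w * h));

    private LumaAdapters() {}

    public static void main(String[] args) {
        byte[] rgb = {(byte) 255, (byte) 255, (byte) 255, 0, 0, 0}; // one white, one black pixel
        LumaFrame f = RGB888.toLuma(rgb, 2, 1);
        System.out.println((f.luma[0] & 0xFF) + " " + (f.luma[1] & 0xFF)); // 255 0
    }
}
```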




#### Thank you.

# www.eyeSight-tech.com







#### AUGMENTED SOLUTIONS

#### **Augmented Reality**

Hot Chips 2012, **Ben Blachnitzky, Director R&D, Metaio**





#### metaio

- Founded 2003 out of computer vision research
- Privately held, independently financed
- 500+ B2B customers worldwide
- 85+ people working in Munich (HQ) and San Francisco
- 10,000+ active developers worldwide
- Extensive R&D department with 100+ patents across 38 different families
- 200+ mobile apps running on metaio technology






### From Hardware to Software to End User



## **General Milestones**

- Oct. 2011: Won ISMAR tracking competition with pure mobile marker-less tracking
- Dec. 2011: mobile SDK 3.0 release
  - Advanced visual tracking of 2D and 3D objects
  - Gravity-aware AR
  - Optimized AR pipeline for major mobile chipsets (ARM, ST-Ericsson, Texas Instruments)
- February/March 2012: <u>Augmented City</u> Platform at MWC and SDK 3.1
  - Advanced 3D object tracking
  - Visual search technology
  - Further hardware optimization of the AR pipeline
- Estimated Q4 2012: mobile SDK 4.0





## **Achievements and Challenges**

- From marker tracking to full 3D object tracking on mobile in only 14 months
- Low-level optimizations:
  - Leveraging SIMD extensions (e.g. NEON) for certain computer vision tasks
  - Memory constraints and optimizations, especially for large client-based visual search
  - General pipeline optimizations and leveraging modern multi-core architectures
  - Optimized camera access, e.g. high-resolution images for visualization and lower-resolution images for tracking
- **Challenges:** battery consumption, and extending the limits of tracking in AR so that it becomes natural for consumers
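
When the camera cannot deliver a separate low-resolution tracking stream, the tracking image has to be derived in software from the visualization image. A minimal sketch, assuming a single-channel luma frame and a 2x2 box filter (the class name is hypothetical, not part of the metaio SDK):

```java
import java.util.Arrays;

/** Produces a half-resolution tracking image from a full-resolution luma plane (sketch). */
final class Downsample2x {
    private Downsample2x() {}

    /** Averages each 2x2 block; an odd trailing row or column is dropped. */
    public static int[] halve(int[] src, int width, int height) {
        int ow = width / 2, oh = height / 2;
        int[] dst = new int[ow * oh];
        for (int y = 0; y < oh; y++) {
            for (int x = 0; x < ow; x++) {
                int i = (2 * y) * width + 2 * x; // top-left pixel of the 2x2 block
                int s = src[i] + src[i + 1] + src[i + width] + src[i + width + 1];
                dst[y * ow + x] = (s + 2) / 4; // rounded mean
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] img = {
            10, 20, 30, 40,
            10, 20, 30, 40,
        };
        System.out.println(Arrays.toString(halve(img, 4, 2))); // [15, 35]
    }
}
```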





## **High-level Overview of Mobile SDK**

#### Android Developer Perspective

- Java interface (IF) for straightforward application development
- Computationally intensive operations in the NDK
- JNI provides language interoperability







## **Multiple Object AR**

- More than 40 instances of client-based, real-time image detection, made possible by specifically optimized algorithms
- Activation and recognition speed comparable to QR (Quick Response) codes



<u>http://www.youtube.com/watch?v=QQ8HNXtl7jQ&feature=player\_embedded</u>

<u>http://creativity-online.com/news/mccannerickson-gives-new-ikea-catalog-a-vitamin-pill/236165</u>





## **Useful AR Optimizations**







http://www.youtube.com/watch?v=DMg4UUCaQdw

- 100+ images detected and recognized
- Same algorithms and technology are freely available to mobile developer ecosystem


## **AR Enhanced Print Media (Mobile AR)**

- Powering "AR-Lite" cloud-based experiences that should be accessible in nearly all connectivity environments
- Current adopters: The Atlantic, Axel Springer, Burda, Süddeutscher Verlag, USA Today
- 10+ million copies of AR-enhanced magazines are powered by the junaio cloud infrastructure

http://www.youtube.com/weltderwunderadsales#p/f/3/KZS1Q3I0a9Q

## **API | Hardware Wish List**

- One computer vision hardware abstraction layer for Android and iOS
- Native/advanced camera access across different platforms:
  - Visualization image and tracking image
  - Hardware-optimized image pre-computations/conversions
  - Full control of camera parameters such as shutter, brightness and (auto-)focus
  - Very fast texture upload of the camera image to the renderer
- Platform-independent parallelization approaches (SIMD and multi-core)
- HW implementation of the most important AR functions to address battery consumption









#### www.metaio.com



@twitt\_AR

#### facebook.com/metaio

#### augmentedblog.wordpress.com



SENSOR PLATFORMS



#### Sensor Fusion Mobile Platform Challenges and Future Directions

Jim Steele, VP of Engineering, Sensor Platforms, Inc.



## **How Many Sensors are in a Smartphone?**



- Light
- Proximity
- 2 cameras
- 3 microphones (ultrasound)
- Touch
- Position
  - GPS
  - WiFi (fingerprint)
  - Cellular (tri-lateration)
  - NFC, Bluetooth (beacons)
- Accelerometer
- Magnetometer
- Gyroscope
- Pressure
- Temperature
- Humidity



## **Mobile Sensor Challenges**



~90° compass error in the first Ice Cream Sandwich smartphone

- Underlying problems:
  - Some sensor components lack repeatability
  - RF and other PCB noise interacts with the magnetometer
  - Non-standard availability (no gyro, pressure, 2nd camera, ...)
  - Non-standard capability (resolution, update rate, ...)
  - Not fully specified (non-uniform gain, skew)



## **User Experience Across Platforms**

- Heavy engineering burden to maintain consistency across system variations
  - Hard- and soft-iron content
  - Component selection for optimal price/performance
  - Different application processors (sensor hubs)
  - Different mobile OS

#### Validation efforts diffused over multiple platforms







#### **Sensor Fusion Algorithms Solve Challenges**






## **Examples of Sensor Fusion**

#### 10-axis sensor fusion and background calibration

- Industry standard foundation for sensors
- No user-intervention to keep sensors calibrated
- Adjusts to changes in environment
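
A much simplified form of background magnetometer calibration estimates the hard-iron offset as the per-axis midpoint between the minimum and maximum readings seen while the device moves around. This is an illustrative sketch, not Sensor Platforms' algorithm: it ignores soft-iron distortion (which needs an ellipsoid fit) and is easily corrupted by magnetic anomalies. The class name is hypothetical.

```java
import java.util.Arrays;

/** Simplified hard-iron offset estimate from magnetometer samples (min/max midpoint per axis). */
final class HardIronEstimator {
    private final float[] min = {Float.POSITIVE_INFINITY, Float.POSITIVE_INFINITY, Float.POSITIVE_INFINITY};
    private final float[] max = {Float.NEGATIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NEGATIVE_INFINITY};

    /** Feed raw readings (e.g. in microtesla) as the device passes through many orientations. */
    public void add(float x, float y, float z) {
        float[] v = {x, y, z};
        for (int i = 0; i < 3; i++) {
            min[i] = Math.min(min[i], v[i]);
            max[i] = Math.max(max[i], v[i]);
        }
    }

    /** Offset to subtract from raw readings; only meaningful after enough orientation coverage. */
    public float[] offset() {
        float[] o = new float[3];
        for (int i = 0; i < 3; i++) o[i] = (min[i] + max[i]) / 2f;
        return o;
    }

    public static void main(String[] args) {
        HardIronEstimator est = new HardIronEstimator();
        // A 50 uT field seen along +/- each axis, shifted by a (10, -5, 3) hard-iron bias.
        est.add(60, -5, 3);  est.add(-40, -5, 3);
        est.add(10, 45, 3);  est.add(10, -55, 3);
        est.add(10, -5, 53); est.add(10, -5, -47);
        System.out.println(Arrays.toString(est.offset())); // [10.0, -5.0, 3.0]
    }
}
```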

#### Sensor data can be interpreted using algorithms

- Magnetometer  $\rightarrow$  Compass (avoid magnetic anomalies)
- Pressure  $\rightarrow$  Altitude (avoid pressure anomalies)
- Throttle the gyroscope (keep highest power sensor off until needed)
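
Pressure-to-altitude conversion commonly uses the standard-atmosphere approximation h = 44330 × (1 - (p/p0)^(1/5.255)), the same formula behind Android's `SensorManager.getAltitude`. Because the sea-level reference p0 changes with the weather, the difference between two readings (e.g. floors climbed) is more reliable than absolute altitude. A sketch:

```java
/** Barometric altitude from pressure using the standard-atmosphere approximation. */
final class BaroAltitude {
    public static final double SEA_LEVEL_HPA = 1013.25;

    private BaroAltitude() {}

    /** Altitude in metres for pressure p (hPa) relative to reference pressure p0 (hPa). */
    public static double altitudeMeters(double p0, double p) {
        return 44330.0 * (1.0 - Math.pow(p / p0, 1.0 / 5.255));
    }

    public static void main(String[] args) {
        // Prints a value close to 990 m.
        System.out.printf("%.1f m%n", altitudeMeters(SEA_LEVEL_HPA, 900.0));
    }
}
```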

#### Combine multiple sensors to improve sensing

- Pressure + GPS = faster GPS fix
- Camera + Sensor Fusion = Augmented Reality
- inElevator sensor?



## **Comparison of Mobile OS Sensor Support**

| Sensor                   | iOS 5            | Android             | Win8         |
|--------------------------|------------------|---------------------|--------------|
| Accel/Mag/Gyro           | $\checkmark$     | $\checkmark$        | $\checkmark$ |
| Pressure/Humidity        | x                | $\checkmark$        | x            |
| Quaternion               | CMAttitude       | ROTATION_VECTOR     | Orientation  |
| Euler angles             | CMAttitude       | ORIENTATION (depr.) | Inclinometer |
| Dynamic acceleration     | userAcceleration | LINEAR_ACCELERATION | Shake        |
| Gravity in body frame    | gravity          | GRAVITY             | Tilt         |
| Tilt-compensated compass | x                | x                   | Compass      |
| In elevator              | x                | x                   | x            |

*Figure: virtual sensors, e.g. computed from measured acceleration.*
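
Virtual sensors such as Android's GRAVITY and LINEAR_ACCELERATION separate measured acceleration into a slowly varying gravity component and a dynamic component. A minimal sketch uses a first-order low-pass filter; production sensor fusion also uses the gyroscope, so treat this as illustrative only. The smoothing factor `alpha` is an assumed tuning value and the class name is hypothetical.

```java
import java.util.Arrays;

/** Virtual sensors from measured acceleration: gravity via low-pass, linear = measured - gravity. */
final class GravitySplitter {
    private final float alpha; // closer to 1 = smoother gravity estimate, slower response
    private final float[] gravity = new float[3];
    private final float[] linear = new float[3];
    private boolean initialized;

    public GravitySplitter(float alpha) { this.alpha = alpha; }

    /** Consumes one accelerometer sample (x, y, z), e.g. in m/s^2. */
    public void update(float[] measured) {
        for (int i = 0; i < 3; i++) {
            gravity[i] = initialized ? alpha * gravity[i] + (1 - alpha) * measured[i] : measured[i];
            linear[i] = measured[i] - gravity[i];
        }
        initialized = true;
    }

    public float[] gravity() { return gravity.clone(); }
    public float[] linearAcceleration() { return linear.clone(); }

    public static void main(String[] args) {
        GravitySplitter s = new GravitySplitter(0.8f);
        for (int i = 0; i < 50; i++) s.update(new float[] {0f, 0f, 9.81f}); // at rest, lying flat
        s.update(new float[] {2f, 0f, 9.81f});                              // sudden push along x
        System.out.println(Arrays.toString(s.linearAcceleration()));
    }
}
```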



## **Need Unified Timestamp between Sensors**

- Sensors work on different time bases that drift
- Not all sensors support the same sampling rate
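
One common way to put sensors on a unified time base is to fit each sensor clock to the host clock with a line (offset plus drift) from paired observations, then resample streams by interpolation. The sketch assumes such (sensor tick, host time) pairs are available, for example captured at interrupt time; the class and its inputs are hypothetical.

```java
/** Maps a sensor's local timestamps onto a common host time base (offset + drift fit). */
final class ClockMapper {
    private final double scale;  // host seconds per sensor tick (captures drift)
    private final double offset; // host time at sensor tick 0

    /** Least-squares fit of host = scale * sensor + offset from paired observations. */
    public ClockMapper(double[] sensorTicks, double[] hostSeconds) {
        int n = sensorTicks.length;
        if (n < 2 || hostSeconds.length != n) throw new IllegalArgumentException("need >= 2 pairs");
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += sensorTicks[i]; my += hostSeconds[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = sensorTicks[i] - mx;
            sxy += dx * (hostSeconds[i] - my);
            sxx += dx * dx;
        }
        if (sxx == 0) throw new IllegalArgumentException("sensor ticks must not all be equal");
        scale = sxy / sxx;
        offset = my - scale * mx;
    }

    public double toHost(double sensorTick) { return scale * sensorTick + offset; }

    /** Linear interpolation of a stream with ascending times at host time t (clamped at the ends). */
    public static double interpolate(double[] times, double[] values, double t) {
        if (t <= times[0]) return values[0];
        for (int i = 1; i < times.length; i++) {
            if (t <= times[i]) {
                double w = (t - times[i - 1]) / (times[i] - times[i - 1]);
                return values[i - 1] + w * (values[i] - values[i - 1]);
            }
        }
        return values[values.length - 1];
    }

    public static void main(String[] args) {
        // Nominal 1 kHz sensor clock running 0.1% fast, started at host time 5 s.
        ClockMapper m = new ClockMapper(new double[] {0, 1000, 2000, 3000},
                                        new double[] {5.0, 5.999, 6.998, 7.997});
        System.out.printf("tick 1500 -> %.4f s%n", m.toHost(1500));
    }
}
```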




## **StreamInput Concepts**

#### Standardized Application-defined filtering and conversion

- Can create virtual input devices

#### Sensor Hardware Vendor Agility for OEMs

- Allows standardized interface for hardware accelerated features

#### Extensibility to any sensor type

- Can define new node data types, state and methods

#### Sensor Synchronization

- Universal time stamp on every sample



## StreamInput Architecture



1. Set up a processing graph (or use a pre-supplied graph), then request and receive a semantic sensor stream through the high-level API
2. Optionally, dynamically configure sensor processing through the low-level API to tune power vs. performance

Implementable over existing OS input APIs to simplify adoption




## **Sensor Platforms**

- We create algorithms for sensors

- More information at our blog: <u>www.sensorplatforms.com</u>
- jsteele@sensorplatforms.com









## **Platform Performance**

Dan Wexler, CxO, The 11<sup>ers</sup>






*Figure: CPU and GPU connected over AGP/PCIe.*












#### "Leading edge graphics in tasty mobile bytes."

#### Small Scope → Low Risk → Experience






# \$5B & 650K

- Top 1% get 36% (\$270K)
- Next 19% get 61% (\$25K)
- Last 80% get 3% (\$290)

51 titles *before* Angry Birds



\* http://DaveAddey.com/?p=893, 7/26/12
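
The per-app figures can be checked against the headline numbers, assuming \$5B is total developer revenue and 650K is the app count (our reading of the slide). The computed averages (about \$277K, \$24.7K and \$288) agree with the slide's rounded values:

```java
/** Reproduces the per-app averages on the slide from its totals (arithmetic check). */
final class AppRevenue {
    private AppRevenue() {}

    /** Average revenue per app for a tier holding appShare of the apps and revenueShare of revenue. */
    public static double averagePerApp(double totalRevenue, long totalApps, double appShare, double revenueShare) {
        return (totalRevenue * revenueShare) / (totalApps * appShare);
    }

    public static void main(String[] args) {
        double total = 5e9;  // $5B
        long apps = 650_000; // 650K apps
        System.out.printf("top 1%%:   $%.0f%n", averagePerApp(total, apps, 0.01, 0.36)); // about $277K
        System.out.printf("next 19%%: $%.0f%n", averagePerApp(total, apps, 0.19, 0.61)); // about $24.7K
        System.out.printf("last 80%%: $%.0f%n", averagePerApp(total, apps, 0.80, 0.03)); // about $288
    }
}
```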










## **Dynamic, Dependent, Reusable**










Enderton & Wexler, "The Workflow Scale: Why 5x Faster Might Not Be Enough", CGI 2011



## What's Shared Memory?

- No copies (OpenGL API forces copies)
- Simplified synchronization (map/unmap, fence, cache flush)
- Shared virtual address space (segmented ok)
- Texture instead of attributes? (Why did VAR fail?) Must be able to dynamically generate geometry on both GPU and CPU (© GLES3)
- Gosh, it would be really nice if pointers Just Worked ™
- Shared memory uses less power & less bandwidth









## Leapfrog Opportunity

- We tried and failed for 10 years to move more algorithms to the GPU due to the lack of shared memory and the cost of bandwidth.
- Mobile UMA eliminates the bottleneck; fix the APIs and apps will follow.
- Shared memory is *lower power* → inevitable.
- Shared memory is *transformative* → whole new classes of apps.

Thanks to Cass Everitt, Eric Enderton

#### AFTERNOON TUTORIAL

### **Die Stacking**

| Time        | Topic                                                          | Speaker                 |
|-------------|----------------------------------------------------------------|-------------------------|
| 2:00 – 2:15 | 3-D Stacking Tutorial Introduction                             | Liam Madden, Xilinx     |
| 2:15 – 3:05 | Foundry TSV Enablement for 2.5D/3D Chip Stacking               | Remi Yu, <b>UMC</b>     |
|             | Full Processing Interposer Process                             | Choon Lee, Amkor        |
| 3:05 – 3:55 | Roadmap for Design and EDA Infrastructure for 3D Products      | Riko Radojcic, Qualcomm |
|             | Xilinx SSI Technology Concept to Silicon Development Overview  | Shankar Lakka, Xilinx   |
| 3:55 – 4:10 | Break                                                          |                         |
| 4:10 – 5:00 | Memory Consideration and Heterogeneous Die                     | Bryan Black, <b>AMD</b> |
|             | Optical Backplanes with 3D Integrated Photonics?               | Ephrem Wu, Xilinx       |
| 5:00 – 5:30 | Panel                                                          |                         |
| 5:30 – 7:00 | Reception                                                      |                         |

Copyright © 2012 HOTCHIPS. All rights reserved. All trademarks property of their respective owners.





#### Die Stacking

2.5D/3D die stacking increases aggregate inter-chip bandwidth and shrinks board footprint while reducing I/O latency and energy consumption. By integrating multiple tightly coupled semiconductor dice in one package, each possibly built in a process optimized for the power, performance and cost of a particular function, this technology gives system designers additional options to partition and scale solutions efficiently. Die stacking has already transformed the design of high-end CMOS image sensors, and it promises to enhance FPGA, graphics and mobile applications as well.

In Part 1 of this tutorial we will examine the key enabling technologies such as silicon interposer, TSV, micro-bump and assembly integration. In Part 2 we will cover the design considerations & trade-offs of 2.5D/3D in CAD, ESD and architecture. Part 3 will showcase how the technology is used in systems and applications for memory integration, optics integration and monolithic die partitioning.

Schedule

- 2:00pm – 2:15pm: Introduction by Liam Madden from Xilinx
- 2:15pm – 3:05pm: Technology
  - Fab, Interposer and TSV by Remi Yu from UMC
  - Assembly and Micro Bumps by Choon Lee from Amkor
- 3:05pm – 3:55pm: Design Considerations
  - 3D, CAD and Floorplanning by Riko Radojcic from Qualcomm
  - 2.5D, CAD and ESD by Shankar Lakka from Xilinx
- 3:55pm – 4:10pm: Break for refreshments
- 4:10pm – 5:00pm: System Implications
  - Memory Consideration and Heterogeneous Die by Bryan Black from AMD
  - Optical Considerations by Ephrem Wu from Xilinx
- 5:00pm – 5:30pm: Panel moderated by Liam featuring all the speakers




August 22 to 24, 2010, Memorial Auditorium, Stanford University

A Symposium on High Performance Chips, sponsored by the IEEE Technical Committee on Microprocessors and Microcomputers



#### **3-D Stacking Tutorial** Introduction



Liam Madden, Corporate Vice President, Xilinx, Aug 27<sup>th</sup> 2012



#### **Dedicated to the Memory of Chuck Moore: Visionary**



Chuck Moore, AMD Corporate Fellow 1961-2012

#### Agenda

- Introduction: Liam Madden, Corp VP, Xilinx (2:00-2:15)
- Technology: (2:15-3:05)
  - Foundry: Remi Yu, Director Marketing, UMC
  - OSAT: ChoonHeung Lee, Corp VP, Amkor
- Design Considerations: (3:05-3:55)
  - Mobile Communications: Riko Radojcic, Director, Qualcomm
  - FPGA: Shankar Lakka, Director Integration, Xilinx
- Break (3:55-4:10)
- System Implications (4:10-5:00)
  - Processor and GPU: Bryan Black, Senior Fellow, AMD
  - Integrated Optics: Ephrem Wu, Senior Director, Xilinx
- Panel Discussion (5:00-5:30)

#### **Cost Comparison: Monolithic vs Multi-Die**

"Moore's Law is really about economics" Gordon Moore



*Figure: cost comparison plotted against die area.*
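
The cost argument for splitting a large die is usually explained with a defect-limited yield model. The sketch uses the simple Poisson model Y = exp(-A·D0) with made-up inputs (6 cm² of logic, 0.2 defects/cm², four slices) and deliberately ignores interposer, micro-bump assembly and test costs, so it shows only the direction of the effect, not Xilinx's actual numbers. The class name is hypothetical.

```java
/** Illustrative silicon-cost comparison: one large die vs. k smaller known-good dice. */
final class DieYield {
    private DieYield() {}

    /** Poisson yield model: fraction of dice with zero defects. */
    public static double poissonYield(double areaCm2, double defectsPerCm2) {
        return Math.exp(-areaCm2 * defectsPerCm2);
    }

    /** Wafer area consumed per good die (cm^2), ignoring edge loss and scribe lines. */
    public static double siliconPerGoodDie(double areaCm2, double defectsPerCm2) {
        return areaCm2 / poissonYield(areaCm2, defectsPerCm2);
    }

    /**
     * Ratio of monolithic silicon cost to the cost of k slices of area/k each.
     * Interposer, assembly and stacking-yield costs are deliberately left out.
     */
    public static double monolithicToMultiDieRatio(double areaCm2, double defectsPerCm2, int k) {
        double mono = siliconPerGoodDie(areaCm2, defectsPerCm2);
        double multi = k * siliconPerGoodDie(areaCm2 / k, defectsPerCm2);
        return mono / multi;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 6 cm^2 of logic, 0.2 defects/cm^2, split into 4 slices.
        System.out.printf("%.2f%n", monolithicToMultiDieRatio(6.0, 0.2, 4)); // 2.46
    }
}
```

Under this model the ratio equals exp(A·D0·(1 - 1/k)), so the advantage of slicing grows with die area and defect density.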

#### Why is the first 3D logic product an FPGA?



- Natural partition using "long lines"
- Very low "opportunity cost"
- No 3<sup>rd</sup> party dependence
- "Size matters" to customers
- Compelling value proposition "next generation density in this generation technology"

#### **Virtex 2000T: Homogeneous Stacked Silicon Interconnect Technology (SSIT)**



**Elements of SSIT** 



#### **3 Decades of Microprocessor Integration: A personal history**

"Integrate or be integrated" Fred Webber, former CTO AMD

| Year | Company   | Product | Process | Integration level |
|------|-----------|---------|---------|-------------------|
| 1983 | Harris    | J11     | 4um     |                   |
| 1989 | DEC       | Rigel   | 1.5um   |                   |
| 1991 | DEC       | Alpha   | 0.75um  |                   |
| 2005 | Microsoft | Xenon   | 90nm    | 3 cores           |
| 2011 | AMD       | Fusion  | 40nm    | 2 cores           |

Integration levels charted: core (DP, Ctl, L1 \$, FPU), L2 \$, north bridge, GPU.



What happened to System on a Chip?

|                               | Logic                            | Memory                                  | Analog                            |
|-------------------------------|----------------------------------|-----------------------------------------|-----------------------------------|
| Global Revenue 2011           | \$150B                           | \$68B                                   | \$45B                             |
| Moore Scaling                 | Good (except I/O)                | Good (except I/O)                       | Poor                              |
| Technology "Vintage"          | 2012                             | 2012                                    | 2000                              |
| Transistor<br>Characteristics | High performance/<br>Low leakage | Low leakage/<br>moderate<br>performance | Stable with good voltage headroom |
| Metallization                 | >9 layers                        | <5 layers                               | <6 layers                         |
| Differentiators               | High density logic               | Charge storage                          | Passives, Optical                 |

#### **Crossing the packaging chasm**



#### 7V580T – Dual FPGA Slice with 8x28Gb/s SerDes Die



#### Virtex-7 HT @ 28Gbps



## Foundry TSV Enablement For 2.5D/3D Chip Stacking

Remi Yu, UMC, Hot Chips 24, August 27, 2012



**Customer-Driven Foundry Solutions** 

## Outline

- 2.5D/3D Applications
- Foundry TSV Enablement
- Ecosystem Work Flow
- Summary



## 2.5D/3D Applications




## 2.5D Si Interposer Stacking



- Logic/logic: FPGA, networking infrastructure
- Logic/memory: gaming, HPC



### **3D Logic/Memory Stacking** - Via-Middle TSV 28nm Logic + Memory Cube



### Mobile Wide I/O, Computing Wide I/O, HMC



# **Application Examples**

### More are being developed

- (1) http://low-powerdesign.com/sleibson/2011/10/25/generation-jumping-2-5d-xilinx-virtex-7-2000t-fpga-delivers-1954560-logic-cells-consumes-only-20w/
- (2) http://eda360insider.wordpress.com/2012/06/01/friday-video-3d-thursday-xilinx-virtex-7-h580t-uses-3d-assembly-to-merge-28gbps-xceivers-fpga-fabric/
- (3) eSilicon, "GSA 3D Working Group", July 2012
- (4) http://www.ecnmag.com/news/2011/03/samsung-wide-io-memory-mobile-products-deeper-look
- (5) http://denalimemoryreport.com/2012/06/28/arm-hp-and-sk-hynix-join-hybrid-memory-cube-consortium-hmcc-first-spec-due-by-end-of-year/

### UMC

# **Cost-of-Ownership Advantages**

#### **Motivations:**

Higher BW, lower W/BW, smaller form-factor

### **Opportunity of return on 3D IC investment:**

- Chip process node optimization
  - Homogeneous partition
  - Cross-node combinations
- BOM cost optimization
  - Less demanding substrate/PCB, lighter cooling assembly, ...
  - Ultimately: better product, better margin



#### Xilinx Virtex 7 (1)



- (1) http://low-powerdesign.com/sleibson/2011/10/25/generation-jumping-2-5d-xilinx-virtex-7-2000t-fpga-delivers-1954560-logic-cells-consumes-only-20w/
- (2) http://www.i-micronews.com/news/Micron-Samsung-TSV-stacked-memory-collaboration-closer-look,7766.html


## **Foundry TSV Enablement**



# Foundry TSV Process Technology



### Mainstream: Via-middle Cu TSV

- 2.5D: 65nm-generation BEOL
- 3D: 28nm CMOS logic

After 28nm entry, TSV for 3D may come as a standard option for foundry CMOS logic at 20nm and beyond


# Via-Middle TSV for 3D



(drawn not to scale)

### TSV formed after CMOS, before contact/metal



### UMC 28nm 3D IC



(CMOS device and TSV in proportion)


### **UMC Via-Middle TSV Unit Process**



- Leverages existing CMOS tools and capability
- TSV size (diameter/depth) is new to fab practice


### Early Stage TSV Process Issues



### TSV integrity – Cu fill, oxide liner, metal stack



## **ECP Cu Fill Process Optimization**

- Cu pumping reduction





### ECP Cu plating critical to TSV integrity



### **UMC Via-Middle TSV Solution**




### **TSV Top Side Impact Evaluation** - BEOL WAT testkeys

| WAT test item    | Layer | Testkey rule (w/s) | Above TSV | Cross TSV | Beside TSV                | After sinter          |
|------------------|-------|--------------------|-----------|-----------|---------------------------|-----------------------|
| Metal bridge     | M3    | 1xW/1xS            | Passed    | Passed    | Passed (min. x=1)         | No significant change |
|                  |       | 2xW/2xS            | Passed    | Passed    | n/a                       |                       |
|                  | M4    | 1xW/1xS            | Passed    | Passed    | Passed (min. x=1)         |                       |
|                  |       | 2xW/2xS            | Passed    | Passed    | n/a                       |                       |
| Metal resistance | M3    | 1xW/1xS            | Passed    | Passed    | Comparable for variable x | No significant change |
|                  |       | 2xW/2xS            | Passed    | Passed    | n/a                       |                       |
|                  | M4    | 1xW/1xS            | Passed    | Passed    | Comparable for variable x |                       |
|                  |       | 2xW/2xS            | Passed    | Passed    | n/a                       |                       |
| Via bridge       | V3    | Com_run Via        | Passed    | Passed    | Comparable for variable x | No significant change |
|                  | V3    | Non_comVia         | Passed    | Passed    | Comparable for variable x |                       |

Routing over TSV allowed




### **TSV Bottom Side Impact Evaluation** - Leakage CDF





# **CMOS Impact Evaluation (3D)**

- Keep-Out Zone (KOZ) Characterization




# Via Middle TSV (3D)

- 6um diameter, 54um depth




## Si Interposer TSV (2.5D)

- 10um diameter, 100um depth




# 2.5D Si Interposer, 10x100um




# **UMC 3D IC TV Stacking & Package**



#### **JEDEC Wide I/O interface**






# **Ecosystem Work Flow**



# **Example 2.5D Stacking Flow**




### Various Work Models

| Stack | Work model      | FEOL (logic, TSV + FS RDL) | MEOL (wafer thinning, BS RDL + bump) | BEOL (assembly, test) |
|-------|-----------------|----------------------------|--------------------------------------|-----------------------|
| 2.5D  | OSAT MEOL       | Foundry                    | OSAT                                 | OSAT                  |
| 2.5D  | Foundry MEOL    | Foundry                    | Foundry                              | OSAT                  |
| 2.5D  | Foundry turnkey | Foundry                    | Foundry                              | Foundry               |
| 3D    | OSAT MEOL       | Foundry                    | OSAT                                 | OSAT                  |
| 3D    | Foundry MEOL    | Foundry                    | Foundry                              | OSAT                  |
| 3D    | Foundry turnkey | Foundry                    | Foundry                              | Foundry               |

- Service scopes are distinguished by whether MEOL is included
  - Consult your foundry/OSAT

Workflow optimization may depend on BOM cost, stack recipe, and test strategy.



# Foundry TSV Design Collaterals

| Scheme | TSV CD/Depth (um/um) | RDL Cu Layers | Cu Thickness/L/S (um/um/um) | Al Thickness (um) | Topologic Layout Rule | Electrical Design Rule | Interconnect Model | DRC Command File | LVS Command File |
|---|---|---|---|---|---|---|---|---|---|
| 1.0um-wide | 10/100 | 0.13 Inductor | 2.0/1.00/1.00 | 2.50 | Ready | Ready | Ready | Ready | Ready |
| 0.4um-wide | 10/100 | 55nm 4X | 0.8/0.40/0.40 | 1.45 | Ready | Ready | Ready | Ready | Ready |
| 0.56um-wide | 10/100 | 65nm 6X | 1.25/0.56/0.56 | 3.60 | Ready | Ready | Ready | Ready | Ready |

(UMC 2.5D Si interposer documents; TSV CD/Depth through Al Thickness are feature sizes, and the remaining five columns give document status.)

### Consider the TSV a passive device with its own rule decks/models

- Typical foundry engagement applies under the ecosystem workflow
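Treating the TSV as a passive device means that, to first order, it can be described by a series resistance and a liner capacitance. The sketch below estimates both for the 10/100 um TSV in the collateral table, assuming a solid Cu core equal to the CD, bulk Cu resistivity, and a hypothetical 0.5 um SiO2 liner. These are textbook cylinder and coaxial-capacitor formulas, not the foundry's interconnect model.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m
RHO_CU = 1.7e-8   # bulk Cu resistivity, ohm*m (assumed; plated TSV Cu can be higher)
K_OX = 3.9        # relative permittivity of an SiO2 liner (assumed)


def tsv_resistance(cu_diameter_um: float, depth_um: float, rho: float = RHO_CU) -> float:
    """DC resistance of a solid Cu cylinder: R = rho * L / (pi * r^2)."""
    r = cu_diameter_um * 1e-6 / 2
    length = depth_um * 1e-6
    return rho * length / (math.pi * r ** 2)


def tsv_liner_capacitance(cu_diameter_um: float, depth_um: float,
                          liner_um: float, k: float = K_OX) -> float:
    """Coaxial liner capacitance: C = 2*pi*eps0*k*L / ln((r + t_liner) / r).

    Ignores the depletion region in the surrounding Si, which in practice
    adds a series capacitance and lowers the effective value.
    """
    r = cu_diameter_um * 1e-6 / 2
    length = depth_um * 1e-6
    t = liner_um * 1e-6
    return 2 * math.pi * EPS0 * k * length / math.log((r + t) / r)


if __name__ == "__main__":
    # 10/100 um CD/depth from the table, treated as the Cu core size;
    # the 0.5 um liner thickness is a hypothetical value.
    cd, depth = 10.0, 100.0
    print(f"aspect ratio = {depth / cd:.0f}:1")
    print(f"R ~ {tsv_resistance(cd, depth) * 1e3:.1f} mOhm")
    print(f"C_liner ~ {tsv_liner_capacitance(cd, depth, 0.5) * 1e15:.0f} fF")
```

Under these assumptions the resistance comes out in the tens of milliohms and the liner capacitance in the low hundreds of femtofarads; a foundry interconnect model would replace both with calibrated values.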

## **UMC Ecosystem Effort**





# Summary




### Foundry TSV process demonstrated

- Applicable to both 2.5D/3D
- Leverage existing CMOS process technology
- Key process issues identified & conquered
- Ecosystem work flow
  - Typical foundry/OSAT engagement flow applies for both 2.5D/3D, among other models
- Foundry TSV next step: ecosystem focus
  - Product level reliability assessment
  - Potential EDA collaboration for emerging 3D tools





## Thank you for your attention!










### Full Processing Interposer Process

Hot Chips, Aug., 2012



Enabling a Microelectronic World<sup>®</sup>

#### **Chip on Interposer First Process**





© 2012 Amkor Technology, Inc.

© Copyright Amkor


### Chip on Interposer First Process – High Level Risk

1. **Front Micro Bump Pad**: Ni/Au pad shape, thickness, IMC embrittlement
2. **Chip Attach & CUF**: chip attach alignment, flux cleaning, underfill dispensing
3. **Wafer Mold**: warpage, voids
4. **Flat Reveal Wafer Thinning + CMP**: wafer cracking, Cu smearing, cleaning
5. **Silicon Recess – Dry Etch (CF4)**: Cu corrosion, etch rate variance, slow etch, contamination
6. **Passivation – organic passivation coating, PECVD**: wafer cracking, edge arcing, thickness/stress control
7. **Secondary Reveal – CMP**: wafer cracking
8. **C4 Bumping**
9. **Mold Thinning** (optional)
10. **Dicing Saw**: street cracking

#### **Chip on Interposer Last Process**





### Chip on Interposer Last Process – High Level Risk

1. **Front Micro Bump Pad**: Ni/Au pad shape, thickness, IMC embrittlement
2. **Zone Bond**: TTV control
3. **Flat Reveal Wafer Thinning + CMP**: wafer cracking, Cu smearing, cleaning
4. **Silicon Recess – Dry Etch (CF4)**: Cu corrosion, etch rate variance, slow etch, contamination
5. **Passivation – organic passivation coating, PECVD**: wafer cracking, edge arcing, thickness/stress control
6. **Secondary Reveal**: wafer cracking
7. **C4 Bumping**
8. **2<sup>nd</sup> Carrier Bonding & 1<sup>st</sup> Carrier De-bonding**
9. **Chip Attach on Interposer**
10. **2<sup>nd</sup> Carrier De-bonding**



#### **TSV - Interposer M/BEOL Process**








1. **Front Micro Bump Pad**: Ni/Au pad thickness, IMC embrittlement
2. **Zone Bond**: TTV control
3. **Flat Reveal Wafer Thinning + CMP**: wafer cracking, Cu smearing
4. **Silicon Recess – Dry Etch (CF4)**: Cu corrosion, etch rate variance, slow etch, contamination
5. **Passivation – organic passivation coating, PECVD**: wafer cracking, edge arcing, thickness/stress control
6. **Secondary Reveal**: wafer cracking
7. **C4 Bumping**
8. **Carrier De-bonding**: wafer breakage

## 1. FS NiAu – CoC Evaluation



### CoC Evaluation on Electrolytic Ni/Au

- AOI inspection
- Ni/Au thickness measurements
- Auger analysis for surface condition
- Wafer bonding
- Simulated backside thermal processes
- Debond
- AOI Pad inspection for FM
- Singulate
- Mass Reflow
- TC Bond
- FA X-section, EDX line scan, EDX area mapping
- TC CoC
- FA



#### **Images - Post UBM Etch Process**

- There are no abnormalities





## Process validation

- Edge trimming to reduce chipping
- Optimization of wafer bonding to minimize thickness variation of the temporary bonding adhesive
- Minimizing wafer cracking in the debonding process
- EAR optimization

## **Overview of ZoneBOND carrier wafer**





- Silane+FC40 (Z1, release zone)
  - This is the anti-stick zone.
- Edge zone (Z2, stiction zone)
  - Edge zone width is approximately 2.5mm.
  - Minimum edge zone width is 1.5 mm.
  - SU8 is used as the edge zone mask.
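The edge zone widths quoted above translate into a small share of the carrier's area. The sketch below computes the annulus fraction for a 300 mm carrier, assuming a perfect disk (no notch or edge bevel); the function name and defaults are illustrative only.

```python
def edge_zone_fraction(edge_width_mm: float, wafer_diameter_mm: float = 300.0) -> float:
    """Share of carrier area in the outer bonding annulus (Z2).

    Treats the carrier as a perfect disk: no notch, no edge bevel.
    The factor pi cancels between annulus area and disk area.
    """
    r_out = wafer_diameter_mm / 2
    r_in = r_out - edge_width_mm
    if r_in < 0:
        raise ValueError("edge zone wider than the wafer radius")
    return (r_out ** 2 - r_in ** 2) / r_out ** 2


for width_mm in (2.5, 1.5):  # nominal and minimum widths quoted above
    print(f"{width_mm} mm edge zone -> {edge_zone_fraction(width_mm):.1%} of the area")
```

For the nominal 2.5 mm and minimum 1.5 mm widths this gives about 3.3% and 2.0% of the carrier area.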



#### Acetone Drop Test

- The acetone drop test confirms the non-stick treatment of the zone-treated carrier wafer (Z1).
- Z1 is non-stick. The treatment material bonds chemically to the wafer through a silanol condensation reaction. Once it has reacted with the surface, the resulting single-molecule layer is permanently attached to the carrier and acts as a polytetrafluoroethylene (PTFE), or "Teflon-like", coating on the wafer.

## **ZoneBOND De-bonding**



Edge Zone Release with EZR & EZD module





EZR Module: 300mm wafer mounted on film frame

## Failures & Problems Related to ZoneBOND De-Bonding

- Delamination
- Adhesive squeeze-out
- Crack at edge zone
- Crack
- Wafer shift




| | Thermal | Zone | Laser | Chemical | Wedge |
|---|---|---|---|---|---|
| Machine | EVG, TEL, SUSS | EVG, SUSS | TAZMO, Yushin, SUSS | TOK | SUSS |
| Material | BSI, ShinEtsu, Sumitomo | BSI, ShinEtsu, Sumitomo | 3M | TOK | TMAT, Dow |
| Machine price | Middle | High | Middle | Middle | High |
| Material price | | High | Middle | High | Middle |
| TTV | Good | Normal | Good | Normal | Normal |
| UPH | Middle | Low | High | Low | High |

Pictured: HD-3007, SUSS XBC300, Brewer Science ZoneBOND™

## Advantage & Disadvantage of Various Methods



| | | Thermal (POR) | Zone (NEW) | Laser (NEW) | Chemical (NEW) | Wedge (NEW) |
|---|---|---|---|---|---|---|
| Bond | Advantage | <ul> <li>Uses a Si carrier</li> </ul> | <ul> <li>Uses a Si carrier</li> </ul> | <ul> <li>UV-cure adhesive</li> <li>Low outgassing</li> <li>Double-side bond</li> </ul> | <ul> <li>Single-layer adhesive coat</li> <li>Short bonding time</li> <li>Good adhesive generality</li> </ul> | <ul> <li>Uses a Si carrier</li> <li>Active adhesive development</li> </ul> |
| | Disadvantage | <ul> <li>Long bonding time</li> </ul> | <ul> <li>Requires a zone-treated carrier</li> <li>Poor adhesive stability</li> <li>High machine price</li> </ul> | <ul> <li>Uses a glass carrier</li> <li>Poor adhesive generality</li> </ul> | <ul> <li>Uses a perforated glass carrier</li> <li>Top device must be coated</li> <li>High carrier price</li> </ul> | <ul> <li>Two-layer coat</li> <li>Adhesion difficult to control</li> <li>High machine price</li> </ul> |
| Bump<br>process | Advantage | <ul> <li>Benefits of the Si carrier apply</li> </ul> | <ul> <li>Benefits of the Si carrier apply</li> </ul> | <ul> <li>High stability (thermal, chemical)</li> </ul> | <ul> <li>Outgassing advantage</li> <li>High chemical stability</li> </ul> | <ul> <li>Benefits of the Si carrier apply</li> </ul> |
| | Disadvantage | <ul> <li>Poor thermal stability</li> <li>High adhesive contamination</li> </ul> | <ul> <li>Poor thermal stability</li> <li>Adhesive must be changed</li> <li>Susceptible to voids</li> </ul> | <ul> <li>Glass chucking</li> <li>Susceptible to voids</li> </ul> | <ul> <li>Glass chucking</li> <li>Poor thermal stability</li> <li>Process failure from high warpage</li> </ul> | <ul> <li>Low adhesion</li> <li>Concern about Si del.</li> <li>Susceptible to voids</li> <li>High adhesive contamination</li> </ul> |
| Debond | Advantage | <ul> <li>No mount tape damage</li> </ul> | <ul> <li>Room-temperature debond</li> <li>High thermal stability</li> </ul> | <ul> <li>Room-temperature debond</li> </ul> | <ul> <li>High thermal stability</li> </ul> | <ul> <li>Room-temperature debond</li> <li>Short carrier removal time</li> <li>High thermal stability</li> </ul> |
| | Disadvantage | <ul> <li>Requires a high-temperature process</li> <li>Bump damage</li> <li>Thin wafer handling</li> </ul> | <ul> <li>Long removal time for edge adhesive</li> <li>Unproven new process</li> <li>High machine price</li> </ul> | <ul> <li>Possible laser damage</li> <li>Difficult to rework</li> <li>Adhesion change at surface</li> </ul> | <ul> <li>Long adhesive removal time</li> <li>Possible damage after adhesive removal</li> </ul> | <ul> <li>Wafer edge damage</li> <li>High machine price</li> </ul> |




## Process validation

- Soft reveal
  - Minimizing TTV with accurate control
  - Improved cleaning after wet polish
- Flat process
  - Grind only the Si layer on the WBG tool, without exposing Cu
  - Use the CMP tool to expose Cu, followed by post-CMP cleaning
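TTV is conventionally the difference between the maximum and minimum thickness measured across the wafer. In the flat process, grinding must stop in Si everywhere without exposing Cu, so any TTV eats directly into that margin. A minimal sketch with a hypothetical 9-site thickness map (not measured data):

```python
from statistics import mean


def ttv_um(thickness_um):
    """Total thickness variation: max minus min thickness over all sites."""
    if not thickness_um:
        raise ValueError("need at least one thickness reading")
    return max(thickness_um) - min(thickness_um)


# Hypothetical 9-site remaining-thickness map after grinding (um), not measured data
sites = [50.8, 51.1, 50.6, 50.9, 51.3, 50.7, 51.0, 50.9, 50.5]
print(f"mean = {mean(sites):.2f} um, TTV = {ttv_um(sites):.2f} um")
```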




## **Dry Etch**



## Process validation

- Soft reveal
  - Acceptable etch rate
  - Optimizing etch rate and uniformity with TSV bonded pairs.
  - Finding via height for ISR process sequence.
- Flat process
  - Very slow etch rate
  - Optimizing etch rate and uniformity with TSV bonded pairs
  - Etch gas mixing evaluation to improve etch rate without Cu corrosion.
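The etch rate and uniformity optimization above comes down to two numbers per run. The sketch below computes per-site etch rate from before/after Si thickness readings and a within-wafer non-uniformity. The site data are hypothetical, and both non-uniformity conventions shown, (max - min)/(2 * mean) and sigma/mean, are in common use, so which one a given fab reports is an assumption to confirm.

```python
from statistics import mean, pstdev


def etch_rates(pre_um, post_um, time_s):
    """Per-site etch rate in um/min from pre/post Si thickness readings."""
    if len(pre_um) != len(post_um):
        raise ValueError("pre and post maps must have the same sites")
    return [(a - b) / time_s * 60.0 for a, b in zip(pre_um, post_um)]


def nonuniformity_pct(values, method="range"):
    """Within-wafer non-uniformity in percent.

    'range': (max - min) / (2 * mean); 'sigma': population stdev / mean.
    """
    m = mean(values)
    if method == "range":
        return (max(values) - min(values)) / (2 * m) * 100
    if method == "sigma":
        return pstdev(values) / m * 100
    raise ValueError(f"unknown method: {method}")


# Hypothetical 3-site data: 10 um of Si before, readings after a 120 s etch
rates = etch_rates([10.0, 10.0, 10.0], [7.0, 7.2, 6.8], time_s=120)
print(f"mean rate = {mean(rates):.2f} um/min")
print(f"NU (range) = {nonuniformity_pct(rates):.1f}%, NU (sigma) = {nonuniformity_pct(rates, 'sigma'):.1f}%")
```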

## Si recess etching : Dry etch




#### Flat process



#### Soft reveal process








## Process validation

- Deposition of SiN and SiO2
- Confirming deposition rate, uniformity, stress and RI (refractive index).
- Setting up a measurement method using an ellipsometer to check single layers and multilayers.





- Silicon Nitride
  - SiH<sub>4</sub>(g) + NH<sub>3</sub>(g) + N<sub>2</sub>(g) → Si<sub>x</sub>N<sub>y</sub>H<sub>z</sub>(s) + H<sub>2</sub>(g)
- Silicon Oxide [Silane-based process]
  - SiH<sub>4</sub>(g) + 4N<sub>2</sub>O(g) + N<sub>2</sub>(g) → SiO<sub>2</sub>(s) + 5N<sub>2</sub>(g) + 2H<sub>2</sub>(g) + O<sub>2</sub>(g)
- Silicon Oxide [TEOS-based process]
  - Si(OC<sub>2</sub>H<sub>5</sub>)<sub>4</sub>(g) + O<sub>2</sub>(g) → SiO<sub>2</sub>(s) + byproducts
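An atom-count check is a quick sanity test for deposition equations like these. The sketch below parses simple chemical formulas (including parenthesized groups such as TEOS) and compares atom totals on each side. The two example reactions, a minimal SiH4 + 2N2O stoichiometry and an idealized complete oxidation of TEOS to CO2 and H2O, are textbook assumptions, not Amkor's recipes, which run with excess oxidant and diluent gases.

```python
import re
from collections import Counter

# Element symbols, parentheses, or multipliers
TOKEN = re.compile(r"[A-Z][a-z]?|\(|\)|\d+")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a formula such as 'Si(OC2H5)4' (nested parentheses allowed)."""
    tokens = TOKEN.findall(formula)
    if "".join(tokens) != formula:
        raise ValueError(f"unsupported characters in {formula!r}")
    stack = [Counter()]
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        # Optional multiplier right after an element symbol or a closing parenthesis
        mult = 1
        if tok != "(" and i < len(tokens) and tokens[i].isdigit():
            mult = int(tokens[i])
            i += 1
        if tok == "(":
            stack.append(Counter())
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            for element, n in group.items():
                stack[-1][element] += n * mult
        elif tok.isdigit():
            raise ValueError(f"stray number in {formula!r}")
        else:
            stack[-1][tok] += mult
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return stack[0]


def side_atoms(side):
    """Total atoms on one side; side is a list of (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total


def is_balanced(lhs, rhs) -> bool:
    return side_atoms(lhs) == side_atoms(rhs)


# Minimal silane/N2O stoichiometry (no excess N2O, no N2 dilution)
silane_oxide = ([(1, "SiH4"), (2, "N2O")], [(1, "SiO2"), (2, "N2"), (2, "H2")])
# Idealized complete oxidation of TEOS (real byproducts are more varied)
teos_oxide = ([(1, "Si(OC2H5)4"), (12, "O2")], [(1, "SiO2"), (8, "CO2"), (10, "H2O")])

for name, (lhs, rhs) in {"silane/N2O": silane_oxide, "TEOS/O2": teos_oxide}.items():
    print(name, "balanced" if is_balanced(lhs, rhs) else "NOT balanced")
```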





Note: No abnormality found.






3D TSV





Note: No damage found.










## Process validation

- Process optimization to find BKM
  - Oxide/Cu polish process for ISR (Inorganic soft reveal)
  - Si/Cu polish process for flat reveal process
  - Slurry evaluation
  - Post CMP cleaning evaluation





Proven product and industry-benchmark CMP tool: >1500 Reflexion/Reflexion LK systems shipped by 2011


#### Product Features

- 3-platen, 4-head polisher
- Multi zone polishing head
- In-situ process control optimizes productivity and performance
- High performance Desica Cleaner

#### Process Controls

- Real-Time Profile Control (RTPC<sup>™</sup>)
  - High-resolution eddy-current endpoint for bulk metal polish step
- FullScan<sup>™</sup> Endpoint
  - Laser endpoint for metal film clearing
- FullVision<sup>™</sup> MX Endpoint
  - Broad-band optical endpoint control for remaining dielectric thickness
- Si EP/ISPC under development
- Process BKMs
  - TSV CMP know-how
  - Low cost/high performance process BKMs for various TSV CMP applications





1k nitride and 2.8um oxide were deposited on these wafers. The post-etch pillar height at wafer center and edge was not high enough for CMP to fully expose the copper after pillar planarization and OP with 5k oxide removal on the field.
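The note above can be checked with simple arithmetic. Reading "1k" and "5k" as ångströms (an assumption) and treating the polished surface as perfectly planar, the dielectric left on the field after OP is 0.1 + 2.8 - 0.5 = 2.4 um, so a Cu pillar must protrude more than that above the Si to be opened. The pillar heights below are hypothetical, and liner thickness, dishing and across-wafer non-uniformity are ignored.

```python
ANGSTROM_PER_UM = 1e4


def field_dielectric_after_op_um(nitride_a: float, oxide_um: float, field_removal_a: float) -> float:
    """Dielectric left on the field after planarization plus OP (um), planar model."""
    return nitride_a / ANGSTROM_PER_UM + oxide_um - field_removal_a / ANGSTROM_PER_UM


def cu_exposed(pillar_height_um: float, field_dielectric_um: float) -> bool:
    """Cu opens only if the pillar top stands above the final polished surface."""
    return pillar_height_um > field_dielectric_um


# Values from the note, reading "1k" and "5k" as angstroms (an assumption)
field = field_dielectric_after_op_um(nitride_a=1000, oxide_um=2.8, field_removal_a=5000)
print(f"field dielectric after OP ~ {field:.1f} um")
for pillar_um in (2.0, 3.0):  # hypothetical post-etch pillar heights
    status = "exposed" if cu_exposed(pillar_um, field) else "still covered"
    print(f"{pillar_um} um pillar -> {status}")
```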





Images: pre-CMP (tilted); post 30 s polish (tilted); post 90 s polish (top-down)

- Fast and good pillar planarization achieved
- Minimum field oxide loss during pillar planarization







Images: wafer middle and edge

## Secondary reveal : CMP Post CMP Topography



## Secondary reveal : CMP Layer Thickness confirmation after CMP





Patterned Area Next to Via

**Open Field Area** 



## Process validation

- Performing SD with wafer frame handling.
- Evaluation of laser transparent tape.
- Parameter DOE of laser process
- Study of autofocus through inorganic passivation

## **Stealth Dicing – ML300 FH**

- Frame handling system: fully automatic.
- Same system as the current K4 tool, except for the handler.
- For TSV products.





## Wafer surface flatness comparison

✓ Observed a wafer with BG tape laminated by a manual process

Result

Wafer flatness improved markedly after optimizing the tape lamination conditions. With the improved conditions, a stable autofocus result, and therefore stable cutting quality, can be expected.


## Investigation of too-low SFV on a SiN-layered wafer





#### Result

SFV was very low at all points on the wafer.

It was around 0.1 V in the lowest case.

\*SFV corresponds to the quantity of AF laser light reflected from the wafer surface.

\*A sufficiently high SFV (i.e., sufficient reflectivity) is required to maintain good Z-accuracy.


qctconnect.com

QUALCOMM CDMA Technologies | QUALCOMM CONFIDENTIAL AND PROPRIETARY





# **Roadmap for Design and EDA Infrastructure for 3D Products**

| Riko Radojcic                      | HotChips 2012 |
|------------------------------------|---------------|
| Qualcomm                           | Cupertino, CA |
| E-mail : <u>rikor@qualcomm.com</u> | Aug 2012      |
| Tel : 1 858 651 7235               |               |

## Some of the Typical 3D Options

| Option | Description |
|---|---|
| <b>2.5D</b> | Side-by-side die stacked on a passive <u>interposer</u> that includes TSVs |
| <b>3D</b><br>Memory | Multiple DRAM die stacked standalone or on an active interposer |
| <b>3D</b><br>Memory<br>on Logic | One or more DRAM die stacked directly on logic die (<u>M-o-L</u>) |
| <b>3D</b><br>Logic on<br>Logic | Multiple logic die stacked on top of each other (<u>L-o-L</u>) |
| <b>3D +</b><br>Interposer | Mix of side-by-side and stacked schemes with a passive or active interposer |



## **Evolving to "Mainstream" 3D Technologies**






#### Snapshot of Intrinsic Technology Status

| | Was (common concern a few years ago) | Is (our take) |
|---|---|---|
| Process | High aspect ratio (10:1) 5/50 TSV process | ✓ |
| | Thinning & Backside wafer processing | ✓ |
| | Microbump and Joining | ✓ |
| | Integration & Stacking | ✓ |
| | Intrinsic Reliability Assessment | 🔹 in flight |
| | Standards (JEDEC, SEMI, Sematech, 3D EC) | 🔹 in flight |
| Design (M-o-L) | EDA tools (for "2D-like" Memory-on-Logic design) | |
| | Design Enablement (for "2D-like" Memory-on-Logic design) | ✓ |
| | Testability (for "2D-like" Memory-on-Logic design) | ✓ |
| | Variability (Corner for "2D-like" Memory-on-Logic design) | ✓ |
| | Standards (JEDEC, Si2, IEEE) | 🔹 in flight |
| Product | System Level Value Proposition | ✓ |
| | Thermal Modeling & Design for Thermal | 🔹 in flight |
| | Stress Modeling & Design for Stress | ✓ |
| | SI modeling & Design for Parametric Yield | ✓ in flight |
| | Cost Structure & Business Models | ● TBD |
| | Yield and Yield Learning | ● TBD |
| | Volume Manufacturing Ramp | ● TBD |



**REDEFINING MOBILITY** 

#### **Eco-System for 3D Design**

- Segment the Design Eco-System into 3 Buckets to Address 3 Key Challenges
- Design Authoring: actual chip design
  - Implement Design via (mostly) Traditional 2D Chip Design Flow (RTL2GDS)
  - Output GDS
- PathFinding: design/technology concept exploration
  - Manage Choices via Cheap, Quick & Dirty Concept Design
  - Output Clean Specs
- TechTuning: physical space exploration
  - *Manage Interactions* via Cheap Electrical, Thermal & Mechanical Chip Simulation
  - Output Clean Constraints




# PathFinding: Why & What ?

- Managing Choices ....
  - Want to optimize product attributes
  - Cost, power, performance, engineering ...
- Need to Co-Optimize Process & Design
  - Winning 3D Product will Be Architected specifically to Leverage 3D Technology
  - Selection of choices is Product Specific
- In General: <u>Need Spatial Awareness</u>
  - Quick and flexible
  - Hi fidelity vis-a-vis accuracy
- For 3D : <u>Also Need Heterogeneity</u>
  - Multiple stacking styles & orientations
  - Multiple tech files
  - Multiple levels of hierarchy
  - Multiple resource constraints
- <u>Structured</u> Methodology.
  - Past experience not applicable
  - Opportunity for paradigm shifts
  - Not tied to Legacy design
  - Process-Design-Package co-optimization

Details: 3D System Integration, Springer 2011

imec










# **3D PathFinding : Current View**



## **PathFinding**

- Level 1 (Atrenta): think
  - RTL & Netlists
  - Block Level Schematics
  - Partitions
  - Block assignments
  - T2T connectivity
  - Global Routing
  - Floorplans
- Level 2 (MicroMagic): think
  - Transistor Level Schematics
  - T2T layout
  - SPICE Netlist
  - Waveforms
  - Polygons


# TechTuning: Why & What ?

- Managing Interactions
  - Intimate Proximity and Coupling Between Die
  - In Electrical, Thermal & Mechanical Domains
- Electrical Domain Interactions
  - Within Die Interactions with New Features
    - Substrate noise, coupling, etc.
  - Die to Die interactions (SI, PDN, PI...)
- Thermal Domain Interactions
  - Within a Die & Die to Die
  - Need Thermal Rules & Guidelines
    - Design Specific & Technology Specific
    - Need a methodology to plug into std design flow
- Stress Domain Interactions
  - Within a Die & Die to Die
  - Need Stress Rules & Guidelines
    - Design Specific & Technology Specific
    - Need a methodology to plug into std design flow

Details: 3D IC Stacking Technology, McGraw-Hill 2011









## **3D Electrical Interactions**

- Many Possible Interactions
  - Die to Die close proximity
  - Within a Die new features
- New Geometries: not just simply planar
  - uBump to BRDL
  - TSV to BRDL
  - TSV to TSV
  - TSV to M1



FC Bump

- New Features: not just conductor or insulator
  - MOS nature of TSV & Semiconductor nature of Si
  - e.g. Substrate Noise Coupling: TSV to Device
    - vs. substrate thickness
    - vs. Doping Profile in the Si substrate
    - vs. TSV to Device Separation
    - vs. Substrate Tap & Guard Ring Configuration
    - etc...

cādence

#### Need true 3D Chip Level Extraction & Coupling Analyses

Or a restricted layout with pre-characterized macro model

E-System Design, Inc.

## Thermal Challenges => a Fundamental Constraint

- Thermal: a Global (=System Level) & Local (=Component Level) Challenge
  - Global Concern : must manage skin temperature and overall system power
  - Local Concern : must manage hot spots, junction temperature, and power density
  - Compounding Factor: all advanced systems use some form of Thermal Mitigation
- Thermal is not a 3D-only Challenge
  - A Problem that has to be addressed with 2D Components as well...
  - At Architecture, Design, Floorplanning, Packaging, Application, Software ...
  - Could be a 3D Opportunity ?



#### Need a System-Chip Co-Design Methodology & Tools

Faster and More Flexible than the traditional CFD / FEA methodologies

DOCEA

- Compatible with cross company handshake (a la TDP practice in PC domain)
- Compatible with fuzzy PathFinding-like forward looking inputs
- Compatible with different system level 'knobs'
- Compatible with different chip level 'knobs'
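The slide asks for thermal models faster and more flexible than CFD/FEA. The lightest-weight form of such a compact model is a 1-D series thermal resistance chain. The sketch below computes node temperatures for a hypothetical memory-on-logic stack in which all heat leaves through the logic die. All powers and resistances are made-up illustrative values, and the model ignores lateral spreading and parallel heat paths, so it illustrates the idea rather than the DOCEA or Cadence tooling mentioned here.

```python
from itertools import accumulate


def stack_temperatures(powers_w, resistances_k_per_w, t_ambient_c):
    """Node temperatures for a 1-D series thermal chain.

    powers_w[i]: heat injected at node i, ordered from the node farthest from
    the heat sink (index 0) to the node closest to it.
    resistances_k_per_w[i]: thermal resistance from node i to node i+1
    (the last entry connects the last node to ambient).
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("need one resistance per node")
    # Heat crossing resistance i is everything injected at nodes 0..i
    heat_through = list(accumulate(powers_w))
    drops = [q * r for q, r in zip(heat_through, resistances_k_per_w)]
    # Temperature at node i = ambient + all drops between node i and ambient
    temps = []
    running = t_ambient_c
    for drop in reversed(drops):
        running += drop
        temps.append(running)
    return list(reversed(temps))


# Hypothetical memory-on-logic stack: memory die (farthest from sink) on a logic die
powers = [0.5, 2.0]          # W: memory, logic
resistances = [1.0, 20.0]    # K/W: memory->logic interface, logic->ambient path
for name, t in zip(["memory", "logic"], stack_temperatures(powers, resistances, 25.0)):
    print(f"{name}: {t:.1f} C")
```

Even this crude chain shows the stacking effect: the upper die sits above the logic die's temperature because all of its heat must cross the logic die first.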

cādence




## **Implementation of a TechTuning Flow for Stress**

- Interface to actual Design Authoring : <u>Rules</u> now
  - maybe in-flow model based simulation later..
  - Based on 'off-line' simulations using specialized tools
  - Define a 'Safe Operating Area' => a set of rules
  - Supplement with a smart 'hot spot' checker to close the loop
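The simplest rule that such off-line simulations produce is a keep-out zone (KOZ) around each TSV, and the simplest "hot spot" check flags devices that fall inside it. The sketch below does exactly that for a circular KOZ. The coordinates and the 10 um radius are hypothetical, the brute-force distance search only suits small examples, and real stress effects are anisotropic, so this illustrates rule checking rather than the Mentor checker referenced on the slide.

```python
import math


def koz_violations(tsv_centers, devices, koz_um):
    """List devices closer than koz_um to any TSV center.

    tsv_centers: iterable of (x, y) in um.
    devices: iterable of (name, x, y) in um.
    Returns (name, distance_to_nearest_tsv_um) pairs, in input order.
    """
    tsvs = list(tsv_centers)
    if not tsvs:
        return []
    hits = []
    for name, x, y in devices:
        nearest = min(math.hypot(x - tx, y - ty) for tx, ty in tsvs)
        if nearest < koz_um:
            hits.append((name, nearest))
    return hits


# Hypothetical coordinates and a hypothetical 10 um keep-out rule
tsvs = [(0.0, 0.0), (40.0, 0.0)]
devices = [("M1", 3.0, 4.0), ("M2", 20.0, 15.0), ("M3", 41.0, 0.0)]
for name, dist in koz_violations(tsvs, devices, koz_um=10.0):
    print(f"{name}: {dist:.1f} um from nearest TSV (inside keep-out zone)")
```

A full-chip checker would replace the nested loop with a spatial index and read geometry from GDS/DEF rather than from Python lists.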



#### "Hot Spot" Checker

- Validation that bits and pieces fit & SIGN OFF the design
- Must interface to design environment : I/P : GDS2 , LEF, DEF  $\ldots$
- May have to be COMPACT MODEL Based (read the whole design and include all effects)
- Working with MENT









Figure: specialized simulation, submodeling

## Managing Costs : What Does It Mean for TSS Design?

- Expect Gradual and Graceful Evolution
  - Process and Design together / in synch
  - Significant investment in the existing flow
  - Will be Applications Driven
- Now : Heterogeneous Stacking
  - e.g. Memory (or Std Analog) on logic
  - Design Methodology Requirements
    - Partitioning : by die types w/ spec interface
    - Synthesis : 1-die-at-a-time
    - Floorplanning : constraints from the other die
    - Physical Design : partial 2-sided die (maybe)
    - Physical Verification: 1-die-at-a-time + interface
    - Analyses : whole stack (eg PDN)
- Next : Integrated Stack Designs
  - e.g. Logic-on-Logic or Interposers
  - Design Methodology Requirements
    - Integrated PD Co-Design w Interposer & Substrate
    - Design Constraint Methodology
    - Design Authoring including the Package
    - Manufacturability (aka TechTuning)



## Evolving from 2D Design



# **3D PDN Design Flow**

- 2D Ref Flow
  - Sign off in time domain (Apache)
  - Analyses in frequency domain (Sigrity)
- 3D PDN Flow Approach
  - Take as much as possible from ref flow
  - ✓ Similar approach as Si-Package-PCB Analyses
    - Extract each tier separately
    - Model as an integrated stack
  - Upgraded tools to understand new features
    - TSV, uBump, BRDL, Tier n ...
- Current Status (Apache, Sigrity)
  - Demonstrated Tools & Flow
  - Supporting development of standard Compact PDN Models and associated 3D Design Exchange Format Standards
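The 3D PDN approach extracts each tier separately and then models the stack as a whole. A DC-only version of that idea is a resistor ladder: each tier draws current, and every connection below it carries the current of that tier plus all tiers above. The sketch assumes a stack fed from the package at the bottom and lumps each tier's bump or TSV/uBump array into one hypothetical resistance; it ignores on-die grid resistance and all of the frequency-domain effects handled by the Apache and Sigrity tools.

```python
def tier_voltages(v_supply, feed_resistance_ohm, tier_current_a):
    """Supply voltage seen at each tier of a stack fed from the bottom.

    feed_resistance_ohm[k]: lumped resistance of the connection into tier k
    (package bumps for tier 0, TSV + uBump array for tiers above).
    tier_current_a[k]: current drawn by tier k.
    """
    if len(feed_resistance_ohm) != len(tier_current_a):
        raise ValueError("need one feed resistance per tier")
    voltages = []
    v = v_supply
    for k, r in enumerate(feed_resistance_ohm):
        i_through = sum(tier_current_a[k:])  # current for this tier and all above
        v -= r * i_through
        voltages.append(v)
    return voltages


# Hypothetical two-tier stack: logic (bottom, 1.0 A) and memory (top, 0.5 A)
volts = tier_voltages(1.0, feed_resistance_ohm=[0.002, 0.010], tier_current_a=[1.0, 0.5])
for name, v in zip(["logic", "memory"], volts):
    print(f"{name}: {v:.3f} V ({(1.0 - v) * 1e3:.0f} mV drop)")
```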





#### **Inventory of Current Core Design Technologies**

|                                                        | PathFinding                                                                                         | TechTuning                                                                                                                                                  | Design Authoring                                                                                                                               |
|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| Things<br>We Do<br>Have | <ul> <li>✓ 3D Floorplanner</li> <li>✓ 3D Net generator</li> <li>✓ PDN resource estimator</li> </ul> | <ul> <li>✓ Package Stress simulator</li> <li>✓ Feature Stress simulator</li> <li>✓ Reference Thermal sim.</li> </ul> | <ul> <li>✓ 2D design flow &amp; tools</li> <li>✓ Timing with a fixed TSV/uBump layout</li> <li>✓ 3D aware PI / SI analyses</li> </ul> |
| Things<br>We are<br>Working On | <ul> <li>Package PathFinder</li> <li>System PathFinder</li> <li>Standard 3D design exchange formats</li> </ul> | <ul> <li>Chip Level Stress Sim</li> <li>Chip thermal floorplanner</li> <li>Standard 3D design exchange format &amp; PDK</li> </ul> | <ul> <li>M-o-L product design</li> <li>3D Variability Flow</li> <li>Standard 3D design exchange formats</li> </ul> |
| Things<br>we do<br>NOT<br>Have<br>(and wish<br>we did) | Technology PathFinder | <ul> <li>3D in-flow substrate<br/>coupling analysis</li> <li>Fully supported<br/>TechTuning "PDK"</li> <li>System component<br/>thermal co-design</li> </ul> | <ul> <li>TBD Logic on Logic</li> <li>TBD Interposer</li> <li>TBD 3D Extraction</li> <li>TBD 3D ++ (see below)</li> </ul> |

■ We don't have Everything – but we do have much more than Nothing ☺ !!
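The inventory lists timing with a fixed TSV/uBump layout, and the Memory-on-Logic environment relies on a compound "lumped" TSV delay model. As a hedged illustration of what a lumped model can look like (not the model used in that flow), the sketch below treats the TSV as a single pi-section RC and computes the Elmore delay of a driver-TSV-load path, with ln(2) times Elmore as the usual first-order 50% delay estimate. All R and C values are hypothetical.

```python
import math


def elmore_delay_s(r_driver, r_tsv, c_tsv, c_load):
    """Elmore delay of a driver -> pi-model TSV -> load path.

    The TSV is one pi section: C_tsv/2 at each end with R_tsv between them.
    The driver resistance sees all downstream capacitance; R_tsv sees the
    far-end half of C_tsv plus the load.
    """
    return r_driver * (c_tsv + c_load) + r_tsv * (c_tsv / 2 + c_load)


def delay_50pct_s(elmore_s):
    """First-order 50% delay estimate: ln(2) * Elmore delay."""
    return math.log(2) * elmore_s


# Hypothetical values: 1 kOhm driver, 50 mOhm / 200 fF TSV, 10 fF receiver
t_e = elmore_delay_s(r_driver=1000.0, r_tsv=0.05, c_tsv=200e-15, c_load=10e-15)
print(f"Elmore = {t_e * 1e12:.1f} ps, 50% delay ~ {delay_50pct_s(t_e) * 1e12:.1f} ps")
```

With these numbers the TSV's own resistance adds only a few femtoseconds; the delay is set almost entirely by the driver charging the TSV and load capacitance.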





## Standards : a Lubricant for the Supply Chain

#### Leverage Existing Standards Bodies

- Established balloting, adoption and management practices
- But formal and hence need 'mature proposals'....

#### Process Standards

- 3D Enablement Center
- Sematech
- SEMI ...

#### Design Standards

- Si2
- 3D EC / SRC
- IEEE
- JEDEC ...

REDEFINING MOBILITY

- Encourage participation by the industry, especially EDA





#### 2.5D / 3D Stacking Roadmap




#### **Design Environment for Memory-on-Logic**

| Status          | Arena                            | Item                                                                                                                                                                                     |
|-----------------|----------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Have            | Design                           | <ul> <li>✓ 2D design flow &amp; tools</li> <li>+ quasi-manual placement of T2T / TSV array</li> <li>+ custom T2T buffer design &amp; incremental rules to manage interactions</li> </ul> |
|                 | Timing                           | <ul> <li>✓ 2.5D analyses flow &amp; tools</li> <li>+ compound 'lumped' TSV delay model</li> </ul>                                                                                        |
|                 | PI                               | <ul> <li>2D analyses flow and tools</li> <li>+ extended hierarchy + recognition of new features</li> </ul>                                                                               |
|                 | SI & Variability                 | ✓ 'Off Line' analyses to produce set of 'keep out' rules                                                                                                                                 |
| In Flight       | 'In Line' Rule<br>Checkers       | <ul> <li>✓ Chip Level Stress Simulator – for 'stress Hot Spots'</li> <li>✓ Chip Level Thermal Floorplanner</li> <li>✓ Chip Level SI Simulator</li> </ul>                                 |
|                 | Integration w/<br>Commercial Die | ✓3D Design Exchange Formats                                                                                                                                                              |
| Like to<br>Have | SI Analyses                      | In-flow SI analyses, online and in the product flow                                                                                                                                      |
|                 | TechTuning & PathFinding         | <ul> <li>Fully supported TechTuning "PDK"</li> <li>System-Component thermal co-design</li> </ul>                                                                                         |






#### **Design Environment for Interposers**

| Status       | Arena            | Item                                                                 |
|--------------|------------------|----------------------------------------------------------------------|
| Have         | Design           | <ul> <li>✓ 2D Layout tools</li> <li>✓ 2D Extraction Tools</li> </ul> |
| Need to Have | Extraction       | 3D Extraction inc. TSV, routing and FRDL/BRDL                        |
|              | Signal Integrity | Integrated SI tools inc. floating substrate and 3D features          |
|              | Power Integrity  | Integrated PI analyses tools & flow                                  |
|              | DFT / Test       | Integrated Double Sided Passive Floating Substrate                   |
|              | PathFinding      | Architectural Trade Off Analyses for Value Proposition               |



#### **Design Environment for Logic on Logic**

| Status    | Arena             | Item                                                         |
|-----------|-------------------|--------------------------------------------------------------|
| Have      | Design            | ✓ 2D Flow for one single-sided die & technology at a time    |
|           | PathFinding       | ✓ 3D Physical PathFinding Flow for finding Value Proposition |
| Must Have | Floorplan         | 3D with optimization across multiple tiers (technologies)    |
|           | Utility Insertion |                                                              |
|           | Extraction        | 3D Extraction inc. TSV, routing and FRDL/BRDL                |
|           | Timing            | Across multiple tiers, technologies, libraries               |
|           | Power Integrity   | Integrated PI analyses tools & flow                          |
|           | Signal Integrity  | In-flow SI analyses tools inc. 3D features                   |
|           | DFT / Test        | Optimized DFT overhead for pre-stack test                    |
|           | Verification      | 3D Physical Verification, LVS, etc. across multiple tiers    |
|           | Etc.              | Dependent on the actual stack partition                      |






qctconnect.com

**QUALCOMM**

CDMA Technologies

QUALCOMM CONFIDENTIAL AND PROPRIETARY










# Xilinx SSI Technology Concept to Silicon Development Overview

Shankar Lakka

Aug 27<sup>th</sup>, 2012



- Economic Drivers and Technical Challenges
- Xilinx SSI Technology, Power, Performance
- SSI Development Overview





# **Market Dynamics**

Video is driving explosive growth in traffic: by 2015, two-thirds of mobile traffic will be video.

**Machine to Machine** 

- Smart meters, security cameras, health care, telematics, etc.

#### Mobile Data Traffic



#### 2x Bandwidth growth every 3 years at the SAME POWER BUDGET

Source: Cisco Global Mobile Traffic, January 2011
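Because power equals bandwidth times energy per bit, doubling bandwidth every three years at a fixed power budget means the energy per bit must halve on the same schedule. The sketch below only illustrates that arithmetic; the 100 Gb/s starting bandwidth and 10 W budget are hypothetical placeholders, not figures from the Cisco source.

```python
def required_energy_per_bit(power_w: float, bandwidth_bps: float) -> float:
    """Energy per bit (J/bit) that lets `bandwidth_bps` fit inside `power_w`."""
    return power_w / bandwidth_bps


def project(periods: int, bw0_bps: float, power_w: float, doubling_years: float = 3.0):
    """Yield (year, bandwidth, energy/bit), doubling bandwidth every `doubling_years`."""
    for n in range(periods + 1):
        bw = bw0_bps * 2 ** n
        yield n * doubling_years, bw, required_energy_per_bit(power_w, bw)


# Hypothetical starting point: 100 Gb/s of I/O inside a fixed 10 W budget.
for year, bw, epb in project(3, 100e9, 10.0):
    print(f"year {year:4.0f}: {bw / 1e9:5.0f} Gb/s -> {epb * 1e12:5.1f} pJ/bit")
```

Over nine years the same 10 W must carry 8x the traffic, so the interface has to get 8x cheaper per bit.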

XILINX > ALL PROGRAMMABLE.

## **On-Die I/Os Not Scaling**

#### Number of I/Os per 1000 logic cells in the largest FPGA in each family



#### Logic doubles with Moore's Law but I/O quantity does not

# **The Progression of 3D Technology**




© Copyright 2012 Xilinx Inc.


# **Capacity: Beyond Moore's Law**

✓ Greater capacity, faster yield ramp

Reference: Node N to N+1 takes ~1.5 to 2 years for Xilinx FPGAs
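One common way to see why slicing a large FPGA into smaller dice gives a faster yield ramp is the Poisson yield model, Y = exp(-A·D0). The sketch below assumes that model, a hypothetical 6 cm² logic area, and a hypothetical defect density. It also assumes every slice is tested before assembly (known-good die), and it ignores interposer area and assembly yield. It is an illustration, not Xilinx yield data.

```python
import math


def poisson_yield(area_cm2: float, d0_per_cm2: float) -> float:
    """Fraction of dice with zero killer defects under a Poisson model."""
    return math.exp(-area_cm2 * d0_per_cm2)


def silicon_per_good_product(total_area_cm2: float, d0_per_cm2: float, slices: int) -> float:
    """Wafer area consumed per good device when the logic is split into equal,
    individually tested slices (interposer and assembly losses ignored)."""
    slice_area = total_area_cm2 / slices
    slice_yield = poisson_yield(slice_area, d0_per_cm2)
    # Each good slice costs slice_area / slice_yield of wafer area on average.
    return slices * slice_area / slice_yield


TOTAL_AREA_CM2 = 6.0  # hypothetical logic area of one large FPGA
D0 = 0.3              # hypothetical killer-defect density, defects/cm^2

for n in (1, 2, 4):
    per_slice = poisson_yield(TOTAL_AREA_CM2 / n, D0)
    cost = silicon_per_good_product(TOTAL_AREA_CM2, D0, n)
    print(f"{n} slice(s): slice yield {per_slice:.0%}, {cost:.1f} cm^2 of silicon per good device")
```

Under this model, the saving factor for n slices is exp(A·D0·(1 - 1/n)). The saving grows quickly as defect density is high early in a node, which is when the yield-ramp advantage matters most.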



## Well-published technology boilerplate



# Cross-section of 28nm FPGA with SSI

Virtex-7 2000T



#### 28 nm Active Die + 65 nm Passive Interposer

Courtesy of Xilinx, TSMC, Amkor

#### Interposer /Package Technology

| Technology | Specs                          |
|------------|--------------------------------|
| M1-M4      | 2 µm pitch, four 4X layers     |
| TSV        | 12 µm diameter, 180 µm pitch   |
| Micro-bump | 45 µm pitch                    |
| C4         | 180 µm pitch                   |
| Package    | 6-2-6 layer, 1.0 mm BGA pitch  |

- Low-risk approach to integrating TSVs and micro-bumps
- High-density micro-bumps for ~50K chip-to-chip connections
- Better FPGA low-k stress management with the silicon interposer
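The pitches in the table above translate directly into connection density. This sketch assumes a full square grid with every site used for a signal, which ignores power/ground bumps and keep-outs. Under that assumption, it estimates how much area ~50K connections need at micro-bump pitch versus C4 pitch.

```python
def bumps_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 on a full square grid at the given pitch (assumption)."""
    per_mm = 1000.0 / pitch_um
    return per_mm * per_mm


MICRO_BUMP_PITCH_UM = 45.0   # micro-bump pitch from the interposer table
C4_PITCH_UM = 180.0          # C4 pitch from the interposer table
TARGET_CONNECTIONS = 50_000  # "~50K chip-to-chip connections"

ub = bumps_per_mm2(MICRO_BUMP_PITCH_UM)
c4 = bumps_per_mm2(C4_PITCH_UM)
print(f"micro-bump: {ub:.0f}/mm^2, C4: {c4:.1f}/mm^2, ratio {ub / c4:.0f}x")
print(f"area for {TARGET_CONNECTIONS} micro-bumps: {TARGET_CONNECTIONS / ub:.0f} mm^2")
print(f"same count at C4 pitch: {TARGET_CONNECTIONS / c4:.0f} mm^2")
```

The 4x finer pitch gives (180/45)² = 16x the density, which is what makes tens of thousands of die-to-die links practical.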

#### Heterogeneous FPGAs with SSI: Virtex-7 HT



# **2.5D Performance vs. Monolithic**





- Vehicle 1: monolithic packaged 28 Gbps SerDes
- Vehicle 2: 2.5D packaged 28 Gbps SerDes
- Measurements show 2.5D performance comparable to the monolithic die

#### Reduced noise and better performance margin with SSI

# 2.5D / SSI Takeaways

#### SSI Technology summary

- Capacity beyond Moore's law, faster yield curve
- Breaks the die-to-die I/O bottleneck
- Heterogeneous SoCs
- Power advantage
- Stepping stone to true 3D




# **SSI Implementation Overview**



# **Challenges for 3D design and validation**

#### What areas of design validation and sign-off are challenging for 3D, and why?

- Circuit Design and Schematic capture
- RTL, Physical Design of Top Die and Interposer
- Extraction
- Functional and Physical Verification (LVS, DRC)
- Chip level functional verification
- STA
- IR/EM/SI or other Electrical analysis
- Assembly and Yield (beyond the scope of this presentation)



# **Analysis and Sign-off**

State of EDA tools:

1. Can the analysis be split into independent hierarchical levels?
2. Can the data for the analysis be split into independent hierarchical levels?



# **SSI Full chip**



# **Circuit and Physical Design**

#### Circuit Design and Schematic capture

- Electrical modeling of Interposer
- Design of driver and receiver
- HSPICE Simulations with process models from multiple process nodes
- Signal Integrity analysis
- ESD considerations
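A first-order electrical model of an interposer trace is a distributed RC line, which can be approximated by an N-segment RC ladder. The sketch below computes the Elmore delay at the far end of such a ladder. As N grows, the delay converges to R·C/2 for the whole line. The per-millimetre resistance and capacitance are hypothetical placeholders. A real flow would take them from interposer extraction and would also include driver, receiver, and micro-bump parasitics.

```python
def elmore_delay_ladder(r_total: float, c_total: float, segments: int) -> float:
    """Elmore delay (s) at the far end of an N-segment RC ladder.

    Each segment has series resistance r_total/N and shunt capacitance
    c_total/N at its far node. The far-end delay is the sum over nodes of
    (resistance upstream of the node) x (node capacitance) = R*C*(N+1)/(2N).
    """
    r = r_total / segments
    c = c_total / segments
    return sum((i * r) * c for i in range(1, segments + 1))


# Hypothetical 5 mm interposer trace: 2 ohm/mm and 0.2 pF/mm.
length_mm = 5.0
R = 2.0 * length_mm       # ohms
C = 0.2e-12 * length_mm   # farads
for n in (1, 4, 16, 256):
    print(f"N={n:3d}: {elmore_delay_ladder(R, C, n) * 1e12:.2f} ps")
```

A single lumped RC (N = 1) overstates the wire delay by 2x, which is one reason a distributed or multi-segment model is worth using for long interposer routes.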

#### Physical Design, Extraction and Verification

- Manual vs. auto-routed interposer
- Like top dies (e.g., all FPGAs) vs. unlike top dies (FPGAs with 28G SerDes)
- Extraction of interposer layout
- LVS run on each top die and on multiple dies together
- DRC run on the interposer and the top dies separately

# **Static Timing Analysis**

#### STA

- Black-box ILM (interface logic model) generated for each die
- Full-chip netlist (with extracted interposer) generated using internal scripts
- Special consideration given to the process distribution
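Each die on the interposer can come from a different wafer or lot, so a cross-die path cannot simply assume that both ends sit at the same process corner. The minimal sketch below models a path that launches in die A, crosses the extracted interposer, and is captured in die B. It enumerates independent corners for the two dies and reports the worst setup slack. All delays, corner names, and the clock period are hypothetical, and this is not the internal Xilinx script flow.

```python
from itertools import product

# Hypothetical per-corner delays in ns; each die may land at a different corner.
# Clock latencies are measured from a common clock source (an assumption here).
DIE_A = {  # launch die: clock latency to the launching flop, data delay to its micro-bump
    "fast": {"clk": 1.0, "data": 1.2},
    "slow": {"clk": 1.4, "data": 1.8},
}
DIE_B = {  # capture die: data delay from its micro-bump, clock latency, flop setup time
    "fast": {"data": 0.8, "clk": 1.1, "setup": 0.05},
    "slow": {"data": 1.2, "clk": 1.6, "setup": 0.08},
}
INTERPOSER_NS = 0.4  # extracted interposer trace delay (hypothetical)
PERIOD_NS = 3.0      # clock period (hypothetical)


def setup_slack(corner_a: str, corner_b: str) -> float:
    """Setup slack (ns) of the A -> interposer -> B path for one corner pair."""
    a, b = DIE_A[corner_a], DIE_B[corner_b]
    arrival = a["clk"] + a["data"] + INTERPOSER_NS + b["data"]
    required = PERIOD_NS + b["clk"] - b["setup"]
    return required - arrival


results = {(ca, cb): setup_slack(ca, cb) for ca, cb in product(DIE_A, DIE_B)}
worst = min(results, key=results.get)
for corners, slack in sorted(results.items()):
    print(corners, f"{slack:+.2f} ns")
print("worst corner pair:", worst)
```

With these made-up numbers, the worst case is die A slow with die B fast, because the fast capture clock arrives early. A single shared-corner run (both slow or both fast) would miss it.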



# **EM/IR and Thermal Analysis: Challenges**




EM/IR by budgeting

Accurate EM/IR and thermal analysis is iterative in a stacked-die scenario.
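The iteration arises because leakage power rises with temperature while temperature rises with power, and in a stack the dies share a thermal path. The minimal fixed-point sketch below models two stacked dies. It assumes an exponential leakage model and lumped thermal resistances, with all values hypothetical. Each pass computes die temperatures from the current powers, re-evaluates leakage at those temperatures, and repeats until the temperatures stop changing.

```python
import math

# Hypothetical two-die stack: heat leaves through the top die into the heat sink,
# and the bottom die's heat must first cross the bond layer into the top die.
T_AMBIENT = 45.0   # deg C
R_SINK = 0.5       # K/W, top die to ambient
R_BOND = 0.3       # K/W, bottom die to top die
DYNAMIC_W = {"top": 20.0, "bottom": 10.0}
LEAK_REF_W = {"top": 3.0, "bottom": 2.0}  # leakage at T_REF
T_REF = 25.0
LEAK_SLOPE = 0.02  # 1/K; leakage modelled as LEAK_REF_W * exp(slope * (T - T_REF))


def leakage(die: str, temp_c: float) -> float:
    return LEAK_REF_W[die] * math.exp(LEAK_SLOPE * (temp_c - T_REF))


def solve(tol: float = 1e-6, max_iter: int = 100):
    """Fixed-point iteration: powers at current temps -> new temps, until stable.

    The returned powers are those evaluated at the previous iterate's
    temperatures, which differ from the converged values by less than tol-scale.
    """
    temps = {"top": T_AMBIENT, "bottom": T_AMBIENT}
    for i in range(1, max_iter + 1):
        power = {d: DYNAMIC_W[d] + leakage(d, temps[d]) for d in temps}
        top = T_AMBIENT + R_SINK * (power["top"] + power["bottom"])
        bottom = top + R_BOND * power["bottom"]
        new = {"top": top, "bottom": bottom}
        if max(abs(new[d] - temps[d]) for d in temps) < tol:
            return new, power, i
        temps = new
    raise RuntimeError("thermal/leakage iteration did not converge")


temps, power, iters = solve()
print(f"converged after {iters} iterations:",
      {d: f"{t:.1f} C, {power[d]:.1f} W" for d, t in temps.items()})
```

A real stacked-die flow would replace the two lumped resistances with a thermal grid per die and feed the resulting temperature maps back into the EM/IR analysis.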

# **3D: The next frontier**

#### Higher-performance chip stacked on top

- Thermal considerations
- Bottom die includes power TSVs for the top die
  - Can be in an older, "TSV-friendly" technology

#### Floorplanning is critical:

- Thermal concerns (stacked thermal flux)
- TSV keep-out zones may be required in the bottom die to avoid stress-induced performance impact
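A TSV keep-out zone can be expressed as a minimum distance from each TSV within which stress-sensitive devices should not be placed. The minimal sketch below checks placements against such zones. It assumes circular zones measured from point TSV centres, and it uses hypothetical coordinates, cell names, and radius. A production rule deck would use the foundry's actual keep-out rules and shapes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x_um: float
    y_um: float


def keep_out_violations(tsvs, cells, radius_um):
    """Return (cell_name, tsv_index, distance_um) for every cell inside a keep-out zone."""
    hits = []
    for name, cell in cells.items():
        for i, tsv in enumerate(tsvs):
            d = math.hypot(cell.x_um - tsv.x_um, cell.y_um - tsv.y_um)
            if d < radius_um:
                hits.append((name, i, d))
    return hits


# Hypothetical TSV locations and cell placements in the bottom die.
tsvs = [Point(0.0, 0.0), Point(40.0, 0.0)]
cells = {"u_ff1": Point(5.0, 3.0), "u_buf": Point(20.0, 20.0), "u_inv": Point(44.0, 1.0)}
for name, idx, dist in keep_out_violations(tsvs, cells, radius_um=10.0):
    print(f"{name} is {dist:.1f} um from TSV {idx}")
```

The brute-force double loop is fine for a sketch; a full-die check would use a spatial index so each cell is compared only with nearby TSVs.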



## Assembly Technology still evolving






# **Call-to-Action: Develop & Evolve 3D Standards**

## Design enablement

- Interposer Models
- EDA Tools for 3D development and verification
- Chip-to-chip interface standards

## Manufacturing standards

- DFM rules for TSV, microbump
- Materials: TSV, u-bump
- Thermal budget

## Test

- Test HW & u-bump probing
- Known-Good-Die method
- Self Test Required (FPGA programmability is an advantage)



Bandwidth, power and cost demands are beginning to present significant challenges for monolithic silicon

# Stacked Silicon Interconnect is a breakthrough!

- Capacity
- Connectivity
- Power, Performance
- Heterogeneous SOCs

# SSI technology is the next big step in IC evolution

# Stacked Silicon Interconnect: A World of Difference





## Earth

- Area: ~500 million km<sup>2</sup>
- Population: ~6.8 billion people
- Oceans: 5
- Age: 5 billion years

## Virtex-7 2000T

- Interposer area: ~775 mm<sup>2</sup>
- Population: ~6.8 billion transistors
- Chips: 5
- Age: 40 weeks




## **DIE STACKING AND THE SYSTEM**

August 27<sup>th</sup>, 2012

## SYSTEM TRENDS IN THE INDUSTRY



## **Increasing Performance Density**

- The industry is driving performance density
- Improvements in performance density are driving new form factors
- New form factors discover new usage models
- Without new usage models the industry stagnates

## MOORE'S LAW ENABLES SI INTEGRATION DRIVING NEW FORM FACTORS



## THE OBVIOUS ADVANTAGE OF INTEGRATION



Communication is overhead costing power, latency, and footprint

- Interface power is proportional to bandwidth and to the link RC
- And bandwidth is limited by the off-die interface, which doesn't scale quickly
- But off-die bandwidth demand increases with transistor density
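The first bullet can be made concrete with the usual switching-power estimate for a simple unterminated link. Each transmitted bit charges the wire and pad capacitance, so energy per bit scales as a·C·V² and power as a·C·V²·BW, where a is an activity factor. The sketch below compares a long off-die link with a short stacked one. The capacitances, voltage swings, and activity factor are hypothetical. Terminated high-speed SerDes links would need a different model.

```python
def link_power_w(bandwidth_bps: float, cap_f: float, swing_v: float, activity: float = 0.5) -> float:
    """Switching power of an unterminated CMOS-style link: P = a * C * V^2 * BW.

    `activity` is the fraction of bit slots that draw C*V^2 from the supply
    (a hypothetical value; it depends on data statistics and signalling).
    """
    return activity * cap_f * swing_v ** 2 * bandwidth_bps


BW = 100e9  # 100 Gb/s aggregate (hypothetical)
links = {
    "off-die (package + board trace)": dict(cap_f=5e-12, swing_v=1.2),
    "on-interposer / stacked": dict(cap_f=0.5e-12, swing_v=0.6),
}
for name, params in links.items():
    watts = link_power_w(BW, **params)
    print(f"{name}: {watts:.3f} W, {watts / BW * 1e12:.2f} pJ/bit")
```

Power grows linearly with bandwidth but quadratically with swing, so short stacked links that need only a small swing and little capacitance gain on both terms.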



## SI INTEGRATION IS RUNNING OUT OF GAS!

- Moore's Law will continue, but there is a limitation
  - All components built in similar technologies have already been integrated: cache, FPU, multimedia, NB, GPU, SB, etc.
  - Only disparate technologies such as DRAM, SSD, and IVR are left





## SI INTEGRATION IS RUNNING OUT OF GAS!

- Moore's Law will continue, but there is a problem
  - Process scaling will stop supporting diverse functionalities on a single die, such as fast logic, low-power logic, analog, and cache
  - The single die will want to break into specialized components to maximize the value of new and existing process nodes



AMD

## TWO TYPES OF DIE STACKING

- Very similar technologies that reduce metal interconnect and improve the proximity of disparate technologies, allowing new levels of integration and process specialization





## **DIE STACKING MOTIVATION (MEMORY INTEGRATION)**




System power is fixed in all platforms

- Compute performance in all platforms is proportional to memory BW
- Memory BW power increases with demand
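The three statements above combine into a simple budget. Suppose total power P is fixed, each operation needs β bytes of memory traffic, and memory power is bandwidth times energy per bit. Then P = ops·e_op + ops·8·β·e_bit, so achievable throughput is P / (e_op + 8·β·e_bit). The sketch below solves that for two memory energies per bit. Every number is an illustrative assumption, not AMD data.

```python
def max_throughput_ops(power_w: float, e_op_j: float, bytes_per_op: float, e_bit_j: float) -> float:
    """Max ops/s when P = ops*e_op + ops*bytes_per_op*8*e_bit (fixed power budget)."""
    return power_w / (e_op_j + 8 * bytes_per_op * e_bit_j)


POWER_W = 100.0      # fixed platform budget (hypothetical)
E_OP_J = 20e-12      # compute energy per operation (hypothetical)
BYTES_PER_OP = 0.1   # memory traffic demanded per operation (hypothetical)

for label, e_bit in (("off-package DRAM", 20e-12), ("stacked DRAM", 4e-12)):
    ops = max_throughput_ops(POWER_W, E_OP_J, BYTES_PER_OP, e_bit)
    mem_w = ops * BYTES_PER_OP * 8 * e_bit
    print(f"{label}: {ops / 1e12:.2f} Tops/s, memory uses {mem_w:.0f} W of {POWER_W:.0f} W")
```

With these assumed values, cutting memory energy per bit both frees power for compute and raises the throughput the fixed budget can sustain, which is the BW/W argument on the following slides.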

## IMPROVING BW/WATT WITH DIE STACKING




- Die stacking helps improve the proximity of the DRAM to Compute
- Dense and fine pitch interconnect enables simple low power interfaces as well as fine grain power control of the DRAM

## **DIE STACKING MOTIVATION (MEMORY INTEGRATION)**




- Dramatically improved memory BW/W rolls back the impact of recent memory system power growth and can help provide years of future scaling

## DIE STACKING MOTIVATION (LARGE DIE YIELD IS GETTING WORSE)




- Process complexity is increasing and yield is dropping as mask count increases
- Large die sizes will continue to have yield challenges
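One simple way to see why yield drops as mask count rises is to treat each masking layer as adding its own killer-defect density. Die yield then decays exponentially in both area and layer count. The sketch below assumes that compounding Poisson model and a hypothetical per-mask defect density.

```python
import math


def die_yield(area_cm2: float, masks: int, d0_per_mask_cm2: float) -> float:
    """Poisson yield when every mask layer adds the same killer-defect density."""
    return math.exp(-area_cm2 * masks * d0_per_mask_cm2)


D0_PER_MASK = 0.005  # defects/cm^2 contributed by each masking layer (hypothetical)

for masks in (40, 60, 80):
    row = ", ".join(
        f"{area:.0f} cm^2: {die_yield(area, masks, D0_PER_MASK):.0%}" for area in (1.0, 3.0, 6.0)
    )
    print(f"{masks} masks -> {row}")
```

Because area and mask count multiply in the exponent, the same increase in process complexity costs a large die far more yield than a small one.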

## INTERPOSER WILL BE THE SOC WITH MULTIPLE 3D COMPONENTS



- Focus process node development on specific application functionalities
  - Reduce complexity and mask layer count
  - Reduce process node TTM
  - Reduce wafer runtime
  - Reduce wafer start cost

- Yield improves
- Functionalities scale at their own pace
- IP sharing includes test, reliability, and yield learning
- Improve performance, power, area, and cost of each functionality



## CONCLUSION



## VERTICAL STACKING (3D)

## **INTERPOSER STACKING (2.5D)**

Die stacking will enable new levels of system integration leading to new form factors and new usage models

Die stacking will also help reduce process node complexity and improve yield and system cost

## **Disclaimer & Attribution**

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

© 2012 Advanced Micro Devices, Inc.

The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD's positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.



# **Optical Backplanes with 3D Integrated Photonics?**

Ephrem Wu, Sr. Director, Xilinx

Hot Chips 24, August 2012

# **Overview**

1. Electrical vs. Optical Backplanes
2. Optical vs. Optical Backplanes
3. Chip Stacking: A Means to an End

© Copyright 2012 Xilinx

# **100Gbps-m Electrical-to-Optical Crossover**



- Data points from author. Dotted line from Krishnamoorthy, *et al.*, "Progress in Low-Power Switched Optical Interconnects," *IEEE J. Selected Topics in Quantum Electronics*, vol. 17, no. 2, 2011.
- 1 Gbps over 4 copper wire pairs bidirectionally, effectively 0.5 Gbps for each wire pair over a maximum distance of 100 m (1000BASE-T spec limit).
- Dong Kam, *et al.*, "Is 25Gb/s On-Board Signaling Viable?", *IEEE Trans. Adv. Packaging*, vol. 32, no. 2, pp. 328-344, May 2009.
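The crossover is stated as a bandwidth-distance product in Gbps·m. The 1000BASE-T footnote figures (0.5 Gbps effective per wire pair over at most 100 m) give one data point. This short calculation places that copper link relative to the 100 Gbps·m crossover. The two 25 Gb/s entries are hypothetical links added only for illustration.

```python
CROSSOVER_GBPS_M = 100.0  # electrical-to-optical crossover from the slide title


def bw_distance(gbps: float, metres: float) -> float:
    """Bandwidth-distance product in Gbps*m."""
    return gbps * metres


links = {
    "1000BASE-T, per wire pair": (0.5, 100.0),        # from the footnote
    "25 Gb/s lane over 1 m backplane": (25.0, 1.0),   # hypothetical
    "25 Gb/s lane over 10 m cable": (25.0, 10.0),     # hypothetical
}
for name, (gbps, metres) in links.items():
    bd = bw_distance(gbps, metres)
    side = "beyond crossover" if bd > CROSSOVER_GBPS_M else "below crossover"
    print(f"{name}: {bd:g} Gbps-m -> {side}")
```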










## **Electrical Backplane**




## **Optical Backplane**




# Optical vs. Optical Backplanes (MMF vs. SMF)

VCSELs: The Incumbent at 10 Gb/s



# **Optical vs. Optical Backplanes** *Challengers at 25 Gb/s*

- 25 Gb/s Si or InP photonics over SMF for warehouse data centers
- 10 Gb/s VCSELs over OM3 MMF
- 25 Gb/s VCSELs over OM4 MMF



# **Optical vs. Optical Backplanes Options in Stacked Photonics**

- VCSELs: the laser is the thermal bottleneck. 70°C 850 nm MMF to 90°C 1550 nm SMF?
- **Off-Package Optical Power Supply**: evict the thermal bottleneck (laser) from the package.





# Which technology will scale better?


# Stacking Photonics: Who Faces Up and Who Faces Down?

- Photonics up for fiber attach; electronics down, with optical vias (holes)
- Photonics down for PWB; Si interposer with holes; electronics down






- F. Doany, *et al.*, "Dense 24 TX + 24 RX Fiber-Coupled Optical Module Based on a Holey CMOS Transceiver IC," 60<sup>th</sup> Electronic Components and Technology Conference, pp. 248-255, 2010.
- F. Doany, *et al.*, "300-Gb/s 24-Channel Bidirectional Si Carrier Transceiver Optochip for Board-Level Interconnects," 58<sup>th</sup> Electronic Components and Technology Conference, pp. 238-243, 2008.

Diagrams redrawn by Xilinx based on papers.


# **Integrated-Photonics Interface Standards**

## Example Today: CEI-28G-VSR



## Common electrical inter-IC signaling standard





# What Happens Outside the Chip Matters



# Link loss at connectors: Distance-Cost Trade-offs


# **Opto-Electronics Supply Chain Needed**



# Summary

# **Electrical vs. Optical Backplanes**

Mainstream backplanes to reach crossover near end of decade

# **Optical vs. Optical Backplanes**

Choice of VCSELs, InP, and Si Photonics impacts costs & scalability

# Chip-stacking is just a means to an end

Electrical-to-optical migration: focus on systems and supply chain

#### AMD – Bryan Black, Senior Fellow

#### Abstract

Since the days of room-sized computers, advances in technology have been used to increase performance, reduce power, and shrink form factors. Historically, new process and packaging technologies have been at the heart of this evolution toward ever-increasing performance density. Performance density drives our industry, creating new compute form factors and usage models for consumers. Interposer and 3D are emerging as the next generation of technologies that will continue to drive performance density. This talk will discuss why the industry requires die stacking and outline its primary challenges.

#### Biography

Bryan Black received his Ph.D. from Carnegie Mellon. With over 20 years of experience, Black has had the honor of working at Motorola, Intel, and AMD. He has done a little of everything, from devices to circuits to microarchitecture to test to packaging. Black is currently a Senior AMD Fellow and runs the AMD die stacking program.

#### Amkor – ChoonHeung Lee, Corporate VP

#### Abstract

"Process Integration and Challenges in 2.5D and 3D TSV Assembly"

Small-volume production in FPGA and power amplifier applications shows that TSV technology is close to reality, even though the supply chain for this technology is still in question. In the flow I consider, TSVs are first formed in silicon at the foundries, and the wafers then pass through MEOL, BEOL, and assembly processes at the OSATs to become a product. This talk outlines the MEOL, BEOL, and assembly processes and the process challenges that surface in each. In addition, some issues related to building infrastructure, including capacity, will be discussed.

#### Biography

- July 2012 to present: Head of Product Group
- Jan 2010 to present: CTO for R&D and Process/Equipment Engineering
- Feb 1996 to Dec 2009: Team manager and Head of R&D
- Aug 1986 to Jan 1993: MS and PhD at Case Western Reserve University
- Mar 1977 to Aug 1985: BS and MS at Korea University, Marine Corps

#### Qualcomm – Riko Radojcic, Director

#### Abstract

An overview of the principal challenges in the design of various types of 3D stacks is presented, with the focus on Memory-on-Logic and Logic-on-Logic classes of solutions. The key components of a design flow selected to address these challenges (PathFinding, TechTuning, and Design Authoring) are described, and a snapshot of the current status of these solutions is discussed.

#### Biography

Riko Radojcic is a Director of Engineering at Qualcomm CDMA Technologies and a leader of various Design-for-Technology initiatives, including Design-for-3D, Design-for-Thermal, Design-for-Manufacturability & Variability, Si-Package CoDesign, etc., involving methodologies at the polygon, circuit, logic, and/or system design levels.

Radojcic has more than thirty years' experience in the semiconductor industry, specializing in the integration of process, design, and EDA considerations, and in design-for-Si solutions. Before joining Qualcomm, he was a consultant to semiconductor and EDA companies, providing engineering and business development services focused on process-design integration. He was a director of business development and marketing for DFM Solutions at PDF Solutions, and a Business Manager and an Architect with Tality and Cadence, specializing in design technology integration and process characterization and modeling.

Radojcic has held a series of managerial and engineering positions with Unisys and Burroughs, in device engineering, failure analyses and reliability engineering areas. He began his career as a process engineer with Ferranti Electronics, UK.

Radojcic received his BSc and PhD degrees from University of Salford, UK.

#### UMC - Remi Yu, Director Marketing

#### Abstract

"Foundry TSV Enablement For 2.5D/3D Chip Stacking"

For 2.5D/3D chip stacking applications, the foundry has fully demonstrated fine-pitch, high-density TSV/RDL, leveraging the single/dual damascene Cu processes of an advanced CMOS logic fab. This presentation reviews the critical steps of the foundry TSV process, test results, applications, and related ecosystem work models.

#### Biography



Remi Yu received the B.S. degree in electrophysics from National Chiao Tung University, Hsinchu, Taiwan in 1989. He is currently deputy director of corporate marketing at United Microelectronics Corporation, with focus on 3D IC and ecosystem marketing. Since joining UMC in 2004, he has worked on the foundry's intellectual property marketing and customer design support. He became a member of the foundry's 3D IC program team starting mid-2011 and has been working on UMC's Open Ecosystem Initiative since. He worked at Macronix International Co., Ltd. prior to joining UMC.

#### Xilinx - Liam Madden, Corporate VP

#### Biography



Liam Madden is corporate vice president of FPGA Development and Silicon Technology at Xilinx. He has responsibility for FPGA design, Advanced Packaging (including Stacked Silicon Interconnect), and Foundry Technology. Madden joined Xilinx in 2008, bringing more than 25 years of experience in a range of design and technology leadership positions with Digital Equipment Corp., MIPS Technologies, Inc., Microsoft Corp. (XBOX Division), and AMD.

Madden earned a BE from University College Dublin and an MEng from Cornell University. He holds five patents in the area of technology and circuit design.

#### Xilinx - Shankar Lakka, Director Integration

#### Abstract

Xilinx's Stacked Silicon Interconnect (SSI) technology is used to connect multiple FPGAs or other IP die on a single Silicon Interposer. This presentation gives an overview of 3D IC-SSI Technology and Engineering challenges faced during the development of the 3D-IC products. Areas that will be covered are Timing, Physical and Electrical Verification, as well as Signal Integrity.

#### Biography

Shankar Lakka is the Director of IC Design for the full-chip FPGA Integration group at Xilinx Inc., San Jose. He has been at Xilinx for over 16 years. Shankar recently led the design and full-chip integration of Xilinx SSI devices. He holds 14 US patents.

#### Xilinx - Ephrem Wu, Sr. Director

#### Abstract

This talk presents a view of when optical interconnects will be prevalent in communications backplanes. It outlines design and supply-chain considerations for 3D-integrated photonics to play a key role in optical backplanes.

#### Biography

Ephrem Wu is Senior Director of Advanced Communications at Xilinx. He is responsible for advanced FPGA solutions for wireline and wireless communications infrastructure. He joined Xilinx in 2010 and led the design of the industry's first heterogeneous FPGA. From 2000-2010, Ephrem was with Velio Communications and LSI (which acquired Velio in 2004), leading the definition and development of packet switches and network processors. Prior to Velio, he held various ASIC, circuit, and software design positions at SGI, Hewlett-Packard, Panasonic, and AT&T.

Ephrem earned a BSE degree from Princeton University and an MS degree at the University of California, Berkeley. Both degrees are in EE with an emphasis on computer-aided design. He holds nine patents in switch architecture and circuit design.

# HotChips 24 Thanks this year's Sponsors:



Copyright © 2012 HOTCHIPS. All rights reserved. All trademarks property of their respective owners.



