# HOT Chips 2005

# **Circuit Design for Low Power**

Kevin Nowka, IBM Austin Research Laboratory

HOT Chips 2005 – Power Tutorial

#### Agenda

**Designing with power and energy limits** 

**Overview of VLSI power** 

**Technology, Scaling, and Power** 

**Review of scaling** 

A look at the real trends and projections for the future

Active power – components, trends, managing active power

Static power – components, trends, managing static power

Summary

## **Designing within limits: power & energy**

- Thermal limits (for most parts self-heating is a substantial thermal issue)
  - Package cost (4-5 W limit for cheap plastic package, 100 W/cm² air-cooled limit, 7.5 kW per 19" rack)
  - Device reliability (junction temp > 125 °C: substantial reduction in reliability)
  - Performance (25 °C -> 105 °C: loss of 30% of performance)
- Distribution limits
  - Substantial portion of wiring resource and area goes to power distribution
  - Higher current => need lower R; greater dI/dt => need more wire and decap
  - Package must be capable of low-impedance distribution
- Energy capacity limits
  - AA battery ~1000 mA·h => limits power, function, or lifetime
- Energy cost
  - Energy for IT equipment large fraction of total cost of ownership
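
The energy-capacity limit above can be turned into a quick lifetime estimate. The sketch below is only illustrative: it assumes a constant average current draw, and the load currents are hypothetical values, not figures from the tutorial.

```python
def battery_lifetime_hours(capacity_mah: float, avg_current_ma: float) -> float:
    """Hours of operation at a constant average current draw."""
    if avg_current_ma <= 0:
        raise ValueError("average current must be positive")
    return capacity_mah / avg_current_ma


AA_CAPACITY_MAH = 1000.0  # approximate AA capacity quoted in the list above

for current_ma in (1.0, 10.0, 100.0):  # hypothetical average loads
    hours = battery_lifetime_hours(AA_CAPACITY_MAH, current_ma)
    print(f"{current_ma:6.1f} mA average draw: {hours:7.1f} h")
```

Halving the average current doubles the lifetime, which is why the limit can be traded against power, function, or lifetime.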

#### Agenda

**Designing with power and energy limits** 

#### **Overview of VLSI power**

**Technology, Scaling, and Power** 

**Review of scaling** 

A look at the real trends and projections for the future

Active power – components, trends, managing active power

Static power – components, trends, managing static power

Summary

### **CMOS circuit power consumption components**

- Dynamic power consumption (  $\frac{1}{2} C_{sw} V_{dd} \Delta V f + I_{st} V_{dd}$ )
  - Load switching (including parasitic & interconnect)
  - Glitching
  - Shoot through power (I<sub>st</sub>V<sub>dd</sub>)
- Static power consumption (I<sub>static</sub>V<sub>dd</sub>)
  - Current sources, bias currents
  - Current dependent logic -- NMOS, pseudo-NMOS, CML
  - Junction currents
  - Subthreshold MOS currents
  - Gate tunneling
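
As a rough illustration of how the components above combine, the following sketch evaluates the dynamic and static terms separately. The function names and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(c_sw: float, v_dd: float, delta_v: float,
                  freq: float, i_st: float = 0.0) -> float:
    """Dynamic power in watts: 1/2 * C_sw * V_dd * dV * f + I_st * V_dd."""
    return 0.5 * c_sw * v_dd * delta_v * freq + i_st * v_dd


def static_power(i_static: float, v_dd: float) -> float:
    """Static power in watts: I_static * V_dd."""
    return i_static * v_dd


# Hypothetical chip: 1 nF switched per cycle, full-swing 1.2 V, 1 GHz,
# 50 mA average shoot-through current, 100 mA total static current.
p_dyn = dynamic_power(c_sw=1e-9, v_dd=1.2, delta_v=1.2, freq=1e9, i_st=0.05)
p_stat = static_power(i_static=0.1, v_dd=1.2)
print(f"dynamic = {p_dyn:.2f} W, static = {p_stat:.2f} W")
```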

#### Agenda

**Designing with power and energy limits** 

**Overview of VLSI power** 

Technology, Scaling, and Power

**Review of scaling** 

A look at the real trends and projections for the future

Active power – components, trends, managing active power

Static power – components, trends, managing static power

Summary



## **CMOS Circuit Delay and Frequency**

### **VLSI system frequency determined by:**

Sum of propagation delays across the gates in the "critical path". Each gate delay includes the time to charge/discharge the load through a FET and the interconnect delay to distribute the signal to the next gate input.

$$T_{d} = kCV/I$$
$$= kCV/(V_{dd}-V_{t})^{\alpha}$$

Sakurai  $\alpha\text{-power}$  law model of delay
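
A minimal sketch of the delay model above, showing how delay grows as the supply approaches the threshold. The threshold (0.3 V), the exponent (1.3), and the constant k are assumed values for illustration only.

```python
def gate_delay(c_load: float, v_dd: float, v_t: float,
               alpha: float = 1.3, k: float = 1.0) -> float:
    """Alpha-power-law delay: T_d = k * C * V_dd / (V_dd - V_t)**alpha."""
    if v_dd <= v_t:
        raise ValueError("model only applies for V_dd above V_t")
    return k * c_load * v_dd / (v_dd - v_t) ** alpha


V_T = 0.3  # assumed threshold voltage in volts
reference = gate_delay(1.0, 1.8, V_T)
for v_dd in (1.8, 1.5, 1.2, 1.0, 0.8):
    relative = gate_delay(1.0, v_dd, V_T) / reference
    print(f"V_dd = {v_dd:.1f} V: relative delay = {relative:.2f}")
```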



#### **Microprocessor Frequency**

In practice the trend is:

Frequency increasing by 2X (delay decreasing by 50%), not the 1.4X (30%) for constant field scaling (src: ITRS '01).

Why? Decreasing logic per stage and increasing pipeline depth.





Energy dissipated for either output transition:  $\frac{1}{2} C_L V_{dd}^2$ 

Gate level energy consumption should improve as  $\alpha^3$  under constant field scaling, but....



With each generation, voltage has decreased 0.85x, not 0.7x for constant field.

Thus, energy/device is decreasing by 50% rather than 65%
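
The 50% versus 65% figures follow from energy per transition scaling as C·V². The sketch below reproduces the arithmetic, assuming capacitance per device scales with the linear dimension factor of 0.7 per generation.

```python
def energy_scale(cap_scale: float, voltage_scale: float) -> float:
    """Per-generation scaling of switching energy, E ~ C * V**2."""
    return cap_scale * voltage_scale ** 2


constant_field = energy_scale(0.7, 0.7)   # voltage scales with dimensions
observed = energy_scale(0.7, 0.85)        # voltage scales only 0.85x

print(f"constant field: {constant_field:.3f}x ({(1 - constant_field) * 100:.0f}% lower)")
print(f"observed:       {observed:.3f}x ({(1 - observed) * 100:.0f}% lower)")
```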



- a net increase in energy consumption,

- with freq 2x, active power is increasing by 50% (src: ITRS '01)

\* HP MP = High Performance Micro Processor

### **Active-Power Reduction Techniques**

 $\mathbf{P} = \frac{1}{2} \mathbf{C}_{sw} \mathbf{V}_{dd} \Delta \mathbf{V} \mathbf{f} + \mathbf{I}_{st} \mathbf{V}_{dd} + \mathbf{I}_{static} \mathbf{V}_{dd}$ 

Active power can be reduced through:

- Capacitance minimization
  - Power/Performance in sizing
  - Clock-gating
  - Glitch suppression
  - Hardware-accelerators
  - System-on-a-chip integration
- Voltage minimization
  - (Dynamic) voltage-scaling
  - Low swing signaling
  - SOC/Accelerators
- Frequency minimization
  - (Dynamic) frequency-scaling
  - SOC/Accelerators

### **Capacitance minimization**

#### $\mathbf{P} = \frac{1}{2} \mathbf{C}_{sw} \mathbf{V}_{dd} \Delta \mathbf{V} \mathbf{f} + \mathbf{I}_{st} \mathbf{V}_{dd} + \mathbf{I}_{static} \mathbf{V}_{dd}$

Only the devices (device width) used in the design consume active power!

- Runs counter to the complexity-for-IPC trend
- Runs counter to the SOC trend

### **Capacitance minimization**

**Example of managing design capacitance:** 

Device sizing for power efficiency is significantly different from sizing for performance. Example: choosing the gate-size multiplier in an exponential horn (tapered chain) of inverters.
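
One way to see the sizing tradeoff is a simple normalized model of a tapered inverter chain. This is a sketch under stated assumptions, not the analysis behind the slide: stage delay is taken as (multiplier + parasitic) in arbitrary units, switched capacitance counts only the chain's own input capacitances, and the load ratio of 1000 is hypothetical.

```python
import math


def inverter_horn(load_ratio: float, taper: float, parasitic: float = 1.0):
    """Return (stages, delay, switched_cap) for a tapered inverter chain.

    load_ratio: load capacitance divided by the first inverter's input cap.
    taper:      target size multiplier between successive stages (> 1).
    """
    if taper <= 1.0 or load_ratio <= 1.0:
        raise ValueError("taper and load_ratio must both exceed 1")
    stages = max(1, round(math.log(load_ratio) / math.log(taper)))
    multiplier = load_ratio ** (1.0 / stages)      # actual per-stage multiplier
    delay = stages * (multiplier + parasitic)      # normalized delay units
    switched_cap = sum(multiplier ** i for i in range(stages))  # load excluded
    return stages, delay, switched_cap


for taper in (3.0, 4.0, 6.0, 10.0):
    n, d, c = inverter_horn(1000.0, taper)
    print(f"taper {taper:4.1f}: {n} stages, delay {d:5.1f}, chain cap {c:6.1f}")
```

In this model a multiplier near 10 uses roughly one third of the chain capacitance of a multiplier near 4, for roughly one third more delay.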



## **Functional Clock Gating**

- 25-50% of power consumption due to driving latches.
- Utilization of most latches is low (~10-35%)
- Gate off unused latches and associated logic:
  - Unit-level clock gating: turn off clocks to FPU, MMX, shifter, L/S unit, …
  - Functional clock gating: turn off clocks to individual latch banks (forwarding latch, shift-amount register, overflow logic & latches, …)
- Asynchronous design is the most aggressive form of gating
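
The ranges above suggest a rough upper bound on what clock gating can save. The sketch assumes ideal gating: latch clock power is removed whenever a latch is unused, and the gating logic itself costs nothing. Real savings would be lower.

```python
def clock_gating_saving(latch_power_fraction: float, utilization: float) -> float:
    """Fraction of total power saved by ideally gating unused latches."""
    for value in (latch_power_fraction, utilization):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return latch_power_fraction * (1.0 - utilization)


# Corners of the ranges quoted above: 25-50% latch power, 10-35% utilization.
for latch_fraction in (0.25, 0.50):
    for utilization in (0.10, 0.35):
        saving = clock_gating_saving(latch_fraction, utilization)
        print(f"latch power {latch_fraction:.0%}, utilization {utilization:.0%}: "
              f"saves up to {saving:.1%}")
```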

### **Glitch suppression**

- Glitches can represent a sizeable portion of active power (up to 30% for some circuits in some studies)
- Three basic mechanisms for avoidance:
  - Use non-glitching logic, e.g. domino
  - Add redundant logic to avoid glitching hazards
    - Increases cap, causes testability problems
  - Adjust delays in the design to avoid glitches
    - Shouldn't timing tools do this already if it is possible?

### **Voltage minimization**

- Lowering voltage swing,  $\Delta V$ , lowers power
  - Low swing logic efforts have not been very successful (unless you consider array voltage sensing)
  - Low swing busses have been quite successful
- Lowering the supply, and with it both  $V_{dd}$  and  $\Delta V$  (voltage scaling), is most promising:
  - Frequency ~ $V$ , Power ~ $V^3$
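
Taking the first-order proportionalities above at face value (frequency ~ V, power ~ V³), energy per operation scales as V². The sketch below tabulates this; it ignores threshold effects and static power, so it is only a rough guide.

```python
def scale_with_voltage(v_ratio: float):
    """Return relative (frequency, power, energy per operation) for V/V_nominal."""
    if v_ratio <= 0:
        raise ValueError("voltage ratio must be positive")
    freq = v_ratio                 # frequency ~ V
    power = v_ratio ** 3           # power ~ V^3
    energy_per_op = power / freq   # energy per operation ~ V^2
    return freq, power, energy_per_op


for v_ratio in (1.0, 0.9, 0.8, 0.7, 0.5):
    f, p, e = scale_with_voltage(v_ratio)
    print(f"V = {v_ratio:.1f}x: freq {f:.2f}x, power {p:.3f}x, energy/op {e:.2f}x")
```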

#### **Voltage Scaling Reduces Active Power**

#### Voltage Scaling Challenges

- Custom CPUs, analog, PLLs, and I/O drivers don't voltage scale easily
- Sensitivity to supply voltage varies circuit to circuit – esp. SRAM, buffers, NAND4
- Thresholds tend to be too high at low supply

#### **Voltage Scaling Benefits**

- Can be used widely over entire chip
- Complementary CMOS scales well over a wide voltage range
- Can optimize power/performance (MIPS/mW) over a 4X range

Figure: average relative ring oscillator delay and power vs. supply voltage (0.7 V to 1.7 V); curves for α-power delay model, measured delay, modeled power, and measured power.

After Carpenter, Microprocessor Forum, '01

### **Dynamic Voltage-Scaling (e.g. XScale, PPC405LP)**

PowerPC 405LP measurements: 18:1 power range over 4:1 frequency range



### **Frequency minimization**

- Lowering frequency lowers power linearly
  - DOES NOT improve energy efficiency, just slows down energy consumption
  - Important for avoiding thermal problems

### **Voltage-Frequency-Scaling Measurements: PowerPC 405LP**

Figure: oscilloscope capture comparing frequency scaling alone with frequency scaling plus DVS.

Frequency scaling: 1/4 freq, 1/4 pwr; DVS: 1/4 freq, 1/10 pwr

Src: After Nowka, et al., JSSC, Nov '02
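
The measured ratios quoted for the PowerPC 405LP show why frequency scaling alone does not improve energy efficiency while DVS does. A small sketch of the arithmetic, using only those ratios (the function name is illustrative):

```python
def relative_energy_per_op(power_ratio: float, freq_ratio: float) -> float:
    """Energy per operation relative to nominal: (P / P0) / (f / f0)."""
    if freq_ratio <= 0:
        raise ValueError("frequency ratio must be positive")
    return power_ratio / freq_ratio


freq_only = relative_energy_per_op(power_ratio=1 / 4, freq_ratio=1 / 4)
with_dvs = relative_energy_per_op(power_ratio=1 / 10, freq_ratio=1 / 4)

print(f"frequency scaling only: {freq_only:.2f}x energy per operation")
print(f"frequency scaling + DVS: {with_dvs:.2f}x energy per operation")
```

Frequency scaling leaves energy per operation unchanged, while adding DVS brings it to about 0.4x of nominal at the same reduced frequency.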

### **Shoot-through minimization**

- For most designs, shoot-thru represents 8-15% of active power.
- Avoidance and minimization:
  - Lower supply voltage
  - Domino?
  - Avoid slow input slews
  - Be careful with level shifters in designs with multiple voltage domains

#### Agenda

**Designing with power and energy limits** 

**Overview of VLSI power** 

**Technology, Scaling, and Power** 

**Review of scaling** 

A look at the real trends and projections for the future

Active power – components, trends, managing active power

Static power – components, trends, managing static power

Summary

#### **Static Power**

- Static power consumption (I<sub>static</sub>V<sub>dd</sub>)
  - Current sources: even µA bias currents can add up.
  - NMOS, pseudo-NMOS: not commonly used
  - CMOS CML logic: significant power, for specialized use.
  - Junction currents
  - Subthreshold MOS currents
  - Gate tunneling

### **Subthreshold Leakage**

 $\mathbf{P} = \mathbf{KV} \mathbf{e}^{(\text{Vgs-Vt})q/nkT} \left(\mathbf{1} - \mathbf{e}^{-\text{Vds}q/kT}\right)$ 

- Supplies have been held artificially high (for freq)
  - Threshold has not dropped as fast as it should
  - Want to maintain I<sub>on</sub>:I<sub>off</sub> = ~1000 µA/µm : 10 nA/µm
  - Relatively poor performance => Low Vt options
    - 70-180mV lower Vt, 10-100x higher leakage, 5-15% faster
- Subthreshold leakage is increasing especially in short-channel devices (DIBL) and at high T: 100-1000 nA/µm
- Subthreshold slope 70-80mV/decade
- Cooling changes the slope....but can it be energy efficient?
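
The subthreshold slope gives a quick way to estimate how much leakage rises when Vt is lowered: each slope's worth of Vt reduction costs one decade of leakage. The sketch below is a first-order estimate that ignores DIBL and temperature dependence; the sample Vt shifts are illustrative.

```python
def leakage_ratio(delta_vt_mv: float, slope_mv_per_decade: float) -> float:
    """Leakage increase factor for a threshold lowered by delta_vt_mv."""
    if slope_mv_per_decade <= 0:
        raise ValueError("slope must be positive")
    return 10.0 ** (delta_vt_mv / slope_mv_per_decade)


for slope in (70.0, 80.0):              # slope range quoted above, mV/decade
    for delta_vt in (80.0, 120.0, 160.0):   # illustrative Vt reductions, mV
        ratio = leakage_ratio(delta_vt, slope)
        print(f"slope {slope:.0f} mV/dec, Vt lowered {delta_vt:.0f} mV: "
              f"about {ratio:.0f}x leakage")
```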

### **Projected Subthreshold Leakage Trends**



Src: ITRS '01, '03 Note: Hatched bars are interpolated

### **Trends in Leakage Contribution to Power**

Fit of published active and subthreshold leakage densities



Src: Nowak, et al.

### **Gate Leakage**

- Gate tunneling is becoming the dominant leakage mechanism in very thin gate oxides
- Current exponential in oxide thickness
- Current exponential in voltage across oxide
- Reduction techniques:
  - Lower the field (voltage or oxide thickness)
  - New gate ox material

### **Gate Leakage Trends**

Fit of published active, subthreshold, and gate leakage densities

Figure: power density (W/cm<sup>2</sup>) vs. Lpoly (µm), including the gate-leakage curve.

After Nowak, et al.

### **Future Leakage, Standby Power Trends**



Src: ITRS '01

Recall also that the number of transistors per die has been increasing 2X every 2 years (active power per gate should be 0.5x per generation, but has been 1x per generation).

For the foreseeable future, leakage is a major power issue

## **Standby-Power Reduction Techniques**

Standby power can be reduced through:

- Capacitance minimization
- Voltage-scaling
- Power gating
- Vdd/Vt selection

### **Capacitance minimization**

Only the devices (device width) used in the design leak!

- Runs counter to the complexity-for-IPC trend
- Runs counter to the SOC trend
- Transistors are not free -- Even though they are not switched they still leak



### **Supply/Power Gating**

- Especially for energy-constrained (e.g. battery-powered) systems. Two levels of gating:
  - "Standby, freeze, sleep, deep-sleep, doze, nap, hibernate": lower or turn off the power supply to the system to avoid power consumption when inactive
    - Control difficulties, hidden state, entry/exit, "instant-on" or user-visible.
  - Unit-level power gating: turn off inactive units while the system is active
    - E.g. MTCMOS
    - Distribution, entry/exit control & glitching, state loss...

### **MTCMOS**

- Use header and/or footer switches to disconnect supplies when inactive.
- For performance, use low-Vt devices for logic.
- 10-100x leakage improvement, ~5% perf overhead
- Loss of state when disconnected from supplies
- Large number of variants in the literature
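
To get a feel for what a 10-100x leakage reduction buys, the following sketch computes average power for a unit that is active only part of the time. It is a simplification under stated assumptions: leakage is unchanged while the unit is awake, wake-up and state save/restore energy are ignored, and the power values are hypothetical.

```python
def average_power(p_active: float, p_leak: float, active_fraction: float,
                  gating_reduction: float = 1.0) -> float:
    """Average power with leakage divided by gating_reduction while asleep."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    if gating_reduction < 1.0:
        raise ValueError("gating_reduction must be at least 1")
    awake = active_fraction * (p_active + p_leak)
    asleep = (1.0 - active_fraction) * p_leak / gating_reduction
    return awake + asleep


# Hypothetical unit: 1.0 W switching power, 0.2 W leakage, active 10% of the time.
for reduction in (1.0, 10.0, 100.0):
    p_avg = average_power(1.0, 0.2, 0.1, reduction)
    print(f"leakage reduced {reduction:5.0f}x while asleep: average {p_avg:.4f} W")
```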





Thick oxide (Tox) reduces gate leakage by orders of magnitude

- Decreases subthreshold leakage
- Improvement beyond use of long channel device
- 2-5x improvement in subthreshold leakage
- 15-35% performance penalty

### **Vt and/or Vdd selection**

- Design tradeoff:
  - Performance => High supply, low threshold
  - Active Power => Low supply, low threshold
  - Standby => Low supply, high threshold
- Static
  - Stack effect: minimize subthreshold leakage through single-FET paths
  - Multiple thresholds: High Vt and Low Vt Transistors
  - Multiple supplies: high and low Vdd
  - Problem: optimum (Vdd,Vt) changes over time, across dice
- Dynamic (Vdd,Vt) selection
  - DVS for supply voltage
  - Dynamic threshold control thru:
    - Active well
    - Substrate biasing
    - SOI back gate, DTMOS, dual-gate technologies






## Agenda

**Designing with power and energy limits** 

**Overview of VLSI power** 

**Technology, Scaling, and Power** 

**Review of scaling** 

A look at the real trends and projections for the future

Active power – components, trends, managing active power

Static power – components, trends, managing static power

#### Summary

### **Low Power Circuits Summary**

**Technology, Scaling, and Power** 

Technology scaling hasn't solved the power/energy problems.

So what to do? Do less, and/or do it in parallel at low V. For the circuit designer this implies:

- supporting low V,
- supporting power-down modes,
- choosing the right mix of Vt,
- sizing devices appropriately,
- choosing the right Vdd (adaptation!)

## References

#### **Metrics**

- T. Sakurai and A. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas", IEEE Journal of Solid State Circuits, v. 25, no. 2, pp. 584-594, Apr. 1990.
- R. Gonzalez, B. Gordon, M. Horowitz, "Supply and threshold voltage scaling for low power CMOS", IEEE Journal of Solid State Circuits, v. 32, no. 8, pp. 1210-1216, August 1997.
- Zyuban and Strenski, "Unified Methodology for Resolving Power-Performance Tradeoffs at the Microarchitectural and Circuit Levels", ISLPED, Aug. 2002.
- Brodersen, Horowitz, Markovic, Nikolic, Stojanovic, "Methods for True Power Minimization", ICCAD, Nov. 2002.
- Stojanovic, Markovic, Nikolic, Horowitz, Brodersen, "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization", ESSCIRC, Sep. 2002.

#### **Power/Low Power**

- SIA, International Technology Roadmap for Semiconductors, 2001, 2003 available online.
- V. Agarwal, M.S. Hrishikesh, S.W. Keckler, and D. Burger. "Clock Rate Versus IPC: The End of the Road for Conventional Microarchitectures," 27th International Symposium on Computer Architecture (ISCA), June, 2000.
- Allan, et al., "2001 Tech. Roadmap for Semiconductors", IEEE Computer, Jan. 2002.
- Chandrakasan, Brodersen (eds.), <u>Low Power CMOS Design</u>, IEEE Press, 1998.
- Oklobdzija (ed.), <u>The Computer Engineering Handbook</u>, CRC Press, 2002.
- Kuo, Lou, <u>Low Voltage CMOS VLSI Circuits</u>, Wiley, 1999.
- Bellaouar, Elmasry, <u>Low Power Digital VLSI Design: Circuits and Systems</u>, Kluwer, 1995.
- Chandrakasan, Brodersen, <u>Low Power Digital CMOS Design</u>, Kluwer, 1995.
- A. Correale, "Overview of the power minimization techniques employed in the IBM PowerPC 4xx embedded controllers" IEEE Symposium on Low Power Electronics Digest of Technical Papers, pp. 75-80.1995.
- K. Nowka, G. Carpenter, E. MacDonald, H. Ngo, B. Brock, K. Ishii, T. Nguyen, J. Burns, "A 0.9V to 1.95V dynamic voltage scalable and frequency scalable 32-bit PowerPC processor", Proceedings of the IEEE International Solid State Circuits Conference, Feb. 2002.
- K. Nowka, G. Carpenter, E. MacDonald, H. Ngo, B. Brock, K. Ishii, T. Nguyen, J. Burns, "A 32-bit PowerPC System-on-a-Chip with support for dynamic voltage scaling and dynamic frequency scaling", IEEE Journal of Solid State Circuits, November, 2002.

#### Low Voltage / Voltage Scaling

- E. Vittoz, "Low-power design: ways to approach the limits" *IEEE International Solid State Circuits Conference Digest of Technical Papers*, pp. 14-18, 1994.
- M. Horowitz, T. Indermaur, R. Gonzalez, "Low-power digital design" *IEEE Symposium on Low Power Electronics Digest of Technical Papers*, pp. 8-11, 1994.
- R. Gonzalez, B. Gordon, M. Horowitz, "Supply and threshold voltage scaling for low power CMOS" *IEEE Journal of Solid State Circuits*, v. 32, no. 8, pp. 1210-1216, August 1997.
- T. Burd and R. Brodersen, "Energy efficient CMOS microprocessor design" *Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences*, v. 1, pp. 288-297, 466, 1995.
- K. Suzuki, S. Mita, T. Fujita, F. Yamane, F. Sano, A. Chiba, Y. Watanabe, K. Matsuda, T. Maeda, T. Kuroda, "A 300 MIPS/W RISC core processor with variable supply-voltage scheme in variable threshold-voltage CMOS" *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp. 587–590, 1997.
- T. Kuroda, K. Suzuki, S. Mita, T. Fujita, F. Yamane, F. Sano, A. Chiba, Y. Watanabe, K. Matsuda, T. Maeda, T. Sakurai, T. Furuyama, "Variable supply-voltage scheme for low-power high-speed CMOS digital design" *IEEE Journal of Solid State Circuits*, v. 33, no. 3, pp. 454-462, March 1998.
- T. Burd, T. Pering, A. Stratakos, R. Brodersen, "A dynamic voltage scaled microprocessor system" *IEEE International Solid State Circuits Conference Digest of Technical Papers*, pp. 294-295, 466, 2000.

#### Technology and Circuit Techniques

- E. Nowak, et al., "Scaling beyond the 65 nm node with FinFET-DGCMOS", Proceedings of the IEEE Custom Integrated Circuits Conference, Sept. 21-24, 2003, pp. 339-342.
- L. Clark, et al., "An embedded 32b microprocessor core for low-power and high-performance applications", IEEE Journal of Solid State Circuits, v. 36, no. 11, Nov. 2001, pp. 1599-1608.
- S. Mukhopadhyay, C. Neau, R. Cakici, A. Agarwal, C. Kim, and K. Roy, "Gate leakage reduction for scaled devices using transistor stacking", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Aug. 2003, pp. 716-730.
- A. Bhavnagarwala, et al., "A pico-joule class, 1GHz, 32 Kbyte x 64b DSP SRAM with Self Reverse Bias", 2003 Symposium on VLSI Circuits, June 2003, pp. 251-251.
- S. Mutoh, et al., "1-V Power Supply High-Speed Digital Circuit Technology with Multi-Threshold Voltage CMOS," IEEE Journal of Solid State Circuits, vol. 30, no. 8, pp. 847-854, 1995.
- K. Das, et al., "New Optimal Design Strategies and Analysis of Ultra-Low Leakage Circuits for Nano-Scale SOI Technology," Proc. ISLPED, pp. 168-171, 2003.
- R. Rao, J. Burns and R. Brown, "Circuit Techniques for Gate and Sub-Threshold Leakage Minimization in Future CMOS Technologies" Proc. ESSCIRC, pp. 2790-2795, 2003.
- R. Rao, J. Burns and R. Brown, "Analysis and optimization of enhanced MTCMOS scheme" Proc. 17th International Conference on VLSI Design, 2004, pp. 234-239.