Voltage Limbo Dancing: How Low Can You Go?

by Paul McLellan on 03-09-2015 at 7:00 am

All chips these days have to worry about power. Indeed, it is typically at the top of the list of concerns, above performance and even area. Transistors are effectively fast and free, but you can’t have too many of them (at least not turned on at once). The most obvious way to reduce power is to lower the supply voltage: it appears squared in the dynamic power equation, and static (leakage) power depends on it non-linearly. Power is not just a problem for the most leading-edge processes, 14/16/10nm. A lot of designs, especially for IoT, are done in non-leading-edge processes. Indeed, TSMC has recently gone back and produced even lower power versions (ULP) of several of their mature processes, which can run at lower voltages than the original processes could when they were introduced. The way to get power to an absolute minimum is to run with the supply voltage as low as possible, but this means that the margins for timing are critical. Being optimistic will lead to outright failure or low yield; being pessimistic leaves a lot of performance and power reduction on the table.
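To make the squared dependence concrete, here is a minimal sketch using the textbook switching-power relation (activity × capacitance × Vdd² × frequency). The capacitance, frequency and voltage values are hypothetical, chosen only for illustration, and the clock frequency is held constant even though a real design would usually have to slow down at the lower voltage.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """Switching power in watts: activity * C * Vdd^2 * f (textbook relation)."""
    return activity * c_eff * vdd ** 2 * freq


# Hypothetical numbers: 1 nF of switched capacitance clocked at 500 MHz.
nominal = dynamic_power(c_eff=1e-9, vdd=1.0, freq=500e6)
scaled = dynamic_power(c_eff=1e-9, vdd=0.7, freq=500e6)

ratio = scaled / nominal   # depends only on (0.7 / 1.0) ** 2
saving = 1.0 - ratio
print(f"Dynamic power at 0.7 V is {ratio:.0%} of nominal ({saving:.0%} saving)")
```

Under these assumptions, a 30% drop in supply voltage roughly halves dynamic power, which is why every additional tens of millivolts of safe margin reduction is worth chasing.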

As is usually the case in EDA, the substitute for pessimism is accuracy. But even using lots of process corners isn’t enough since there is too much variation and it is often impossible to close timing with this approach: fixing the FF corner causes violations at the SS corner and vice versa. Systematic margining will not get you there. It is necessary to explicitly analyze variance since putting the voltage up is not a viable option.

So here are four pictures to scare you!

At very low voltage, process variance can be as much as half the delay, much more than at higher voltages. Inaccuracy in calculating variance is not a second-order effect that can be ignored:

At low voltage, the variation is non-Gaussian (not a normal distribution). In particular, the tails are longer, so simply guard-banding by a fixed number of standard deviations will miss those tails:
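A small numerical sketch shows why a fixed-sigma guard band falls short on a long-tailed distribution. It assumes, purely for illustration, that normalized delay follows a log-normal distribution with hypothetical parameters `MU` and `SHAPE`; real low-voltage delay distributions come from characterization, not from this model.

```python
import math
from statistics import NormalDist

# Hypothetical log-normal parameters for a normalized cell delay (assumption).
MU, SHAPE = 0.0, 0.5
N_SIGMA = 3.0

# Mean and standard deviation of the log-normal delay distribution.
mean = math.exp(MU + SHAPE ** 2 / 2)
std = math.sqrt((math.exp(SHAPE ** 2) - 1.0) * math.exp(2 * MU + SHAPE ** 2))

# Gaussian-style guard band: mean plus a fixed number of standard deviations.
gaussian_guard_band = mean + N_SIGMA * std

# Delay that actually covers the same probability a 3-sigma bound would
# cover if the distribution were normal (about 99.87%).
target_prob = NormalDist().cdf(N_SIGMA)
true_quantile = math.exp(MU + SHAPE * N_SIGMA)

print(f"target coverage:        {target_prob:.4%}")
print(f"mean + 3 sigma:         {gaussian_guard_band:.3f}")
print(f"true log-normal bound:  {true_quantile:.3f}")
```

With these assumed parameters the true bound is well above mean plus three standard deviations, so a design margined by sigma count alone would see more failures than the Gaussian arithmetic predicts.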

Constraints are also affected by process variance and voltage, and can likewise be non-Gaussian at very low voltages:

Static timing analysis misses some effects, in particular Miller capacitance, which is a dominant effect below 20nm and especially at low voltage. On high-fanout nets such as clocks, STA will miss Miller capacitance and so miss violations:

So that is scary. What can you do about it?

CLKDA’s portfolio of FX tools is designed to address these issues and let you get the voltage as low as possible by giving you the accuracy you need to be confident of working/yielding silicon.

FX is within 2% of Monte Carlo (MC) SPICE for delay but is 400,000X faster, so it can analyze thousands of paths in minutes and full clock trees in hours. It supports all major foundries and libraries. It is in production down to 14/16nm and is being used in leading SoC designs today.

There are multiple components:

  • Variance FX: the industry standard for timing derates (supports AOCV, POCV and LVF)

    • derates drive process yield into physical flow (STA, P&R, optimization)
    • all cells, arcs, loads and skews
    • 1000 cells per hour using 100 cores
  • Macro FX: extends Variance FX to complex logic cells

    • big delay buffers, retention flops, very large flop trays, memories
  • Voltage FX: voltage and variance sensitivity for delay and constraints for cell libraries

    • analyze library across voltage operating points
    • identify at-risk cells in library
    • extend to process and temperature
  • Clock FX: full-chip clock tree analysis, SPICE accurate insertion delay and skew

    • automatically finds clock trees
    • corner, global corner/statistical or full statistical
    • measure delay, crosstalk, voltage effects
    • fast: 100M instance 20nm design in under 2 hours
  • Path FX: runs tens of thousands of paths in just minutes and identifies timing surprises no other STA tool can find

    • SPICE-accurate path analysis
    • timing waivers, multi-voltage paths, PVT path sweeps, black box timing models

The FX platform can be used across the entire design process, from library design through floorplanning, physical design, clock tree generation, optimization and signoff.

There are lots more details about the FX platform on the CLKDA website.
