In Low Voltage Timing, the Center Cannot Hold

by Bernard Murphy on 01-25-2016 at 7:00 am

When I started discussing this topic with Isadore Katz, I was struggling to find a simple way to explain what he was telling me: that delay and variance calculations in STA tools are wrong at low voltage because the average (the center) of a timing distribution shifts from where you think it is going to be. He told me that I’m not alone in my struggle; he’s never found an easy way to boil it down either. You just have to go through all the steps, and then the conclusion at the end makes sense. Therefore, with apologies to timing experts, here is my explanation. Throughout, I’m going to use “typical” for the most common (mode, nominal) value and “average” for the mean.

A Static Timing Analysis (STA) tool is really nothing more than an adding machine with a simple less-than/greater-than check when it hits a timing end-point, say a flip-flop. At the simplest level, it traverses paths starting from source flops, adding delays (from gates and interconnect) along those paths, until it hits destination flops. Where paths converge in-between those points, it keeps worst- and best-case delays (path-based analysis is more refined, but I think those details are not essential for this argument). Then it’s all about when the data can potentially get to a flop relative to when the clock can get to the flop. Too early and you have a hold time violation, too late and you have a setup violation.
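To make the "adding machine" picture concrete, here is a minimal sketch in Python. It is not how any production STA tool is written: the graph, the delays, the ideal clock and the function names `propagate` and `check_endpoint` are all assumptions made for illustration.

```python
def propagate(edges, sources, order):
    """Return {node: (earliest, latest)} arrival times.

    edges   : dict mapping node -> list of (successor, delay)
    sources : launch points (source flops), arrival time 0
    order   : nodes in topological order (assumed given; a real tool derives it)
    """
    arrival = {s: (0.0, 0.0) for s in sources}
    for node in order:
        if node not in arrival:
            continue  # not reachable from any source
        early, late = arrival[node]
        for succ, delay in edges.get(node, []):
            e, l = early + delay, late + delay
            if succ in arrival:
                # Converging paths: keep best case (min) and worst case (max).
                prev_e, prev_l = arrival[succ]
                e, l = min(e, prev_e), max(l, prev_l)
            arrival[succ] = (e, l)
    return arrival


def check_endpoint(early, late, clock_period, setup, hold):
    """Simplified checks against an ideal clock (no skew, no latency)."""
    return {
        "setup_violation": late > clock_period - setup,
        "hold_violation": early < hold,
    }


# Hypothetical two-path circuit between a source flop and a destination flop.
edges = {
    "ff1": [("g1", 0.10), ("g2", 0.30)],
    "g1": [("g3", 0.20)],
    "g2": [("g3", 0.40)],
    "g3": [("ff2", 0.10)],
}
order = ["ff1", "g1", "g2", "g3", "ff2"]
arrival = propagate(edges, ["ff1"], order)
early, late = arrival["ff2"]
result = check_endpoint(early, late, clock_period=1.0, setup=0.15, hold=0.05)
```

Everything here is addition plus min/max and a comparison at the end-point, which is why the method is fast and why anything that cannot be expressed that way is hard to retrofit.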

The timing values (typical values) come from library lookup tables indexed by gate type, input slew and output load, and from models for the interconnect between gates. Back in the day, you would have tables for different process corners: slow/slow (SS) for slow NFET/slow PFET, fast/fast (FF), typical/typical (TT) and permutations thereof. You analyzed in each of the corners, tweaked the design to fix timing violations and all was good. But then it got complicated.
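A table lookup of this kind can be sketched as a bilinear interpolation over slew and load. The axis values, table entries, units and the name `lookup_delay` below are invented for illustration and do not come from any real library.

```python
from bisect import bisect_right


def lookup_delay(table, slews, loads, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew][load].

    Queries outside the table are extrapolated from the nearest cell.
    """
    def bracket(axis, x):
        # Lower index of the cell containing x, clamped to the table.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = bracket(slews, slew), bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    # Interpolate along the load axis on both slew rows, then between rows.
    lo = table[i][j] + (table[i][j + 1] - table[i][j]) * tl
    hi = table[i + 1][j] + (table[i + 1][j + 1] - table[i + 1][j]) * tl
    return lo + (hi - lo) * ts


# Hypothetical table for one gate in one corner (ns, indexed by ns and fF).
slews = [0.01, 0.05, 0.20]
loads = [1.0, 4.0, 16.0]
table = [
    [0.020, 0.035, 0.090],
    [0.030, 0.045, 0.100],
    [0.060, 0.080, 0.140],
]
delay = lookup_delay(table, slews, loads, slew=0.03, load=2.5)
```

Note that the input slew is itself an output of the previous stage's lookup, so an error in one stage feeds the next one.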

At 40nm, margins represented purely as corners became too pessimistic to get reasonable yield at reasonable power. Corner data is sampled across many lots and many designs, so design-to-design differences get folded into the overall variance, which is then too pessimistic for any single design. Statistical timing analysis should have been the ideal solution, but performance and other issues eventually killed that approach. So the foundries aimed for something that could support conventional STA methods with adjustments. They split measured variances into a design-dependent part (on-chip variation, or OCV) and a design-independent part (the die-to-die variance) and called the latter “global”. That gives you corners called SSG, TTG and FFG. A design team must then add back in OCV variance based on the structure of their design to get the true variances they need to model. But they can’t just add/subtract the old-style 3σ to these corners; that would be even more pessimistic than the traditional corners and the whole point is to minimize pessimism.

So how do you calculate OCV? You still want to stick to single-pass analysis, but enhanced by different methods to approximate measured variances within those constraints. You can pick from AOCV, based on pre-characterized chains of gates to get variances at the end of the chain, or POCV or SOCV which in different ways compute variances at each stage in a path. (LVF is a recently introduced format which aims to combine representations for all these methods in one standard but does not prescribe how the calculation should actually be done.)
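As a rough illustration of the stage-based idea, the sketch below accumulates a nominal delay and a sigma along a path, assuming independent Gaussian stage delays so that variances add. This is a textbook simplification, not the POCV or SOCV algorithm of any particular tool, and the stage values and the name `path_stats` are made up.

```python
import math


def path_stats(stages):
    """stages: list of (nominal_delay, sigma), assumed independent and Gaussian."""
    nominal = sum(d for d, _ in stages)
    # Variances add for independent stages, so sigma combines root-sum-square.
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return nominal, sigma


stages = [(0.10, 0.03), (0.20, 0.04)]  # hypothetical (delay, sigma) per stage
nominal, sigma = path_stats(stages)
late = nominal + 3 * sigma    # late bound used for setup
early = nominal - 3 * sigma   # early bound used for hold

# Stacking a full 3-sigma margin on every stage is more pessimistic.
naive_late = sum(d + 3 * s for d, s in stages)
```

With these numbers the path sigma is 0.05 rather than the 0.07 you would get by adding the stage sigmas directly, so the late bound is 0.45 instead of 0.51.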

What is important in all these methods is that you are propagating typical values as delays, but the delay and variance calculated through these methods only serve as an accurate representation of the underlying distribution if that distribution is normal (Gaussian). For a Gaussian, the typical value and the average coincide. If this assumption is reasonable (and it is at normal voltages), then as an input distribution passes through stages in a path, the average input delay to the next stage is the sum of the average delays up to that stage, because that’s how Gaussians sum.

But when distributions are skewed, as they are at low voltage, something different happens. The sum of skewed distributions tends to a normal distribution as you pass through stages (thanks to the Central Limit Theorem) but at each stage the average of the distribution shifts away from the sum of the typical values up to that stage. This undermines the calculation based on typical values in two ways. First, the true average progressively moves to a value greater than the sum of typical values up to that stage. And second, the output slew lookup, which is now based on an incorrect delay value, is therefore also incorrect and this error also compounds. When you get to the end of the path to check setup and hold, the computed typical can differ from the true average by as much as 3σ for the distribution, as large as the amount you are trying to correct for with your OCV calculation. And that means on a path like this, the typical value adjusted by 3σ on one side could be extremely pessimistic and on the other side extremely optimistic.
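The shift of the center can be seen in a small experiment. The sketch below uses a lognormal distribution as a stand-in for a right-skewed stage delay. That choice, the parameters and the stage count are assumptions for illustration only, not characterized silicon data, and the slew effect is ignored entirely.

```python
import math
import random

MU, SIGMA = 0.0, 0.5   # hypothetical lognormal parameters for one stage delay
N_STAGES = 10          # identical, independent stages (a simplification)

mode = math.exp(MU - SIGMA ** 2)        # most likely ("typical") stage delay
mean = math.exp(MU + SIGMA ** 2 / 2)    # average stage delay
std = mean * math.sqrt(math.exp(SIGMA ** 2) - 1)

sum_of_typicals = N_STAGES * mode       # what summing typical values predicts
true_mean = N_STAGES * mean             # where the path distribution is centered
path_sigma = math.sqrt(N_STAGES) * std  # variances add for independent stages
shift_in_sigmas = (true_mean - sum_of_typicals) / path_sigma

# Monte Carlo cross-check of the analytic path mean.
rng = random.Random(1)
samples = [
    sum(rng.lognormvariate(MU, SIGMA) for _ in range(N_STAGES))
    for _ in range(20000)
]
mc_mean = sum(samples) / len(samples)
```

In this toy model the gap between the true average and the sum of typicals grows in proportion to the number of stages, while the path sigma grows only with its square root, so the gap measured in sigmas keeps widening as the path gets longer.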

Some people argue this is a non-problem; that in fact these differences average out. That doesn’t seem very likely to me. The math of combining skewed distributions leading to a shift in the average is indisputable. Also, gate timing distributions should always skew to longer delays, since non-linearity near the switching threshold should favor longer delays rather than shorter delays at low voltage. There’s really no way these shifts can cancel out in a stage-based calculation. The AOCV approach could in principle get this right since it pre-characterizes chains of gates, which should incorporate all effects, though apparently it doesn’t take account of slews so it’s still wrong. Not to mention that lookup tables for this approach could get rather large.

Maybe you could fix the stage-based approach by using left and right variances at each stage to compute a shift at that stage, which you would then use to get the delay and slew lookup right. There have been attempts along these lines, though it’s not clear they have been very successful. Or more generally, you could model a skewed distribution using 3 points and evolve that along the path. This might be mathematically feasible, but I imagine there would be problems in performance. At minimum you’d have to do 3 divisions to scale this model curve to a (reasonably sized) lookup table so you could figure out the shift, then 3 multiplications to scale back, none of which is going to help run-time. And I don’t see any way you could emulate the correct behavior using only addition.
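One way to picture the left/right variance idea is with a split-normal (two-piece normal) model, where each stage has a mode, a left sigma and a right sigma. For that model the mean sits at mode + sqrt(2/π)·(σ_right − σ_left). The sketch below is only an illustration under that assumed model. It is not one of the attempts the article alludes to, the stage values and function names are hypothetical, and it ignores the slew dependence entirely.

```python
import math


def split_normal_mean(mode, sigma_left, sigma_right):
    """Mean of a split-normal distribution with the given mode and half-sigmas."""
    return mode + math.sqrt(2 / math.pi) * (sigma_right - sigma_left)


def split_normal_variance(sigma_left, sigma_right):
    """Variance of a split-normal distribution."""
    d = sigma_right - sigma_left
    return (1 - 2 / math.pi) * d * d + sigma_left * sigma_right


def accumulate_path(stages):
    """stages: list of (mode, sigma_left, sigma_right), assumed independent.

    Returns (sum of typical values, true path mean, path sigma).
    """
    typical = sum(m for m, _, _ in stages)
    mean = sum(split_normal_mean(m, l, r) for m, l, r in stages)
    var = sum(split_normal_variance(l, r) for _, l, r in stages)
    return typical, mean, math.sqrt(var)


# Four hypothetical stages skewed toward longer delays.
typical, mean, sigma = accumulate_path([(1.0, 0.1, 0.3)] * 4)
shift = mean - typical  # how far the center has moved from the sum of typicals
```

When the left and right sigmas are equal the shift is zero and the model collapses to the ordinary Gaussian case. Getting the shifted delay into the slew lookup at each stage is the part this sketch leaves out.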

The only way to do this correctly, at least along a set of paths of concern, is to do variance-aware, transistor-based modeling, either using MCSpice (which would be very slow) or CLKDA FX analysis (which is much faster). To get a more knowledgeable analysis of the whole problem and the FX approach, click HERE.
