A Big Step Forward to Limit AI Power Demand
by Bernard Murphy on 08-25-2025 at 6:00 am

By now everyone knows that AI has become the all-consuming driver in tech and that NVIDIA GPU-based platforms are the dominant enabler of this revolution. Datacenters worldwide are stuffed with such GPUs, serving AI workloads from automatically drafting emails and summarizing meetings to auto-creating software and controlling factories. A true revolution in automation is underway, yet we can already see that the power required to meet this new demand will quickly exceed utility generation plans. Bending that power growth curve is imperative, which demands further AI power reduction while the hardware is still in design. Understanding where to make such improvements starts with better pre-silicon power estimation; Cadence and NVIDIA have been working together on this objective for many years and, according to a recent press release, have announced a major step forward.

The Challenges in Pre-Silicon Power Estimation for AI

Pre-silicon dynamic power estimation has been available for some time, but there are three key challenges in applying these methods to AI systems: the size of the designs, the size and complexity of representative test cases based on AI models and benchmarks, and the added burden of dynamic power analysis under these constraints, which for high accuracy demands gate-level modeling, further straining capacity limits.

NVIDIA has been using Cadence emulation platforms (Palladium) for 20 years or more to verify the functionality of their chip designs before committing to manufacturing, as have many other semiconductor (and more recently system) design companies. As design sizes have grown exponentially, Palladium capacity has kept pace to the point that these emulators can now accommodate designs running to tens of billions of gates, in step with the very largest designs being built today.

The second challenge is the size of test cases. In non-AI applications, real use-case tests have been viewed as impractically large for detailed pre-silicon testing. Engineers have resorted to synthetic tests to validate essential characteristics while postponing real use-case validation to post-silicon, where problems found may require a costly re-spin of the design. This limitation becomes even more acute for dynamic power analysis (DPA), which builds on top of functional verification. DPA overheads have commonly limited analysis to sampling within expert-determined time windows. From these samples, engineers extrapolate dynamic power averages and peaks across the synthetic test cycle, at the risk of missing critical power anomalies outside those windows.
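To make the sampling risk concrete, here is a minimal sketch in Python; the trace shape, window positions, and all numbers are illustrative assumptions of mine, not anything from the Cadence flow. It shows how window-based sampling can report a healthy average and peak while a brief anomaly outside the windows goes completely unseen.

```python
import numpy as np

# Illustrative only: a synthetic per-cycle power trace with one rare anomaly.
rng = np.random.default_rng(0)
n_cycles = 1_000_000
trace = rng.normal(loc=1.0, scale=0.05, size=n_cycles)  # ~1 W baseline
trace[730_000:730_200] += 2.0                            # brief ~3 W spike

# Hypothetical "expert-determined" sampling windows: three 10k-cycle slices.
windows = [(100_000, 110_000), (400_000, 410_000), (800_000, 810_000)]
samples = np.concatenate([trace[a:b] for a, b in windows])

print(f"sampled  avg={samples.mean():.2f} W  peak={samples.max():.2f} W")
print(f"full run avg={trace.mean():.2f} W  peak={trace.max():.2f} W")
# The sampled view sees only the ~1 W baseline; the ~3 W anomaly near
# cycle 730k falls outside every window and is never reported.
```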

Unfortunately, this sampling approach is ineffective for estimating dynamic power in large AI applications. The only use cases worth evaluating power against are complete AI tests with real models and benchmark testcases, since it is far from clear how synthetic or sampling methods could confidently cover a representative subset of corner cases. We already know how big AI models can be and how involved the processing pipeline is for such models, whether for, say, 4K/8K image CNNs or for transformer-based LLMs. Given typical frames-per-second rates in image processing or prompt response times in LLMs, it is obvious that billions of cycles must be emulated to span a realistic use case.
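The scale here is easy to sanity-check with back-of-envelope arithmetic. In the sketch below, the clock rate, frame rate, and duration are illustrative assumptions, not NVIDIA or Cadence figures:

```python
# Back-of-envelope: emulated cycles needed to span a realistic use case.
# All numbers are illustrative assumptions.
clock_hz = 1.5e9      # assumed accelerator clock: 1.5 GHz
fps = 30              # assumed image-pipeline frame rate
seconds_of_use = 60   # one minute of representative activity

cycles_per_frame = clock_hz / fps
total_cycles = clock_hz * seconds_of_use
print(f"{cycles_per_frame:.2e} cycles/frame, {total_cycles:.2e} cycles total")
# -> 5.00e+07 cycles/frame, 9.00e+10 cycles total: tens of billions of
#    cycles for even one minute of activity, far beyond what sampling covers.
```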

Add to this the overhead of DPA on top of that functional emulation and you can understand why realistic power profiling for big AI seemed out of reach in pre-silicon testing. Until now.

Cadence Redefines What is Possible with their DPA App

Cadence recently released their new DPA App leveraging the capabilities of their Palladium® Z3 Enterprise Emulation Platform. Here I must jump straight to the punchline because it is amazing. Cadence reported, with NVIDIA approval (NVIDIA ran their own benchmarks), that they ran DPA on this platform on billion-gate designs across billions of cycles within a few hours, with up to 97% power accuracy as determined against post-silicon power measurements. Everyone likes to claim that whatever they are selling is a game-changer, but results like this truly merit that description.
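A note on interpretation: the announcement does not spell out the exact accuracy metric, but a natural reading of "up to 97% power accuracy" is one minus the relative error against the silicon measurement, as in this small sketch (the wattage numbers are hypothetical):

```python
def power_accuracy(estimated_w: float, measured_w: float) -> float:
    """Accuracy as 1 minus relative error vs. the silicon measurement."""
    return 1.0 - abs(estimated_w - measured_w) / measured_w

# Hypothetical numbers: a 97%-accurate estimate is within 3% of silicon.
print(f"{power_accuracy(estimated_w=48.5, measured_w=50.0):.0%}")  # -> 97%
```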

It’s worth unpacking the accuracy point further, since I have some background in this area from a previous life. I talked with Michael Young (Director of Product Marketing at Cadence and one of the Quickturn guys from way back) to get a better understanding.

First, I should acknowledge that I have been as dismissive as others of accuracy claims for pre-silicon power estimation. These are usually based on RTL simulations, already suspect because they don’t accurately reflect synthesis optimizations or power dissipated in interconnect unless they support parasitic back-annotation from implementation trials. Estimates under these constraints typically land only within 15-20% of signoff numbers, not accurate enough to drive careful design optimization for power.

Michael makes two counterarguments for this DPA flow. First, the analysis must be run at gate level, with directly back-annotated parasitics. Second, he was very careful to stress that comparisons with post-silicon power should use exactly the same conditions (same model, same benchmarks, same software) as used in emulation DPA testing. He told me he sometimes hears from other companies in non-AI applications that DPA doesn’t correlate very accurately with their post-silicon measurements. When they dig deeper, it becomes obvious that they are not comparing identical pre- and post-silicon conditions. It’s tempting to believe we can be approximate about matching the pre-silicon estimation test to the lab measurement and still get the claimed accuracy. But of course we can’t; different conditions are exactly where the discrepancies arise.
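Michael’s second point amounts to a correlation discipline that is simple to encode. The sketch below is purely illustrative (the record fields and names are my own, not Cadence’s): treat the model, benchmark, and software revision as a single key, and refuse to compare pre- and post-silicon power unless that key matches exactly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestConditions:
    """Hypothetical record of everything that must match pre/post-silicon."""
    model: str
    benchmark: str
    software_rev: str

def correlation_is_valid(pre: TestConditions, post: TestConditions) -> bool:
    # Only compare emulation DPA results against lab power measurements
    # when every condition matches exactly.
    return pre == post

pre = TestConditions("llm-7b", "benchmark-A", "sw-1.2")   # emulation DPA run
post = TestConditions("llm-7b", "benchmark-A", "sw-1.3")  # lab measurement
print(correlation_is_valid(pre, post))  # False: the software differs, so any
# power discrepancy may reflect the mismatch, not estimation inaccuracy.
```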

As AI usage grows worldwide, we need every tool we can find to bend that power curve. Cadence DPA running on Palladium Z3 provides a big step forward, helping companies like NVIDIA further tune the power their chips consume under real workloads. You can learn more at the Cadence Palladium web page.
