AI and ML for Sanity Regressions

by Bernard Murphy on 10-13-2021 at 6:00 am

You probably know the value proposition for using AI and ML (machine learning) in simulation regressions. A simulator has lots of knobs you can tweak, all there to help you squeeze more seconds or minutes out of a run, if you know how to use those options. But often it’s easier to talk to your friendly AE, get a reasonable default setup and stick with that. Consider that a sort of one-step learning.

However, what works well in one case may not be optimal in others. Learning must evolve as designs and test cases change. You can’t reasonably call the AE in for every run, and you shouldn’t have to. ML can automate the learning. That makes sense, but what I had not realized is that one of the biggest impact areas for this technology is sanity regressions. Vishwanath (Vish) Gunge of Microsoft elaborated at Synopsys Verification Day 2021.

Why short regressions are such a good fit

Sanity tests are those tests you run to make sure you (or someone else) didn’t do something stupid. Like accidentally checking in code that you hadn’t finished fixing. Or leaving a high-verbosity debug switch turned on. When you want to integrate all the code in a big subsystem of the whole SoC, the probabilities of a basic mistake add up quickly. We design sanity tests to smoke these problems out quickly, because the last thing you want is to launch overnight regressions, then come back in the morning to garbage results. Sanity tests are designed to run quickly, maybe a few minutes and at most around 30 minutes, in parallel across many machines.

It seems like that wouldn’t be where you would find a big win in ML optimization. But you’d be wrong. It’s not the test run-time that matters, it’s the frequency of those tests. Vish said that in their environment, sanity regressions consume huge compute resources, running many times per day. Which I read as them using those regressions in the best possible way: flushing out basic mistakes at a per-designer level, a per-subsystem level and a full integration level. When a mistake is found and fixed, a sanity test (or tests) must be re-run. That is a lot of checking before time is invested in expensive full regressions. Which is why ML can have an important impact.

VCS DPO

Synopsys VCS® offers a dynamic performance optimization (DPO) option based on both proprietary and ML methods. I don’t know the internal details, but it is interesting that they use other methods in addition to ML. ML is the hot topic these days but it’s not always the most efficient way to get to a good result. Rule-based systems can be more semantically aware and converge quicker to an approximate solution, from which ML can then further optimize. At least that’s my guess.

That said, this is AI/ML so there is a “training” phase and an “application” phase. All packaged for ease of use, no AI skills required by the end user.

Dynamic performance optimization in action

Vish presented analysis comparing the non-AI (base-level) run-time with the learning-phase and application-phase run-times on the same set of sanity runs. For DPO they used all available optimization apps as a starting point, for example FGP (fine-grained parallelism) with multiple cores. Naturally, learning-phase runs were slower than the base level, perhaps by ~30%. However, application runs were on average 25% faster, allowing them to do ~30% more of these regressions per day.
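As a quick sanity check on those numbers, here is a minimal sketch of the arithmetic, assuming “25% faster” means each application run takes 25% less wall-clock time than the base run and that compute capacity is fixed. The function name is hypothetical, purely for illustration.

```python
def throughput_gain(runtime_reduction: float) -> float:
    """Fractional increase in runs per day on fixed compute.

    Assumes each run now takes (1 - runtime_reduction) of its base time,
    so the same machines complete 1 / (1 - runtime_reduction) times as many runs.
    """
    if not 0.0 <= runtime_reduction < 1.0:
        raise ValueError("runtime_reduction must be in [0, 1)")
    return 1.0 / (1.0 - runtime_reduction) - 1.0


# Assumed reading of the talk: application runs take 25% less time than base.
gain = throughput_gain(0.25)
print(f"Extra regressions per day: {gain:.1%}")
```

Under that reading the gain works out to roughly a third more runs on the same compute, which is in line with the ~30% figure quoted in the talk.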

Vish stressed that some thought is required to get maximum benefit in these flows since learning takes more time than base runs. He suggested running learning once every few days as the design is evolving, to keep optimizations reasonably current as design and tests change. Learning can run less frequently as the project is nearing signoff since optimum settings shouldn’t be expected to change as often.
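To see why the learning cadence matters, here is a rough cost model, a sketch under stated assumptions rather than anything presented in the talk. It assumes one learning pass costs about 1.30x a base pass, each following application pass costs about 0.75x, and those settings stay valid until the next learning pass. The function names and the example cadences are hypothetical.

```python
def average_relative_cost(app_passes_per_learning_pass: int,
                          learning_cost: float = 1.30,
                          application_cost: float = 0.75) -> float:
    """Average cost per regression pass, relative to a base (non-DPO) pass.

    One learning pass is followed by the given number of application passes.
    Costs are assumptions taken from the approximate figures in the article.
    """
    if app_passes_per_learning_pass < 0:
        raise ValueError("number of application passes must be non-negative")
    total = learning_cost + application_cost * app_passes_per_learning_pass
    return total / (app_passes_per_learning_pass + 1)


def break_even_app_passes(learning_cost: float = 1.30,
                          application_cost: float = 0.75) -> float:
    """Application passes needed to pay back the overhead of one learning pass."""
    return (learning_cost - 1.0) / (1.0 - application_cost)


print(f"Break-even: {break_even_app_passes():.1f} application passes")
for n in (1, 5, 20, 60):  # hypothetical cadences
    print(f"{n:>3} application passes per learning pass -> "
          f"{average_relative_cost(n):.3f}x base cost")
```

With these assumed costs, the learning overhead is paid back after little more than one application pass, and the average cost approaches 0.75x as learning becomes less frequent. The catch, as Vish noted, is that stale settings may stop being optimal as the design and tests change, which this simple model does not capture.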

A very interesting and practical review. You can learn more from the recorded session. Vish’s talk is one of the early sessions on Day 2.

Also Read:

IBM and HPE Keynotes at Synopsys Verification Day

Reliability Analysis for Mission-Critical IC design

Why Optimizing 3DIC Designs Calls for a New Approach
