WP_Term Object
(
    [term_id] => 14
    [name] => Synopsys
    [slug] => synopsys
    [term_group] => 0
    [term_taxonomy_id] => 14
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 694
    [filter] => raw
    [cat_ID] => 14
    [category_count] => 694
    [category_description] => 
    [cat_name] => Synopsys
    [category_nicename] => synopsys
    [category_parent] => 157
)
            
800x100 Efficient and Robust Memory Verification
WP_Term Object
(
    [term_id] => 14
    [name] => Synopsys
    [slug] => synopsys
    [term_group] => 0
    [term_taxonomy_id] => 14
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 694
    [filter] => raw
    [cat_ID] => 14
    [category_count] => 694
    [category_description] => 
    [cat_name] => Synopsys
    [category_nicename] => synopsys
    [category_parent] => 157
)

Qualcomm Insights into Unreachability Analysis

Qualcomm Insights into Unreachability Analysis
by Bernard Murphy on 10-17-2023 at 6:00 am

Unreachability (UNR) analysis, finding and definitively proving that certain states in a design cannot possibly be covered in testing, should be a wildly popular component in all verification plans. When the coverage needle stubbornly refuses to move, where should you focus testing creativity while avoiding provably untestable logic? Seems like a no-brainer to figure this out if you want to reduce effort and schedule, yet UNR still is not as mainstream as you might imagine. A talk by Luv Sampat (Senior Engineer at Qualcomm) showed where the simple UNR premise falls short and shared a path forward at a recent Synopsys VC Formal SIG event.

Unreachability

Context

Unreachability analysis is based on formal technologies, as usual best applied to IP-level tasks in this case to Qualcomm Hexagon DSP cores. In different configurations these cores are used in products from Bluetooth earbuds all the way up to datacenter AI platforms. Coverage analysis requires UNR application in setup, run, and evaluation to scale effectively across that range.

Qualcomm assesses progress to closure through toggle, line, condition, and FSM coverage. Luv said that on one of their larger configurations they have over 200 million coverage goals. That is important not only because checking that many goals is a huge task but also because the number of claimed unreachable goals determined in analysis may also run to millions. Any given claim may result from an over-constraint or an unsupported use-case; only a designer can decide between these options and a true unreachable state.

Bounded proofs compound the problem. If a state was unreachable within proof bounds, can the bounds be increased to increase confidence? Taken together, these challenges are familiar enough in formal property checking but here potentially millions of claims may need manual review by design experts, an impractical expectation only manageable through engineer-defined blanket exceptions which undermine the integrity of the analysis. Worse yet, exceptions may not be portable to other configurations, or even between successive RTL drops for one configuration.

What about divide-and-conquer?

If you know your way around property checking this may still not seem like a real problem. There are multiple techniques to divide a big problem into smaller sub-problems – black-boxing, case analysis, assume-guarantee, etc. Why not use one or more of those methods?

The first and most obvious problem is that UNR is supposed to be a transparent complement to simulation coverage analysis. It should just automatically adjust simulation coverage metrics, so you don’t have to worry about what is not reachable. Requiring partitioning, setup, run and reconciliation through divide-and-conquer analyses is hardly transparent. Second it is unclear how you would divide and then recombine results for say a toggle coverage analysis without understanding if coverage for the sum of the parts really adds up to coverage for the whole. Third, even if such a method could be automated, would it be easily portable to other design configurations? Probably not.

Divide-and-conquer must be a part of a practical solution to conquer scaling, but not through standard methods. Luv/Qualcomm have been working together with the Synopsys VC Formal group to drive a better solution in their FCA (unreachability analysis) app.

Auto-Scale

The method is called Auto-Scale. Consider toggle coverage where you really need to look at the whole design, but the formal model for a large IP is too big. Instead of building this full model, break each coverage metric into sub-tasks and for each build a formal model only around the cone of influence for that sub-task, dramatically reducing the size of a proof without sacrificing integrity. In effect this method handles sub-task partitioning automatically but in a way that preserves proof integrity.

To optimize throughput with completeness, Luv talked about flexibility in a grid spec per metric, also a “memory ladder” allowing you to specify a starting memory requirement per sub-task while allowing that allocation to progressively ramp up in retries for a task which hits a bound before completing the proof. Contrast that with the standard approach where you would need to reserve the maximum memory available at the outset, wasting much of that allocation on quick proofs and maybe still limiting bounds on tough proofs.

Results are impressive. In one example (5-10 million goals), a standard approach to UNR required 6 partitions, 256GB of memory, and left 800k goals uncovered. The Auto-Scale version required no partitioning, ran in under 100GB and left only 500k goals uncovered. Further, line coverage improved from 96% to 99%, condition coverage from 77% to 78%, toggle coverage from 88% to 89% and FSM coverage from 55% to 99%. This last improvement Luv attributes to being able to see all the FSM logic in one-shot rather than needing to split the FSM into multiple runs.

In a larger test with over 50 million goals, the standard approach required 26 partitions, many hundreds of child processes and terabytes of memory. Even though they were able to complete the task in principle, the engineering team rejected these results which they considered too noisy. The Auto-Scale approach required only 4 partitions, ran between 100-500GB of memory and left only 900k uncovered goals (versus 2.4 million for the traditional analysis). Line coverage for both approaches came out at 94%, condition coverage climbed a little from 62% to 65% and toggle coverage jumped from 70% to 93%. The engineering team were happy to use these results 😊

Auto-Scale coverage improvements and significant reduction of uncovered items make a big difference to the effort required of the dynamic verification team to close coverage: net 12% reduction in coverage goals in the first example and 9% reduction in coverage goals in the second. This looks like a real step forward to extend the value of unreachability analysis to larger IPs.

You can watch the presentation yourself HERE.

 

 

Share this post via:

Comments

There are no comments yet.

You must register or log in to view/post comments.