Back to Basics in RTL Design Quality

by Bernard Murphy on 11-03-2021 at 6:00 am

Harry Foster waxes philosophical in a recent white paper from Siemens EDA, in this case on the origins of bugs and the best way to avoid them. Spoiler alert: the answer is not to make them in the first place, or at least to flush them out very quickly. I’m not being cynical; that really is the answer, though practice often falls short of the ideal. Harry suggests we need to get back to basics in RTL design quality, and what better place to start than W. Edwards Deming, a founding father of Total Quality Management.

[Image: W. Edwards Deming]

Quality must be designed in

This seems trite, but it’s often the simple mistakes that bite us, like an out-of-range indexing error. Best case, they slow down system-level testing; worst case, they make it through to silicon. It’s easy for us to believe that we are mostly infallible and that what few mistakes we make will be caught in verification. But survey after survey shows that trivial mistakes still slip through. We should know by now that we left the mirage of exhaustive testing behind a long time ago.

Following Deming, we need to design quality in, not try to paste it on in verification. Harry proposes a three-step process for design, based on a combination of design plus intent. The first step, Produce, starts with producing correct RTL by design (I assume we’re talking here about new IP or subsystems). The argument here is that bugs per line of code (LOC) are more or less constant at 15-50 bugs per thousand LOC, irrespective of whether you are creating RTL, C++ or JavaScript. The best way to create fewer bugs is therefore to write fewer lines of code, using a higher level of abstraction like SystemC/C++, Chisel or some other domain-specific language.
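To make the arithmetic behind that argument concrete, here is a minimal sketch. The 15-50 bugs per thousand LOC range comes from the paragraph above; the line counts and the 5x reduction from moving to a higher abstraction level are hypothetical numbers chosen only for illustration.

```python
def expected_bugs(lines_of_code, bugs_per_kloc=(15, 50)):
    """Return (low, high) expected bug counts for a given code size.

    bugs_per_kloc is the 15-50 bugs per thousand LOC range quoted above.
    """
    low_rate, high_rate = bugs_per_kloc
    return (lines_of_code * low_rate / 1000, lines_of_code * high_rate / 1000)


# Hypothetical sizes for the same block, written two ways.
rtl_loc = 20_000   # assumed hand-written RTL size
hla_loc = 4_000    # assumed size at a higher level of abstraction (5x smaller)

rtl_low, rtl_high = expected_bugs(rtl_loc)
hla_low, hla_high = expected_bugs(hla_loc)

print(f"RTL:              {rtl_low:.0f} to {rtl_high:.0f} expected bugs")
print(f"Higher abstraction: {hla_low:.0f} to {hla_high:.0f} expected bugs")
```

If the defect rate per line really is roughly constant, the expected bug count scales directly with code size, so the assumed 5x reduction in lines gives a 5x reduction in expected bugs.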

Proving that intent is met

Since the method connects design and intent, the second step aims to prove in design that the intent is met. Harry’s suggestion here is particularly to leverage static and formal verification tools. We are designing quality in, so this is a task for RTL designers, who already have access to a wide range of apps to simplify this analysis. They can find FSM deadlocks, arithmetic overflow possibilities and potential indexing errors. For possible domain crossing bugs, they can find metastability potential and other domain crossing errors which in many cases cannot be detected at all in simulation. Another possible source of errors is in X optimism and pessimism. The former may at least waste valuable time in system-level verification and the latter can create mismatches between RTL and gate-level sims which even equivalence checking may not find.
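As a rough illustration of what an FSM deadlock check looks for, here is a toy sketch. It is not how any particular formal app works (those operate on the RTL itself, typically with symbolic techniques); it simply walks an explicit state graph from reset and reports reachable states that can never be left. The state names and the `find_deadlocks` helper are hypothetical.

```python
from collections import deque


def reachable(transitions, start):
    """Breadth-first search: all states reachable from the start state."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def find_deadlocks(transitions, reset_state):
    """Reachable states with no successor other than themselves."""
    return sorted(
        state
        for state in reachable(transitions, reset_state)
        if all(nxt == state for nxt in transitions.get(state, ()))
    )


# Hypothetical arbiter FSM: WAIT only ever transitions back to WAIT.
fsm = {
    "IDLE": ["REQ"],
    "REQ": ["GRANT", "WAIT"],
    "GRANT": ["IDLE"],
    "WAIT": ["WAIT"],
}

print(find_deadlocks(fsm, "IDLE"))  # ['WAIT']
```

A simulation testbench only sees this bug if a test happens to drive the FSM into WAIT, whereas a check over the whole state graph finds it without any stimulus.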

Your system verification team will thank you. Or they may curse you if they find problems you could have fixed before you checked in your code.

Protecting intent

The third step requires that intent should be protected through the rest of the design lifecycle by continued testing. Harry’s suggestion is to adopt a continuous integration (CI) flow here. We simply reuse the static and formal tests we developed and proved in design. These are largely hands-free and fast tests which should quickly flag check-in mistakes (we all make them).
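A CI gate of this kind can be very small. The sketch below is an assumption about how such a flow might be wired up, not something described in the white paper: each check is a hypothetical stand-in for launching a real lint, static or formal run, and the gate fails the check-in if any of them fails.

```python
def run_checks(checks):
    """Run each named check and return the names of those that failed."""
    failures = []
    for name, check in checks:
        try:
            passed = check()
        except Exception:
            # A check that crashes is treated as a failure, not skipped.
            passed = False
        if not passed:
            failures.append(name)
    return failures


# Hypothetical stand-ins for real tool runs.
def lint_clean():
    return True


def no_fsm_deadlock():
    return True


def index_in_range():
    # A 3-bit index can reach 7, but the assumed array has only 6 entries.
    depth, index_bits = 6, 3
    return (2 ** index_bits - 1) < depth


checks = [
    ("lint_clean", lint_clean),
    ("no_fsm_deadlock", no_fsm_deadlock),
    ("index_in_range", index_in_range),
]

failures = run_checks(checks)
exit_code = 1 if failures else 0
print(failures, exit_code)  # ['index_in_range'] 1
```

In a real flow the script would hand `exit_code` to `sys.exit` so that the CI system rejects the check-in when any check fails.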

A final (blogger) thought

This is a worthy addition to the canon. We all nod wisely but we still trip up sometimes. With tools like CI we should be able to flush out more of these problems early on.

That said, there are some system-level problems which remain challenging, and which can’t be fixed (I think) at the unit level. Cache coherence problems, emerging only after billions of cycles, are one good example. Power bugs are difficult to cover fully in designs with very complex power and voltage switching. Security problems around speculative execution are another example. It would be great to find some kind of “unit test” methodologies around these system-level “IP”.

You can access the white paper HERE.

