Post-Silicon Validating an MMU. Innovation in Verification

by Bernard Murphy on 03-25-2026 at 6:00 am

Some post-silicon bugs are unavoidable, but we’re getting better at catching them before we ship. Here we look at a method based on a bare-metal exerciser to stress-test the MMU. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Post-Silicon Validation of the MMU. The authors are from IBM, and the work extends a paper we looked at in 2022. The paper we review here was published in 2021 at the DATE conference and has 2 citations. The method generates multi-threaded tests offline to run on first silicon, requiring only a bare-metal interface. The resulting exerciser is a self-contained program with all supporting data and library functions, and it can run indefinitely, making new randomized choices for each set of threads.

The extension here focuses on MMU functions (TLB, page table walks, etc.) in a multi-core environment. The concepts partly resemble pre-silicon testing methods, but they exploit the fact that randomized post-silicon testing can cover a much larger state space over a much longer time than is possible in RTL simulation. The exerciser can also add system-level stresses to testing, such as context switches and page migration.

Paul’s view

Back to the topic of bare-metal exercisers this month with another paper from IBM on their Threadmill tool. These exercisers are template-based generators of pseudo-random software programs. The generated programs are “bare metal”, with no need for an operating system, so they are free to create a range of low-level hardware race conditions that would otherwise be very hard to hit. The primary application is in post-silicon testing, but there is an increasing trend to use these exercisers in pre-silicon emulation since they often run much faster than classic emulation testbenches.

Our previous blogs covered bare-metal exercisers targeting coherency bugs in multi-core CPUs, deploying various tricks to stress test load/store race conditions between concurrent threads running on multiple cores. This month’s blog is on stress testing virtual-to-physical address translation in a memory management unit (MMU). A modern MMU has a lot of complexity, including caches for recently accessed address translations, multi-level address translation tables, and security controls. Covering all the corner cases for cache misses, thread context switches to different virtual address spaces, and security policy violations, and especially the race conditions between combinations of these exceptions, is borderline impossible without doing it at a bare-metal level.
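To make those moving parts concrete, here is a minimal Python sketch of a translation cache (TLB) in front of a multi-level table walk, with a write-permission check standing in for security controls. It is a toy model only, not the MMU or the exerciser described in the paper; the names `ToyMMU` and `PageFault`, the 4 KiB page size, the 9-bit index per level, and the two-level table are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_BITS = 12    # 4 KiB pages (assumed)
INDEX_BITS = 9    # 9 index bits per table level (assumed)
LEVELS = 2        # two levels keep the sketch short


class PageFault(Exception):
    """Raised for an unmapped page or a permission violation."""


class ToyMMU:
    def __init__(self, tlb_entries=4):
        self.root = {}              # nested dicts stand in for table pages
        self.tlb = OrderedDict()    # vpn -> (pfn, writable), kept in LRU order
        self.tlb_entries = tlb_entries
        self.walks = 0              # number of full table walks performed

    def _indices(self, vpn):
        # Split the virtual page number into one index per table level.
        mask = (1 << INDEX_BITS) - 1
        shifts = range(INDEX_BITS * (LEVELS - 1), -1, -INDEX_BITS)
        return [(vpn >> s) & mask for s in shifts]

    def map(self, vpn, pfn, writable=True):
        table = self.root
        idx = self._indices(vpn)
        for i in idx[:-1]:
            table = table.setdefault(i, {})
        table[idx[-1]] = (pfn, writable)
        self.tlb.pop(vpn, None)     # invalidate any stale cached translation

    def _walk(self, vpn):
        self.walks += 1
        entry = self.root
        for i in self._indices(vpn):
            if i not in entry:
                raise PageFault(f"unmapped page {vpn:#x}")
            entry = entry[i]
        return entry                # leaf entry: (pfn, writable)

    def translate(self, vaddr, write=False):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        entry = self.tlb.get(vpn)
        if entry is not None:
            self.tlb.move_to_end(vpn)            # TLB hit: refresh LRU position
        else:
            entry = self._walk(vpn)              # TLB miss: walk the tables
            self.tlb[vpn] = entry
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)     # evict least recently used
        pfn, writable = entry
        if write and not writable:
            raise PageFault(f"write to read-only page {vpn:#x}")
        return (pfn << PAGE_BITS) | offset
```

Even in this toy, a hit, a miss, an eviction, a remap, and a permission fault are five distinct paths through `translate`. Real hardware multiplies those paths across cores and address spaces, which is the state space a bare-metal exerciser is trying to reach.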

One key innovation in the paper is to use constraint solving (exactly as in a commercial logic simulator) to stress test permutations of concurrent address translations across multiple threads, especially including all possible walks through multi-level address translation tables. Another innovation relates to stress testing changes to the translation tables. Here, one thread runs a program that continuously locks a random block of virtual addresses and moves its associated physical address block to another location in physical memory, updating the MMU translation table accordingly. Meanwhile, multiple other threads continuously run random load and store operations to those virtual addresses.
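The page migration scenario can be modeled in software along the following lines. This is a hedged sketch under stated assumptions, not Threadmill's implementation: `MigratingMemory`, `run_stress`, the one-word-per-frame memory, and the per-page Python locks are hypothetical stand-ins for the locking and table updates that real hardware and firmware would perform.

```python
import random
import threading

POISON = -10**9   # written to a vacated frame so a stale access would be visible


class MigratingMemory:
    def __init__(self, num_pages, num_frames):
        self.frames = [0] * num_frames                    # one word per frame
        self.table = {vp: vp for vp in range(num_pages)}  # virtual page -> frame
        self.free = list(range(num_pages, num_frames))    # currently unused frames
        self.locks = [threading.Lock() for _ in range(num_pages)]
        self.migrations = 0

    def add(self, vpage, delta):
        # Load-modify-store through the current translation.
        with self.locks[vpage]:
            self.frames[self.table[vpage]] += delta

    def load(self, vpage):
        with self.locks[vpage]:
            return self.frames[self.table[vpage]]

    def migrate(self, vpage, rng):
        # Lock the page, copy its data to a random free frame, update the table.
        with self.locks[vpage]:
            old = self.table[vpage]
            new = self.free.pop(rng.randrange(len(self.free)))
            self.frames[new] = self.frames[old]
            self.table[vpage] = new
            self.frames[old] = POISON
            self.free.append(old)
            self.migrations += 1


def run_stress(num_pages=4, num_frames=8, workers=3, ops=2000, moves=500, seed=1):
    mem = MigratingMemory(num_pages, num_frames)
    counts = [[0] * num_pages for _ in range(workers)]   # per-worker reference model

    def migrator():
        rng = random.Random(seed)
        for _ in range(moves):
            mem.migrate(rng.randrange(num_pages), rng)

    def worker(wid):
        rng = random.Random(seed + 1 + wid)
        for _ in range(ops):
            vp = rng.randrange(num_pages)
            mem.add(vp, 1)
            counts[wid][vp] += 1

    threads = [threading.Thread(target=migrator)]
    threads += [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    expected = [sum(c[vp] for c in counts) for vp in range(num_pages)]
    return mem, expected
```

After `mem, expected = run_stress()`, every virtual page should read back its expected total no matter how the threads interleaved. A mismatch would mean an access went through a stale translation, which is the class of bug this kind of test is meant to expose in silicon.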

The authors implement all their ideas using Threadmill and highlight 3 deep corner-case bugs that their solution found and that both other in-house IBM exercisers and their regular pre-silicon DV work had missed. They also compare RTL code coverage across the same set of approaches, showing that Threadmill beats the other exercisers by about 4%, although it’s still behind regular pre-silicon DV coverage by 3%. Tight paper, with some nice ideas and clear benefits.

Raúl’s view

This month’s paper, “Post-Silicon Validation of the MMU”, presents a methodology for validating a Memory Management Unit using a bare-metal exerciser (Threadmill). The MMU is not “just another block”: while the core is logic, the MMU is a co-designed distributed HW/SW protocol, which makes it disproportionately hard to verify. It sits at the boundary between hardware and the OS (or hypervisor), translating virtual to physical addresses through multi-level tables and caches (TLBs). This creates a massive combinatorial space, with aliasing, contexts, and shared resources across cores. The key idea is to significantly enrich the generation of address translation scenarios beyond simple VA-to-PA mappings, toward randomized, constraint-driven, and context-aware translations.

The approach includes off-target generation of translation mappings (page tables and paths) using constraint solvers, graph-based constraint solving to create diverse translation paths, complex runtime behaviors (page migration, context switching, TLB invalidations) and embedded exception handlers. The result is a system that better stresses corner cases in MMU behavior, especially those involving concurrency, aliasing, and rare timing interactions — areas where pre-silicon verification is weakest.
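As a rough illustration of constraint-driven path generation over a graph, the Python sketch below picks a random walk through a small translation DAG and backtracks until the walk satisfies every constraint. It is not the paper's solver: the DAG shape, the node names, the page sizes, the address ranges, and the functions `solve` and `random_mapping` are all hypothetical, and plain randomized backtracking stands in for whatever the authors actually use.

```python
import random

# Hypothetical translation DAG: each node lists its possible successors.
DAG = {
    "root": ["L1"],
    "L1": ["L2", "leaf_1G"],
    "L2": ["L3", "leaf_2M"],
    "L3": ["leaf_4K"],
    "leaf_1G": [],
    "leaf_2M": [],
    "leaf_4K": [],
}
PAGE_SIZE = {"leaf_1G": 1 << 30, "leaf_2M": 1 << 21, "leaf_4K": 1 << 12}


def solve(dag, start, constraints, rng):
    """Return a random root-to-leaf path meeting all constraints, or None."""
    def extend(path):
        node = path[-1]
        if not dag[node]:                       # reached a leaf: check constraints
            return path if all(c(path) for c in constraints) else None
        children = list(dag[node])
        rng.shuffle(children)                   # randomize the exploration order
        for child in children:
            found = extend(path + [child])
            if found is not None:
                return found
        return None                             # dead end: caller backtracks
    return extend([start])


def random_mapping(rng, constraints=()):
    """Pick a translation path, then size-aligned virtual and physical addresses."""
    path = solve(DAG, "root", list(constraints), rng)
    if path is None:
        return None
    size = PAGE_SIZE[path[-1]]
    va = rng.randrange(0, 1 << 39, size)        # assumed 39-bit virtual space
    pa = rng.randrange(0, 1 << 36, size)        # assumed 36-bit physical space
    return {"path": path, "size": size, "va": va, "pa": pa}
```

For example, `random_mapping(random.Random(7), [lambda p: PAGE_SIZE[p[-1]] <= 1 << 21])` restricts generation to 2 MiB and 4 KiB pages. Because this runs off-target, a generator can afford solving of this kind and ship only the resulting tables to the exerciser.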

The paper reports RTL coverage as the primary quantitative metric, highlighting a ~4% improvement over a state-of-the-art exerciser. RTL coverage is primarily a pre-silicon sign-off metric and is used here as a proxy metric for comparability. In post-silicon validation the real interest is bugs, especially the rare, high-impact ones. Coverage tells us how much of the map was explored; it does not tell us if we found what we were looking for. The paper does list non-trivial bugs found, including ones that led to additional tape-outs; the 4% is unlikely to be “more of the same coverage” but rather harder-to-reach corner cases.

I found the paper hard to read. Many of the innovations over a baseline Threadmill-style exerciser are described almost entirely in dense prose, for example the translation engine (GCSP over DAGs) and runtime scenarios like page migration, context switching, and attribute perturbations. The paper also leans heavily on prior work such as papers, patents, and internal techniques. For experts, this may be familiar; for broader audiences, it reduces accessibility.

The paper is a strong, experience-driven contribution to post-silicon verification, reflecting a rich, mature body of industrial knowledge that is particularly relevant for teams dealing with MMU verification. While the presentation is dense and occasionally hard to read, the underlying ideas remain highly relevant. It is methodologically aligned with current practice (randomization, stress, coverage), but does not yet reflect newer paradigms (e.g., ML-guided test generation, agentic flows, feedback-driven exploration). The reported gains may look incremental, but in the context of late-stage silicon validation, they can be the difference between a clean product launch and an expensive respin.
