WP_Term Object
(
    [term_id] => 15
    [name] => Cadence
    [slug] => cadence
    [term_group] => 0
    [term_taxonomy_id] => 15
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 640
    [filter] => raw
    [cat_ID] => 15
    [category_count] => 640
    [category_description] => 
    [cat_name] => Cadence
    [category_nicename] => cadence
    [category_parent] => 157
)

Applying QED to Hardware Accelerator Verification. Innovation in Verification

Applying QED to Hardware Accelerator Verification. Innovation in Verification
by Bernard Murphy on 06-30-2026 at 6:00 am

Key takeaways

QED (Quick Error Detection) can be a powerful complementary addition to verification but can be subject to size constraints. This month’s paper looks at a fix for that limitation. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

Innovation New qed

The Innovation

This month’s pick is Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition. The authors are from Stanford, Columbia, Cornell, UCLA, NYU and U Illinois Urbana Champaign. The paper was published in FMCAD 2021 and has 4 citations.

We have posted before on Quick Error Detection (QED) applications, most recently here. QED finds bugs through consistency checking rather than directed testing, a clever idea to expose hidden corner-case misbehaviors. Initial emphasis was on post-silicon validation though more recent research has extended to pre-silicon. I find ~20 papers on the topic suggesting QED continues to spark interest as a component in verification strategies.

Paul’s view

Back to the Quick Error Detection (QED) research track this month. Many excellent papers published here since our previous blog in 2020, all of them building on the premise that functional consistency (FC) checking is a powerful bug finding tool, and well suited to formal methods.  FC is the notion that a function should always produce the same outputs given the same inputs, assuming the same initial state – a notion that applies irrespective of the spec for that function.

In this month’s paper, the authors introduce A-QED2, an evolution of QED targeted at a class of pipelined datapath-centric arithmetic hardware they refer to as loosely coupled accelerators (LCAs). LCAs are functions like AES encoders, speech compressors, image filters (e.g. color-to-greyscale), and AI inference units. The authors observe that LCAs tend to be amenable to decomposition into a graph of sub-functions such that the FC proof can be pushed down into each sub-function, dramatically improving runtime and scalability of the proofs.

The other key contribution of the paper relates to the question of functional correctness relative to a formal spec for an LCA. Here, the authors show that if the LCA’s decomposition can be proved to be both FC and conformant to the spec for a single input, then it must necessarily be functionally correct over all sequences of multiple inputs. This sounds intuitive but the proof is quite involved.

The authors apply A-QED2 to 17 LCA testcases in the 300-600k gate range and compare FC with and without decomposition. Without decomposition, only 2 testcases complete within 12 hours on a Xeon server with 128GB RAM, with decomposition 10 complete in under a few minutes. The authors run on multiple buggy versions of each testcase. Across the 10 testcases that complete there are 90 different bugs to catch. FC finds 49, which is impressive given that it runs in seconds and is blind to the intended functionality of the testcases.

Raúl’s view

A-QED [DAC 2020] uses Functional Consistency (FC): if two identical inputs are applied to the accelerator, the outputs should be identical. It seems almost trivial, but it is very powerful, sort of equivalence checking against yourself, the design is compared with a second execution of itself. The verification engine constructs two symbolic executions and asks can the same input ever produce two different outputs. The core idea of exploiting self-consistency traces back all the way to 1996 [26]. Later work in processor verification, especially SQED (Symbolic Quick Error Detection), developed this idea into a practical formal methodology. A-QED’s contribution is to adapt this philosophy into a practical verification methodology for modern “loosely coupled” hardware accelerators (LCA).

An LCA is a hardware accelerator connected to the SoC through a bus or network-on-chip rather than being tightly integrated into the CPU pipeline.  Non-interfering LCAs are accelerators whose computations are independent of prior computations: given the same input, they always produce the same output regardless of the sequence of operations that preceded it. In other words, the accelerator’s internal state does not influence future computations. This model covers many practical accelerators that implement stateless transformations on streams or batches of data — for example encryption, image filtering, neural-network inference, and signal-processing kernels.

Of course, FC alone cannot prove correctness. A design that always produces the wrong answer is still functionally consistent. This is why A-QED complements FC with Single-Action Correctness (SAC) checks. For each action supported by the accelerator, SAC checks that at least one execution computes the expected output correctly. Unlike FC, SAC requires a reference specification or expected behavior. Together, FC and SAC provide a complete correctness criterion for non-interfering LCAs.

Correct results are not sufficient; accelerators must also respond within a reasonable amount of time. Response Bound (RB) verification checks that an accelerator produces its output within a predefined latency bound. Failure indicates deadlock, livelock, scheduling errors, or other responsiveness bugs.

This paper, Scaling Up Hardware Accelerator Verification using A-QED with Functional Decomposition, addresses scalability by decomposing an accelerator into smaller sub-accelerators that can be verified independently. Interestingly, the decomposition strategy differs for FC, SAC, and RB.

For FC, the accelerator is partitioned into functional blocks, and the same self-consistency check is applied independently to each block. For SAC, engineers must decompose the correctness properties themselves using conventional preconditions, invariants, and postconditions — perhaps the least satisfying aspect of the approach. RB uses a completely different strategy, constructing verification windows over the accelerator’s SSA representation that are analyzed independently.

The effectiveness of A-QED² is demonstrated on 109 buggy versions of 16 hardware accelerators ranging from thousands to over 200 million gates, including NVIDIA’s NVDLA. While conventional bounded model checking frequently timed out or ran out of memory, A-QED² verified the decomposed sub-accelerators in seconds to minutes, detected all bugs found by conventional simulation, and reduced overall verification effort by about 5x on average, reaching 9x for the largest industrial designs. Despite the elegance of the approach and impressive experimental results, A-QED [16 citations] has received surprisingly little attention in the literature.

Share this post via:

Comments

There are no comments yet.

You must register or log in to view/post comments.