WP_Term Object
(
    [term_id] => 15
    [name] => Cadence
    [slug] => cadence
    [term_group] => 0
    [term_taxonomy_id] => 15
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 594
    [filter] => raw
    [cat_ID] => 15
    [category_count] => 594
    [category_description] => 
    [cat_name] => Cadence
    [category_nicename] => cadence
    [category_parent] => 157
)
            
14173 SemiWiki Banner 800x1001
WP_Term Object
(
    [term_id] => 15
    [name] => Cadence
    [slug] => cadence
    [term_group] => 0
    [term_taxonomy_id] => 15
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 594
    [filter] => raw
    [cat_ID] => 15
    [category_count] => 594
    [category_description] => 
    [cat_name] => Cadence
    [category_nicename] => cadence
    [category_parent] => 157
)

Theorem Proving for Multipliers. Innovation in Verification

Theorem Proving for Multipliers. Innovation in Verification
by Bernard Murphy on 07-31-2024 at 6:00 am

An explosion in multiplier types/combinations lacking well-established C reference models for equivalence checking is prompting a closer look at theorem proving methods for verification. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

Theorem Proving for Multipliers

The Innovation

This month’s pick is Sound and Automated Verification of Real-World RTL Multipliers. This article was reported in the 2021 FMCAD conference. The authors are from UT Austin.

Multipliers are cropping up everywhere: in AI for MACs (multiply-accumulate) and dot products, in elliptic curve cryptography, and in countless applications in signal processing (filters and FFTs for example). What is distinctive about these multipliers is size (up to 1024×1024 in some instances) and novel architectural complexity. For example, a naïve architecture for a MAC (a multiply function followed by an add function) is too slow and too large for state-of-the-art designs; current implementations fuse multiply and add functions for faster and smaller implementations.

While these new approaches are faster and more area efficient, they are correspondingly more challenging to fully verify. Equivalence checking is the verification method of choice but depends on well-established reference models for comparison. The authors propose a formal approach based on the ACL2 theorem prover to soundly verify correctness in very respectable run times. FYI ACL2 is in the same general class of proof assistants as Coq, Isabelle, Lean and others.

Paul’s view

Switching back from AI to formal verification this month. Most commercial formal tools use SAT and BDD, but there is a niche market for alternative “theorem prover” based tools, especially to verify complex arithmetic circuits. Theorem provers convert a circuit into an equation by walking through it from left to right, building an equation for each gate output based on the equations for its inputs (a technique known as “symbolic simulation”). The resulting circuit equation is then iteratively transformed through a series of rewrite rules (literally like a human would perform a proof) until, hopefully, it is identical to the expected equation for the overall circuit.

Theorem provers are susceptible to equation size explosion. Multiplier circuits, especially complex MAC and dot-product circuits with truncation and right-shifted outputs, as might be needed for an AI accelerator, are especially problematic. This paper proposes a new set of rewrite rules that can prove correctness on several such complex circuits in a matter of a few minutes where prior methods explode and fail to complete.

The authors’ key idea is to represent all the circuit equations in a carry-save-like notation by defining two functions, a “c” carry function c(x) = ROUND_DOWN(x/2)  and a “s” sum function s(x) = MOD_BASE2(x). The propagation equation for gates is reworked to using these two functions as much as possible. So for example, for a full adder with inputs a and b, the output equations would normally be c_out = a*b, and s_out = a+b – a*b. In c-s notation they would be c_out = c(a+b) and s_out = s(a+b).

The authors then define various c-s rewrite rules, e.g. c(s(x)+y) = c(x+y) – c(x) which are applied continuously as the circuit is being walked. The combination of the c-s notation and these rewrite rules has a transformational effect on the complexity of the equation that is built. Really nice paper, well written – the idea makes such good intuitive sense that I can’t help but wonder how it has not been tried before! Will definitely be following up with the Jasper formal team here at Cadence.

Raúl’s view

Temel and Hunt take us on a grand tour of designing and verifying multipliers. They explore different multiplier architectures, such as standalone multipliers with full, truncated or right shifted outputs, integrated multipliers, MACs, dot-products, various encodings, and so on. The paper shows a sample design of multiplier modules that have multiple uses. They present results that include the following architecture elements:

  • Partial product generation algorithms, such as simple partial products, Booth encoding radix-4, or radix-2.
  • Summation tree reduction algorithms such as counter-based Wallace, array, Dadda, traditional Wallace, overturned-stairs, balanced delay, redundant binary addition, 4-to-2 compressor and 7-to-3 compressor trees, and merged multipliers with Dadda tree.
  • For final stage addition, these multipliers implement Kogge-Stone, ripple-carry, Brent-Kung, Han-Carlson, Ladner-Fischer, carry-select, conditional sum, variable-length carry-skip, block carry-lookahead. and regular carry-lookahead adders

The main section of the paper is on verification. It builds on previous work on a term rewriting algorithm that can verify a wide range of isolated multiplier designs. It extends this to cover the multipliers described above and to return counterexamples for buggy designs. It is based on a term rewriting (replacing terms according to a set of rules) technique the authors call s-c term rewriting, where the s stands for sum and the c stands for carry. They use ACL2, an interactive and automated theorem proving system.

Results are reported for over 60 isolated multipliers, other configurations and MACs from 16 to 1024 bits. They all complete in times that range for fractions of a second to 300 seconds for a 1024×1024 multiplier and 356 seconds of a 256×256 MAC and compare favorably to other state-of-the-art work (which is orders of magnitude slower and times out at 90 minutes for several designs).

It takes some time to read the paper, it is thoroughly researched but not totally self-contained. The 37 references cover standards for multiplier designs (e.g. Brent-Kung, Wallace…) and the state-of-the-art of formal verification (BDDs, BMDs, SAT, SMT Computer Algebra Methods). The reader who is not so interested in this detail can focus on the results and conclusions which are impressive and advance the state of the art.

Share this post via:

Comments

There are no comments yet.

You must register or log in to view/post comments.