WP_Term Object
(
    [term_id] => 15
    [name] => Cadence
    [slug] => cadence
    [term_group] => 0
    [term_taxonomy_id] => 15
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 494
    [filter] => raw
    [cat_ID] => 15
    [category_count] => 494
    [category_description] => 
    [cat_name] => Cadence
    [category_nicename] => cadence
    [category_parent] => 157
)
            
14173 SemiWiki Banner 800x1001
WP_Term Object
(
    [term_id] => 15
    [name] => Cadence
    [slug] => cadence
    [term_group] => 0
    [term_taxonomy_id] => 15
    [taxonomy] => category
    [description] => 
    [parent] => 157
    [count] => 494
    [filter] => raw
    [cat_ID] => 15
    [category_count] => 494
    [category_description] => 
    [cat_name] => Cadence
    [category_nicename] => cadence
    [category_parent] => 157
)

An ISA-like Accelerator Abstraction. Innovation in Verification

An ISA-like Accelerator Abstraction. Innovation in Verification
by Bernard Murphy on 09-29-2021 at 6:00 am

A processor ISA provides an abstraction against which to verify an implementation. We look here at a paper extending this concept to accelerators, for verification of how these interact with processors and software. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO) and I continue our series on research ideas. As always, feedback welcome.

An ISA-like Accelerator Abstraction

The Innovation

This month’s pick is Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification. The paper was published in the 2018 ACM DAES. The authors are from Princeton, IIT Kanpur and Technion Israel.

The authors aim with ILA to extend the ISA (Instruction Set Architecture) concept for processors to include accelerators common in most SoCs, providing a unified abstraction for all software-visible compute units in such SOCs. Accelerators are already visible to software for example through memory-mapped IO; ILA adds instruction abstractions to this view. Their goal is to enable scalable software validation against an abstract model of the SoC and a standard abstraction against which implementations can be verified. Generally they aim to reduce verification complexity as architectures expand further in massively multi-core and multi-die systems.

They consider applications to accelerators for image processing, machine learning and cryptography and a RISC-V core. ILAs here are generated either by hand or through a template-based synthesis flow. ILA allows for hierarchical levels in abstraction, to model microcode level complexities like looping and implementation choices like buffering for streaming IO.

The authors use formal methods based on commercial and open-source tools to verify their ILAs. They do this through equivalence checking to compare two different implementations of an ILA (e.g. a specification ILA versus a more elaborated ILA). They also run equivalence checks between ILAs and RTL implementations.

Paul’s view

This is a very thought-provoking paper. The trend towards domain-specific heterogenous compute is unstoppable, fueled in a big part by the AI/ML transformation ongoing all around us.

Traditional CPU design benefits from a rich ecosystem of verification and validation techniques built around formal definitions of ISAs. These ISAs serve as a bridge between the software validation world and hardware verification world. Software is compiled into a sequence of ISA instructions. Hardware processes that ISA sequence in silicon.

In this paper the authors attempt to generalize an ISA beyond general purpose CPUs to any domain specific accelerator. They observe that while accelerators do not literally support an ISA, it is possible to map an accelerator’s MMIO or stream interface to something that looks and feels like an ISA. A key contribution of the paper is a formal definition of this “ILA” generalization such that no matter what concurrency there is in hardware between the CPU and its accelerators, the software world can see a flattened sequential sequence of unified ILA instructions across all compute engines.

The paper is very practical, taking the reader through multiple worked examples of ILAs for accelerators (gaussian blur, recurrent neural network, AES encryption), along with proofs of correctness for RTL implementations of these accelerators to their respective ILAs – nice to see one of the tools they use being JasperGold from Cadence 🙂

I would love to see follow-on papers to showcase worked examples of the other side of ILAs. How they can be used to improve validation of software leveraging multiple accelerators and CPUs. While the way an ISA bridges software and hardware for a CPU is clear, a software program that leverages accelerators does not literally compile into ILA instructions, so the bridge between hardware and software for accelerators using ILAs is less clear to me, and I would appreciate a worked example.

Raúl’s view

The paper is a significant contribution around modeling, not really about innovation in verification, as Paul noted. Verification is based on existing verification tools, the contribution here is methodology and modeling. I think it is fair to note that the human effort involved in the setup (including setup for proofs) seems more significant than run-time and probably requires a high level of expertise. The authors hand-generated some of the ILA models, not so surprising at this stage of evolution. Some they generated through template-based synthesis, checking against reference models (C/C++, SystemC, Chisel, RTL). Though they don’t go into details, I suspect there is significant manual work behind the scenes in finalizing those models. Even the formal verification setup probably requires a lot more know-how and work than detailed in the paper.

In terms of how EDA tools could use ILA in the future, that depends obviously on a usable language / notation. We’d have to look carefully at comparable objectives and adoption rates. Chisel for example is another way to abstract, also using templates; Berkeley used Chisel to build the Risc-V Rocket and Google used it to design an edge TPU. SystemC might be a more interesting reference as a high-level standard which has faced adoption challenges.

This looks like a starting point on a long road, pointed to by future work suggestions. Extensions to concurrency, consistency of shared memory, accelerator code generation and reliable simulator generation for software development are some examples.

My view

While this paper doesn’t demonstrate an application. I did find a related paper, also from Princeton, on application to memory consistency verification. Perhaps we can review that paper in a later blog. I see this paper more as an introduction to the concept. With a demonstration applied to a range of accelerators, rather than a self-contained innovation. Some innovations build on multiple sub-innovations. This is a sub-innovation 😀

 

Share this post via:

Comments

There are no comments yet.

You must register or log in to view/post comments.