Multicore systems working with shared memory must support a well-defined model for consistency of thread accesses to that memory. There are multiple possible consistency models. Can a design team run memory consistency checks at RTL? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO) and I continue our series on research ideas. As always, feedback welcome.
This month’s pick is RTLCheck: Verifying the Memory Consistency of RTL Designs. The paper was published in the 2017 IEEE/ACM MICRO. The authors are from Princeton and NVIDIA.
Memory consistency is a contract between hardware and software developers on ordering of reads and writes in a multicore system. If two or more threads can load from or store to a logical memory location, given no necessary synchronization between cores and optimizations like out-of-order execution, some orders may be allowed and some may not. There are multiple possible ways to define such rules. An agreed set of rules bounding this behavior defines a memory consistency model (MCM). RTLCheck is the authors’ contribution to automatically check that an RTL design complies with a set of (modified) user-defined axioms encoding an MCM.
The method detects axiom violations as cycles in “happens before” (hb) graphs of test cases, elaborated as fetch-decode/execute-writeback operations across cores. One such litmus test checks message passing between cores. Since arcs in an hb graph denote permitted orderings of operations, a cycle in an hb graph implies a sequence of operations that must complete before they start, which is not possible. The axioms used to prove to an MCM work with abstract architecture specifications. Temporal proof engines used in formal methods for RTL lack this flexibility (per the authors) so axioms are “synthesized” to industry standard System Verilog Assertions (SVA) and constraints with some limitations on what can be mapped.
Verifying memory consistency in multi-processor (MP) systems is hard, and I’m always a fan of raising abstraction levels as a important way to tackle hard verification problems. The paper’s basic premise to compile high level micro-architecture MCM axioms into SVA is a great idea, and the “happens before” graphs used in these axioms are an elegant and intuitive way to express MCM intent.
The paper is thorough and it’s always nice to see joint research between academia and industry. The authors clearly describe their approach and provide a full worked example on an open-source RISC-V core in which they found a real bug using their methods. Although, as the authors point out, the bug is a bug even for a single instance of the RISC-V core. The memory’s “ready” signal has been accidentally tied to high so the memory is always ready to accept a new value.
I do find myself wondering how much simpler the author’s axiomatic specifications in “Check” tool format are than their synthesized industry standard SVA equivalents. The mapping is 1-1, just more verbose in SVA format. For example, one key observation in the paper is that an SVA for “A implies B happens later” (A |-> ##[1:$] B) can match the case where A happens multiple times before B where an hb-graph axiomatic equivalent would not – imagine that A is “store value x” and B is “load value x”. An intervening “store value y” would obviously invalidate the axiom. Synthesizing additional SVA syntax to prevent multiple A’s before B is one of the paper’s contributions (A |-> ##[0:$]!A ##1 B), but this contribution feels more like syntactic sugaring than a fundamental raising of the abstraction level.
Overall, tight paper, well written, and on an important topic. Also, nice to see the author’s using Cadence’s Jasper formal tool to prove their synthesized SVAs. And find that RISC-V bug 😃
For the interested reader, the paper explains axiomatic microarchitectural models in 𝜇spec (first-order logic). Together with the corresponding 𝜇hb (happens-before) graphs and temporal assertions. They walk through a small motivating example with two cores . Each runs a small common “message passing” litmus test of 2 instructions in each core which is easy to follow. The actual generation of the temporal assertions is quite complex and involves additional user non-automatic steps:
- mapping the litmus test program instructions and values to RTL expressions
- mapping 𝜇hb graph nodes to Verilog expressions
This requires a “user” with deep knowledge of the field, i.e. axiomatic specifications, 𝜇spec, SVA, RTL, etc. Designers of parallel architectures working with verification experts and access to JasperGold and the check suite  can potentially profit from using RTLCheck.
The results are nice: For a 4-core RISC-V, 89% of the generated assertions for 56 litmus tests are completely proven. The remaining 11% complete with bounded proofs. These include the discovery of a real bug in the Multi-V processor and its fix.
As an academic research paper, I find the claims and results stand up, and the concept to be very interesting. However, it is hard to see commercial opportunity in productizing this work. The very high and necessary level of cross-domain expertise and what appears to be a significant level of manual effort does not seem scalable to production applications.
I had hoped this would be a neat lateral way to verify coherency. But Paul and Raúl talked me out of it. The expertise and effort required to setup axioms and constraints to manage formal analysis on a modern SoC seems daunting. I hold out hope that the core concept may still have value. Perhaps when applied (in some manner) in a simulation application.
Also ReadShare this post via: