Two recently announced vulnerabilities in major processor platforms should remind us that bugs don’t organize themselves to appear only in domains we know how to test comprehensively. Meltdown and Spectre, the announced problems, are hardware system-level issues enabled by interactions between speculative execution and cache behavior under specialized circumstances. Finding hardware weaknesses among highly complex interactions is exactly where formal proving excels, but the common belief is that formal analysis on hardware systems of this complexity is beyond the reach of today’s tools, which are typically bounded to block/IP-level proofs.
Or so goes conventional wisdom. But not knowing how to test for complex issues doesn’t make them go away, so experienced teams look for creative approaches to formally verify at the system level. Qualcomm (Mandar Munishwar, Sr. Staff Eng. at Qualcomm/Atheros, who also teaches a class at UCSC on system-level assertions and formal) and Vigyan Singhal (CEO of Oski) co-presented on this topic at Oski’s most recent Decoding Formal event.
Mandar kicked off with a real example in which he was looking for potential system-level deadlocks in a wireless PHY-layer sub-system, a controller for a next-generation WiFi core. The central block in this system sequences the operations of ten immediately surrounding blocks, the eleven together acting as complex intercommunicating state machines. The sub-system sits deep inside the design and is therefore difficult to control or observe directly, and the complexity of interactions between blocks is high enough that deadlocks are possible. All of which fits my earlier characterization: conceptually a perfect fit for formal, but far too big.
When a design is too big for formal, the logical next step is to abstract. You replace memories or datapath elements (for example) in the design with reduced models which exhibit the important features of the replaced component for the purposes of the proof. This works well when just a handful of components stand between you and a feasible proof, but what do you do if all the components are too big? At this point you have to start thinking about architectural formal verification, where you abstract everything in the design.
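To make the idea concrete, here is a minimal Python sketch (my own illustration, not from the talk) of one classic reduction: replacing a full memory with a model that tracks a single, arbitrarily chosen address. For a proof that only cares that data written is the data later read, this preserves the relevant behavior while collapsing the state space; a formal tool would leave the tracked address symbolic rather than fixing it as we do here.

```python
# Hypothetical sketch of a classic formal abstraction: a memory reduced to
# one tracked entry. All names are invented for illustration.
class OneEntryMemAbstraction:
    def __init__(self, tracked_addr):
        # In a formal tool this address would be chosen non-deterministically,
        # so the proof covers every address at once.
        self.tracked_addr = tracked_addr
        self.value = None  # unknown until the first write

    def write(self, addr, data):
        # Writes to untracked addresses are simply dropped.
        if addr == self.tracked_addr:
            self.value = data

    def read(self, addr):
        # Reads of untracked addresses return an unconstrained value
        # (modeled as None); a sound proof must not depend on them.
        return self.value if addr == self.tracked_addr else None
```

The point is not that this tiny class is useful on its own, but that it exhibits exactly the features the proof needs (read-after-write consistency at any one address) and nothing else.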
Mandar explained that each block in this proof, including the central block, is abstracted down to an appropriate FSM. In essence, none of the original RTL remains except the top-level connectivity, much of which may itself be ignored as irrelevant to the deadlock verification. At this point, some of you will object (in the strongest possible terms): what are you verifying if hardly any of the original RTL remains? Some attendees seemed to be wrestling with that objection, and I have had the same concern. But after listening to an earlier talk on using this method for cache-coherence verification and to this talk, where abstraction was taken to the limit, along with Vigyan’s explanations both times, I believe I have come to terms with the value and the validity of the method.
This starts with the process. Since what you are verifying is not the design RTL, you need strong links between that RTL and the proof, and also, incidentally, the original architectural spec; that last part is a nice plus in these proofs. You derive architectural models as simple FSMs for each block, starting in each case from the architectural spec for the block rather than the RTL. In the abstracted model you preserve only those details considered relevant to the proof. In this case Mandar was primarily concerned with the handshakes passing between blocks, because he was looking for deadlocks, but he still allowed for timing variability, e.g. non-deterministic latencies.
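As an illustration (mine, not Mandar’s), an architectural model of one block’s handshake might reduce to a three-state FSM in which a free `stall` input stands in for unknown internal latency:

```python
# Hypothetical sketch: a block's architectural model reduced to a tiny FSM.
# Only the handshake behavior is preserved; data and internal pipelines are
# dropped. State and signal names are invented for illustration.
class HandshakeAM:
    """Architectural model of one block: IDLE -> WAIT -> ACK -> IDLE."""

    def __init__(self):
        self.state = "IDLE"

    def step(self, req, stall):
        # 'stall' is a free (non-deterministic) input: while it is True the
        # block may linger in WAIT, abstracting an unknown internal latency.
        if self.state == "IDLE" and req:
            self.state = "WAIT"
        elif self.state == "WAIT" and not stall:
            self.state = "ACK"
        elif self.state == "ACK":
            self.state = "IDLE"
        return self.state == "ACK"  # the block's ack output
```

In a formal tool the `stall` input would be left unconstrained, so the proof explores every possible latency; in a simulation we have to drive it explicitly.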
The top-level design remains the same, with architectural models (AMs) replacing the blocks under that top; block signals irrelevant to the proof are stubbed or left open as appropriate. Adjustments are also made to time-frames, scaling multi-microsecond transactions down to ~200 cycles for reasonable proof times. To give us a sense of the complexity of sequencing through these intercommunicating FSMs, Mandar showed an example control flow. Proofs are run on this top-level architectural model to find potential deadlocks.
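The proof obligation at the top level can be pictured as reachability analysis over the composition of the AMs: a deadlock is a reachable joint state in which some block still has work pending but no block can make progress. This toy Python sketch (block names, signals, and the interleaving semantics are my own simplifications, not Qualcomm’s flow) finds the classic two-block cycle where each waits for the other’s grant:

```python
from collections import deque

# Each FSM: state -> (signals offered in that state,
#                     [(required input signal or None, next state)]).
# A and B each wait for the other's grant before granting: a deadlock.
A = {"wait": ({"req_a"}, [("gnt_b", "done")]),
     "done": ({"gnt_a"}, [])}
B = {"wait": ({"req_b"}, [("gnt_a", "done")]),
     "done": ({"gnt_b"}, [])}

def find_deadlocks(fsms, start):
    """BFS over joint states; a deadlock is a reachable state where some
    FSM still has pending transitions but none is enabled."""
    seen, frontier, deadlocks = {start}, deque([start]), []
    while frontier:
        joint = frontier.popleft()
        offered = set().union(*(fsms[i][s][0] for i, s in enumerate(joint)))
        moves = []
        for i, s in enumerate(joint):
            for guard, nxt in fsms[i][s][1]:
                if guard is None or guard in offered:
                    moves.append(joint[:i] + (nxt,) + joint[i + 1:])
        if not moves and any(fsms[i][s][1] for i, s in enumerate(joint)):
            deadlocks.append(joint)  # stuck, but not everyone is finished
        for n in moves:
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return deadlocks
```

Real tools explore a synchronous composition with far better engines than explicit BFS, but the shape of the question is the same: is any "stuck" joint state reachable?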
Finally, the AMs are validated against the corresponding RTL. This can be done in simulation or formally. Per Vigyan, the simulation approach works well for relatively simple AMs, where all you need to check is the handshaking behavior. Formal is used for the more complex AMs, where you validate the RTL against the assumptions (constraints) made when constructing the AM.
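A simulation-based check might replay handshake events captured on the RTL block’s interface and confirm the AM can follow them; because the AM is non-deterministic, the checker tracks every AM state consistent with the trace so far. Again a hypothetical sketch, with state and event names invented:

```python
# Hypothetical AM as a non-deterministic transition table:
# state -> {event: {possible next states}}.
AM = {
    "IDLE": {"req":  {"BUSY"}},
    "BUSY": {"tick": {"BUSY", "ACK"}},  # non-deterministic latency
    "ACK":  {"done": {"IDLE"}},
}

def am_accepts(am, trace, start="IDLE"):
    """Return True if some run of the AM can produce this RTL trace."""
    states = {start}
    for event in trace:
        states = {n for s in states for n in am.get(s, {}).get(event, set())}
        if not states:
            return False  # the RTL did something the AM cannot explain
    return True
```

An empty state set means either an RTL corner-case bug or a behavior the AM failed to model; both findings are valuable, which is exactly the point of this validation step.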
What did Qualcomm get out of this? First, they found bugs in the architectural specs – try doing that with any kind of RTL verification. This is a human step rather than an automated one, but simply the exercise of generating an AM exposes potential problems. Pretty valuable. Then system-level verification exposed bugs. Finally, AM-versus-RTL validation found corner-case bugs in the RTL. Altogether they uncovered nine hard-to-find bugs which could easily have escaped to production, or at least forced a silicon re-spin.
I can understand why verification teams might feel queasy about this approach; what you are proving feels multiple steps removed from what you are building. Vigyan himself told me this approach isn’t for everyone. And yet problems at this level happen and can be difficult even to conceive, much less figure out how to test; just ask the processor teams scrambling to deal with Meltdown and Spectre. I can’t say whether the Qualcomm approach is necessarily the best way to catch Meltdown/Spectre-type problems, but I do believe formal needs to play a bigger role at this level, and it has to work at some kind of architectural-spec level because proofs based directly on the RTL are impractical. You can check out Oski’s Decoding Formal videos HERE.