Can we detect bugs in post- and pre-silicon testing where we can drastically reduce latency between root-cause and effect? Quick error detection can. Paul Cunningham (GM, Verification at Cadence), Jim Hogan and I continue our series on novel research ideas. Feel free to comment.
This month’s pick is Logic Bug Detection and Localization Using Symbolic Quick Error Detection. This paper was published in arXiv in 2017. The authors are from Stanford University.
The method originated in post-silicon debug where options for testing are constrained. Think of this as “relentless consistency checking”. Start with machine-level code and regularly duplicate instructions reading and writing through a parallel set of registers / memory locations. Regularly compare original values with the duplicated values. A difference detects an error. Why wouldn’t these checks always pass? Perhaps because register or memory contents can be corrupted in complex SoCs thanks to coherency or thread management problems.
Most striking, sometimes problems observed in an application are found millions of cycles after the root-cause, making debug very difficult. QED checks can catch such an issue within a smaller window from the root-cause. The authors describe a coherency example they caught within a few cycles which did not trigger an observable bug until millions of cycles later.
QED is useful in pre-silicon testing as well as post silicon. and will localize a bug within a window of time and instruction trace on a CPU. QED will not localize a bug outside a CPU, but symbolic QED will and also produces a much shorter trace – in some sense a formally minimal trace. (It will also localize bugs inside a CPU.) The authors show QED traces much smaller than original observable failures, which run up to 5M instructions in their tests. Yet QED traces are still long and still don’t fully localize. Symbolic QED traces are <20 instructions, in a different league altogether and completely localize the bug.
Details for mapping from the QED trace to start symbolic QED analysis are complex, requiring a combination of simulation and manual search to find an appropriate initial state.
I’m new to QED and I find it very interesting. Long latency bugs take huge effort to track down. Anything that can reduce that latency from millions of cycles to just a few cycles or less is amazing. The paper itself is very well written, very thorough as you’d expect from work coming from Stanford. When used for pre-silicon verification, I see QED in a similar bucket to mutation and concolic techniques where you are automating bug discovery vs. having to put manual effort into additional testcases and correctness properties. But QED also applies to post-silicon debug, where the other methods do not.
I really like that Symbolic QED is based on a formal search with a bounded model checker so can be used to find a truly minimal trace to activate the bug. Their method is also able to find a minimal subset of the SOC to activate that bug. The paper does a beautiful job of explaining this. It’s a wonderful symbiosis between simulation and formal. Find some nasty real-world failure that could have led to millions of cycles between bug activation and observable error. Use QED to localize the activation point of the bug. Then use symbolic QED to generate a minimal trace and minimal design subset that reproduces the bug.
Where I think symbolic QED would benefit from more research is in simplifying the initial state step. Currently this starts with automation, yet that step still only narrows the search from millions of instructions to thousands of instructions, still too big for model checking. The authors describe a manual search beyond that point to get down to around 100 instructions, at that point manageable in model checking. More automation would be a big advance.
That point aside, I’m excited by the strength of the results. The fact that they have a very credible testcase with an 8-core openSPARC CPU with shared L2 cache and 4 memory controllers, and they pick 92 real world bugs from the literature. All of which they localize with QED + Symbolic QED to minimal traces. Very impressive.
Post-silicon caught my eye. We talk about first-pass silicon but that’s not been a reality for a while. We still need post silicon debug and this is a cool approach to do a first-pass localization. I also like the long latency bug aspect. Customers pay for solutions that solve hard problems and these don’t sound like they’re going to be occasional problems.
I like seeing another way that formal can be leveraged, once the window has been narrowed down enough. To do this requires building on a proven technology, better yet it doesn’t require formal experts.
Closing the gap to a commercial product looks like it needs work on the initial state. Which doesn’t seem like hard-core innovation. Techniques the authors used look like routine engineering methods. I’m sure given some hard work that gap could be closed. Investable.
Good stuff. Just for the sake of argument, what you might miss in such an approach? The core of the test uses the design and code itself, running on CPUs, as the oracle. If a bug, say in an AI accelerator, doesn’t manifest as a QED difference, that might not be caught.
Click HERE to see the previous Innovation blog