I’m kicking off a blog series which should appeal to many of us in functional verification. Paul Cunningham (GM of the Verification Group at Cadence), Jim Hogan (angel investor and board member extraordinaire) and I (sometime blogger) like to noodle from time to time on papers and other verification articles which inspire us.
We want to support and appreciate innovation in this area so we’re taking our noodlings public. Please let us know what you think and please send any suggestions on papers or articles for us to discuss in future blogs. Ideas must be published in a peer-reviewed forum, generally available to all readers (or through subscription to IEEE or ACM).
We’ll start with “Optimizing Random Test Constraints Using Machine Learning Algorithms” by Stan Sokorac at Arm. This won a best paper award at DVCon a couple of years ago.
Verification depends heavily on generating pseudo-random sequences. Beyond easy tests, we poke around semi-randomly, aiming for a lucky find here and there to push coverage higher. It’s intuitive to believe that machine learning could improve this, helping find bugs faster or find more bugs. So this paper is immediately eye-catching.
Stan first defines a new type of coverage to isolate rare states, based on toggle-pair states where two flops toggle close (in time) to each other. He reasons that such events should point to tests that could be tweaked to improve coverage. This metric is used for test selection in the following steps.
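To make the toggle-pair idea concrete, here is a minimal sketch in Python. It is not Stan's implementation: the trace format (a list of (cycle, flop name) events), the window size, the "hit by at most one test" definition of rare and all function names are assumptions chosen purely for illustration.

```python
from collections import Counter

def toggle_pairs(toggles, window=2):
    """Return the set of flop pairs that toggle within `window` cycles of each other.

    `toggles` is a list of (cycle, flop_name) events. Both the event format
    and the window size are assumptions for this sketch.
    """
    events = sorted(toggles)
    pairs = set()
    for i, (cycle_a, flop_a) in enumerate(events):
        for cycle_b, flop_b in events[i + 1:]:
            if cycle_b - cycle_a > window:
                break  # events are sorted, so later ones are even further away
            if flop_a != flop_b:
                pairs.add(tuple(sorted((flop_a, flop_b))))
    return pairs

def rare_pairs(per_test_pairs, max_hits=1):
    """Pairs seen by at most `max_hits` tests across the whole regression."""
    counts = Counter(p for pairs in per_test_pairs.values() for p in pairs)
    return {p for p, n in counts.items() if n <= max_hits}

def rarity_score(pairs, rare):
    """Number of rare pairs a single test hit; higher means more interesting."""
    return len(pairs & rare)

# Two hypothetical tests with made-up toggle traces.
per_test = {
    "test_a": toggle_pairs([(0, "f1"), (1, "f2"), (10, "f3")]),
    "test_b": toggle_pairs([(0, "f1"), (2, "f2"), (3, "f3")]),
}
rare = rare_pairs(per_test)
scores = {name: rarity_score(pairs, rare) for name, pairs in per_test.items()}
```

In this toy data both tests hit the (f1, f2) pair, so it is not rare, while only test_b hits (f2, f3). A selection step would therefore favor test_b as a candidate for tweaking.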
His first learning method uses a genetic algorithm to evolve the test mix between generations, mutating versions of previous generation tests with a bias to those that hit rare toggle-pair states. Mutation depends on random test constraints being parameterized through command-line options. Per pass, mutation tweaks these options and starts from a new random seed. Stan reports that this method alone significantly improves coverage, using fewer tests.
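The generation step described above might be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm from the paper: the knob names, the integer knob ranges, the mutation rate and the score-plus-one weighting are all hypothetical, and the score is assumed to be something like a count of rare toggle-pair hits per test.

```python
import random

def mutate(knobs, knob_ranges, rng, rate=0.3):
    """Return a copy of `knobs` with some values re-drawn from their allowed range."""
    child = dict(knobs)
    for name, (lo, hi) in knob_ranges.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child

def next_generation(population, scores, knob_ranges, rng, size):
    """Pick parents with probability weighted by score, then mutate each one.

    Each child also gets a fresh random seed, mirroring the description of
    starting every mutated test from a new seed.
    """
    weights = [s + 1 for s in scores]  # +1 so zero-score tests can still be picked
    parents = rng.choices(population, weights=weights, k=size)
    children = []
    for parent in parents:
        child_knobs = mutate(parent["knobs"], knob_ranges, rng)
        children.append({"knobs": child_knobs, "seed": rng.randrange(2**32)})
    return children

# Hypothetical command-line knobs and their legal integer ranges.
knob_ranges = {"cache_miss_pct": (0, 100), "irq_rate": (0, 10)}
population = [
    {"knobs": {"cache_miss_pct": 50, "irq_rate": 5}, "seed": 1},
    {"knobs": {"cache_miss_pct": 10, "irq_rate": 0}, "seed": 2},
]
rng = random.Random(0)
children = next_generation(population, [3, 0], knob_ranges, rng, size=4)
```

Each child would then be turned into a simulator command line and run, and its coverage would feed the scores for the next generation.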
A second approach combines unsupervised learning with the previous method, aiming to avoid bias in large designs, where mutation alone may drive convergence toward only a subset of the design’s areas.
I think of Stan’s “toggle pair” coverage as adding an extra dimension to traditional flop coverage – not unlike looking at branch coverage vs. line coverage in software programming. The more dimensionality there is to a coverage metric, the richer it is, but the harder it is to cover the space.
The toggle pair metric highlights non-trivial bugs related to close (time) proximity events. That’s an important class; however, it doesn’t cover events, also important, where cause and effect are separated by many clock cycles.
The ML method Stan uses is a combination of a genetic algorithm and k-means clustering. Stan suggests, and I would like to see, the work extended to leverage neural networks, especially since the “k” value for the number of clusters is not automated in this solution.
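For readers less familiar with k-means, here is a minimal sketch of how clustering could keep the genetic step from converging on one corner of a design. It makes assumptions the paper may not share: each test is reduced to a small numeric coverage vector, one parent is kept per cluster, and "k" is supplied by hand, which is exactly the manual choice noted above.

```python
import random

def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means on equal-length tuples; returns a cluster label per point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

def pick_per_cluster(labels, scores):
    """Return the index of the best-scoring test in each cluster."""
    best = {}
    for i, (lab, score) in enumerate(zip(labels, scores)):
        if lab not in best or score > scores[best[lab]]:
            best[lab] = i
    return sorted(best.values())

# Hypothetical per-test coverage vectors, e.g. rare-pair hits in two design areas.
coverage = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
scores = [1, 5, 2, 0, 3, 9]
labels = kmeans(coverage, k=2, rng=random.Random(0))
parents = pick_per_cluster(labels, scores)
```

Keeping one parent per cluster means a region with modest scores still contributes a test to the next generation, rather than being crowded out by a single high-scoring region.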
Another challenge for ML is what knobs/parameters to control. The Arm testbench has 150 command-line options controlling testbench behavior. Stan’s genetic algorithm mutates these knob settings to configure tests for high coverage in a small number of tests. Very cool, but would it work as well if the testbench didn’t have that many options?
I really enjoyed reading this paper and the results are very compelling, given that Stan is running on production testbenches used to verify production CPUs at Arm. His ML-optimized regression doubles the number of failing tests (increasing bugs found) in 10X fewer simulation cycles than the standard flow.
I’m looking at this as an investor. Is there enough market demand to draw seed funding or even full-round funding to an early-stage company, or to prompt a strategic investor to buy that company once the value proposition is reasonably proven? The verification problem is definitely growing more complex and the market is growing at double digits. So that’s a good start.
I remember a company I worked with many years ago, which triggers some questions for me. They had a great technology, but it wasn’t really a product. You could imagine seed funding followed by a quick acquisition, but it wouldn’t get to a full round. I see the same thing here.
A second caution (same example) was the level of expertise required to use the tech. In that case it was advanced DFT and formal proving (still expert-only in those days). This restricted usage to PhD types. Might this have similar problems, for example in biases in AI training?
My third caution would be market timing. Introducing a great solution at the wrong time is just as bad as having a terrible solution. Is this solution going to depend on another technology to be introduced or mature (perhaps PSS)? Will it take off only when that happens? If so, better to continue to evolve the solution in-house until timing is better.
Paul talked about dependency on mutating command line options. Contrast this with diddling with constraint parameters where noise can be higher than signal in trying to extract trends for coverage. Command-line options shed a lot of that noise because there should be more design intent implicit in the options, at least for this example.
I think Stan is scratching at the surface of something important here – maybe there’s a more systematic yet still high-level control point for ML plus randomization. PSS is one platform on which this might evolve. Would be interesting to see application at that level.