As SoC design size and complexity increase, simulation alone falls farther and farther behind, even with massive cloud farms of compute resources. Hardware acceleration of simulation is becoming a must-have for many teams, but it means more than just providing emulation hardware.
We often hear how good hardware acceleration is in terms of achieved speed up. I once worked for a boss who insisted that we jump to the last slide of a presentation first. His idea was that if the conclusion was strong enough, and we could summarize it succinctly, all was good. If it was boring, he wouldn’t waste time with it. If it was somewhat controversial, but fascinating enough, he’d let the rest of the presentation proceed to bring everyone up to speed and give a chance for supporting or detracting opinions. (We quickly learned to build an opening slide meeting his executive summarization criteria.)
A new Aldec white paper steps through a case study of hardware acceleration of a testbench for a 6×6 network-on-chip (NoC) mesh design. The result is a 171x speed up versus simulation. Now that I have your attention ….
Getting that kind of improvement requires understanding the architecture of the design under test, how UVM testbenches work, and what needs to be done to leverage SCE-MI acceleration. Aldec’s Piotr Bajorowicz walks through all these pieces in this case study. We’d like you to read the entire paper – here are some highlights.
The notional NoC design, a simple scalable mesh, was borrowed from research performed at Stanford University. Each router in the NoC has a test module called a packet interpreter (shown in green), and the UVM testbench reaches in and grabs the packet interpreters at several critical intersections via transactors (shown in red) to provide stimulus.
Bajorowicz goes through some of the internals of the packet interpreters and how the routers work. He then dives into the transactor blocks, the verification IP supporting SCE-MI interfacing. In this case, the router transactors use pipes to source and sink packets for test traffic. Functions provide coverage analysis, and a function-based API in DPI-C completes the interface to the simulator.
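To make the pipe idea concrete, here is a minimal sketch of what the C side of such a DPI-C function API could look like. All names (`noc_packet_t`, `noc_send_packet`, `noc_recv_packet`) are hypothetical, and a simple ring buffer stands in for the actual SCE-MI transaction pipe; on the SystemVerilog side these functions would be declared with `import "DPI-C"`.

```c
// Illustrative C side of a DPI-C function API for a SCE-MI transactor
// pipe. Names and layout are assumptions, not Aldec's actual API.
#include <stdint.h>

#define PAYLOAD_BYTES 8
#define PIPE_DEPTH    256

typedef struct {
    uint8_t dest_x, dest_y;          // mesh coordinates of the target router
    uint8_t payload[PAYLOAD_BYTES];  // packet body
} noc_packet_t;

// In-memory ring buffer standing in for the SCE-MI transaction pipe.
static noc_packet_t pipe_buf[PIPE_DEPTH];
static int head, tail;

// Called (via DPI) to source a packet into the pipe toward the hardware.
// Returns 0 on success, -1 if the pipe is full.
int noc_send_packet(const noc_packet_t *pkt) {
    if ((tail + 1) % PIPE_DEPTH == head) return -1;
    pipe_buf[tail] = *pkt;
    tail = (tail + 1) % PIPE_DEPTH;
    return 0;
}

// Called to sink a packet coming back from the hardware side.
// Returns 0 on success, -1 if the pipe is empty.
int noc_recv_packet(noc_packet_t *pkt) {
    if (head == tail) return -1;
    *pkt = pipe_buf[head];
    head = (head + 1) % PIPE_DEPTH;
    return 0;
}
```

The point of the function-based style is that the testbench sees plain function calls, while the tool maps them onto streaming transfers to the emulated design.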
Then comes the UVM testbench itself. Virtual sequences deliver randomized traffic, and results are collected in a scoreboard and compared against a reference model to check correctness and coverage. Constrained randomization is repeated until 100% coverage is achieved. In this case, speed-up results actually improve with more packets.
That is not always the case, and it has to do with the ratio of simulation events to the number and size of data transfers. The idea is to make sure the simulator is neither waiting around for event responses nor overwhelmed by how much high-level data it has to generate for the SCE-MI transactors on each event. Bajorowicz simplifies it further:
… if there is a lot of data transmitted between hardware and software while the design does not perform complex operations, it can be hard to achieve a high acceleration ratio.
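That trade-off can be captured with a toy first-order model (my illustration, not a formula from the paper): per-event hardware time plus per-event host-to-emulator transfer time sets a floor on how fast the accelerated run can go, so speed-up collapses when communication dominates.

```c
// Toy model of acceleration: pure-simulation time per event divided by
// accelerated time per event (hardware execution plus data transfer).
// All parameter names and values are illustrative assumptions.
double effective_speedup(double t_sim_per_event,
                         double t_hw_per_event,
                         double t_comm_per_event) {
    return t_sim_per_event / (t_hw_per_event + t_comm_per_event);
}
// Compute-heavy, little data movement:
//   effective_speedup(100.0, 0.5, 0.1) is roughly 167x.
// Chatty design where transfers dominate:
//   effective_speedup(100.0, 0.5, 50.0) is roughly 2x.
```

The NoC case plays to the first profile: packets stream in through pipes while the mesh does substantial routing work per transfer, which is why adding more packets improved the measured speed-up.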
It is clear that high acceleration ratios are possible for many designs. Aldec’s case study makes use of their Riviera-PRO simulator and HES-DVM FPGA-based prototyping environment for UVM acceleration. The design under test was fairly simple and illustrates a profile that benefits from hardware acceleration. In Aldec’s view of the ASIC Verification Spectrum, preliminary steps in ASIC design include a requirements cycle with Spec-TRACER and a static analysis phase with ALINT-PRO, eliminating many bugs before simulation starts.
Aldec teams will be demonstrating the verification of this NoC using their approach at DVCon Europe next week. More details and a link to the complete white paper are here:
Rather than an abstraction of the SCE-MI process and some handwaving that things will be faster, this case study and demonstration show the thought process and attention to detail that should help designers considering hardware simulation acceleration.