I’ll never forget working at Intel on a team designing a graphics chip. We wanted to simulate the design to ensure proper functionality before tapeout, but because of the long run times we made a compromise to speed things up: reducing the simulated display window to just 32×32 pixels. Well, when first silicon arrived, sure enough, the only display that worked was 32×32 pixels, so we had to do another re-spin to correct the logic bugs. In the 1980s it was quite popular to use interpreted logic simulators, which made it easy to interactively debug hundreds to thousands of gates.
In the 1990s I was working at an EDA company that acquired the simulator company that had just written the fastest compiled-code Verilog simulator. Wow, what a dramatic improvement over the older, interpreted logic simulators.
Today we have SoCs with billions of gates, and this extreme size has pushed the EDA vendors to come out with something new that can handle that capacity with run times measured in hours to days instead of weeks. The new approach to these present-day challenges is a third-generation, parallel simulation engine that scales. Here’s a chart showing the three generations of functional simulators:
I spoke by phone with Adam Sherer of Cadence Design Systems recently to get his insight on functional simulation since the 1980s. It turns out that back in early 2016 Cadence acquired a start-up called Rocketick and its parallel simulator, RocketSim. Yes, most of the EDA companies had been trying to develop their own parallel simulators, but the early results were not promising enough to become viable products because of poor scaling and manual compile processes. The real accomplishment of RocketSim was to provide a parallel simulator that could:
- Handle multiple cores
- Accept multiple clocking domains
- Work with complex interconnect fabrics
- Simulate hundreds of IP cores
- Scale to billions of components
- Support RTL, gate-level functional simulation and gate-level DFT
Related blog – EDA Mergers and Acquisitions Wiki
The secret sauce behind RocketSim is its ability to identify dependencies in the design and extract independent threads of execution, while minimizing the required memory footprint. You can expect the following typical speed-ups when using this parallel simulation approach:
- 3X for Verilog / SystemVerilog RTL
- 5X for gate-level functional simulation
- 10X for gate-level DFT
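To give a feel for the dependency analysis behind this kind of parallel simulation, here is a toy sketch (entirely hypothetical, not RocketSim’s actual algorithm or data model) that levelizes a small gate netlist: gates in the same level share no dependencies, so they can be evaluated concurrently.

```python
from collections import defaultdict

# Toy netlist: each gate maps to the gates whose outputs it consumes.
# Hypothetical example for illustration only.
netlist = {
    "g1": [],            # driven by primary inputs
    "g2": [],
    "g3": ["g1", "g2"],  # depends on g1 and g2
    "g4": ["g2"],
    "g5": ["g3", "g4"],
}

def levelize(netlist):
    """Group gates into dependency levels; gates within one level
    have no dependencies on each other and can run in parallel."""
    level = {}
    def depth(g):
        if g not in level:
            deps = netlist[g]
            level[g] = 0 if not deps else 1 + max(depth(d) for d in deps)
        return level[g]
    for g in netlist:
        depth(g)
    levels = defaultdict(list)
    for g, lvl in level.items():
        levels[lvl].append(g)
    return [sorted(levels[lvl]) for lvl in sorted(levels)]

print(levelize(netlist))
# [['g1', 'g2'], ['g3', 'g4'], ['g5']]
```

A production simulator would do far more (minimizing cross-thread communication and memory traffic, as the speed-up numbers above suggest), but the core idea of scheduling only truly independent work in parallel is the same.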
With fine-grained multi-processing technology, you can run RocketSim on multi-core servers using up to 64 cores, and it knows how to separate your code into portions that can be accelerated and portions that cannot. For the actual users of this simulator, you don’t need to change your testbench, design, or even your assertions, and that’s convenient.
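As a hypothetical illustration of how work within one dependency level could be fanned out across cores (a sketch only; `evaluate` is a stand-in, not a real simulator API):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(gate):
    """Stand-in for computing one gate's output from its inputs."""
    return f"{gate}:done"

def evaluate_level(gates, workers=4):
    """Evaluate all gates in one level concurrently. This is safe
    because gates in the same level share no dependencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, gates))

print(evaluate_level(["g3", "g4"]))
# ['g3:done', 'g4:done']
```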
The largest SoC teams have long used hardware-based engines like Palladium to get even faster run times, although that approach can become pricey compared to software simulators. One difference between a software simulator like RocketSim and a hardware engine like Palladium is that RocketSim handles four-state logic, which includes the Z and X states, while the hardware engine supports only two-state logic.
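To make the four-state point concrete, here is a minimal sketch of a four-state AND following standard Verilog semantics: a known 0 on either input dominates, and any remaining X (unknown) or Z (high-impedance) input yields X. A two-state engine must map X and Z to 0 or 1 up front, losing the ability to flag uninitialized or floating values.

```python
def and4(a, b):
    """Four-state AND per Verilog semantics. Inputs are the
    characters '0', '1', 'X', or 'Z'; a Z input is read as unknown."""
    if a == "0" or b == "0":
        return "0"          # known 0 masks everything else
    if a == "1" and b == "1":
        return "1"
    return "X"              # any X or Z input propagates as unknown

print(and4("1", "Z"))  # X: a floating input propagates as unknown
print(and4("0", "X"))  # 0: the known 0 masks the unknown input
```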
Related blog – Improving Methodology the NVIDIA Way
I was impressed to learn that the RocketSim team, based in Israel, has actually grown in size since being acquired by Cadence, which is always a positive sign that the team is being treated well and that the marketplace for parallel simulation is growing.
Functional simulation has come a long way since the 1980s, so we are living in exciting times as parallel simulation is being adopted to keep simulation run times reasonable, instead of waiting weeks or months for regression results. Adam Sherer has written a White Paper on this topic that you may read online here.