SoC designers have always wanted to simulate hardware and software together during new product development, so one practical question has been how to trade off performance against accuracy when creating an early model of the hardware. The creative minds at Carbon Design Systems and ARM have combined to offer us some hope and relief in building virtual platforms that are both fast and accurate enough. Some 4,000 attendees were at ARM TechCon last fall when Bill Neifert of Carbon Design Systems and Rob Kaye of ARM presented “High Performance or Cycle Accuracy? You can have both.”
I’ve just read the 10-page White Paper created in January based on that ARM TechCon presentation.
The chart shows Model Speed on the X-axis. Software developers favor higher simulation speed, and speeds of hundreds of MIPS are now possible if you write a Loosely Timed (LT) model, shown in Green. Attaining that speed requires that the model be written at a high level of abstraction, meaning that low-level details are omitted. Programmers benefit directly from Loosely Timed models because they can develop and debug their new apps, profile their software and determine whether the architecture complies with the specs.
In the bottom-left corner is the Grey box showing that Cycle Accurate (CA) models can be created that are faithful to the RTL timing as defined by the hardware engineer. Because the timing is accurate you can develop device drivers with a Cycle Accurate model, and also perform hardware/software co-verification, finding and fixing bugs before fabricating the SoC.
Instead of creating a third level of modeling as shown in Brown called Approximately Timed (AT), the approach taken by ARM and Carbon is to combine the benefits of both Loosely Timed and Cycle Accurate models.
Loosely Timed Models
An Architectural Envelope Model (AEM) is created first as an executable spec of the architecture. The AEM can be further refined to a specific CPU core by using implementation-specific details or adding optional features.
High simulation speed is gained by doing Code Translation (CT), a technique where code sequences for the specific CPU get translated to code sequences that run natively on your computer. This CT approach runs much faster than previous approaches like interpreted Instruction Set Simulators. A typical 2 GHz workstation could simulate the Android OS on a model of the ARM Cortex-A15 processor at about 50 to 100 MIPS.
Cycle Accurate Models
The CA model has to produce results that are both functionally correct and correctly timed on every cycle. This CA model can even be used as a replacement for RTL (Register Transfer Level) code, possibly requiring pin adaptors.
ARM IP also has CA models that were created automatically from RTL code using Carbon software, so you don’t have to do any modeling work, just use the CA models. On the other end of the speed spectrum ARM also provides Fast Models which can be interchanged with the CA models.
When you need to do some debugging, use the CA models. For pure speed, use the LT models instead.
Virtual Platform with Cycle Accurate and Loosely Timed Models
When you want to develop firmware, it’s recommended to keep the processor and memory subsystem modeled as LT for speed, and then use CA models as needed. The limit to simulating with a combination of LT and CA models will be the CA model speed because of its higher detail.
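A rough back-of-the-envelope calculation shows why the CA model sets the pace. The sketch below assumes a single-threaded platform in which every component is evaluated once per simulated step, so the time per step is the sum of the per-component times. That lock-step assumption, the function name and the CA rate of 100,000 steps per second are all hypothetical; only the 100 MIPS figure for the LT model comes from the range quoted above.

```python
from typing import Iterable


def lockstep_rate(component_rates: Iterable[float]) -> float:
    """Steps per second when each component is evaluated once per step.

    Time per step is the sum of the individual times (1 / rate), so the
    combined rate is the reciprocal of that sum.
    """
    return 1.0 / sum(1.0 / rate for rate in component_rates)


if __name__ == "__main__":
    lt_rate = 100e6  # LT processor model: 100 M steps/s (upper end of the quoted range)
    ca_rate = 1e5    # CA peripheral model: hypothetical figure, not from the white paper
    combined = lockstep_rate([lt_rate, ca_rate])
    print(f"combined rate: {combined:,.0f} steps/s")
```

Under these assumptions the combined rate lands just below the CA rate: the fast LT model contributes almost nothing to the time per step, so making it faster would not help.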
A new technology called swapping even lets you start out with all LT models, gaining high speed, and then swap in some more detailed CA models at the point of interest to continue.
Model swapping requires creating a checkpoint of your system, so each of your models must support checkpoints.
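The checkpoint-and-swap flow can be sketched as follows. This is a toy illustration under stated assumptions, not the Carbon or ARM API: the class names, the `save`/`restore` methods, the one-instruction "program" and the fixed three cycles per instruction are all hypothetical. The sketch runs a fast LT model up to the point of interest, captures its architectural state in a checkpoint, and restores that state into a CA model that continues from the same program counter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Checkpoint:
    """Architectural state only; no caches or pipeline contents."""
    pc: int
    regs: List[int]
    memory: Dict[int, int]


class LooselyTimedCpu:
    """Fast functional model: executes instructions, tracks no cycles."""

    def __init__(self) -> None:
        self.pc = 0
        self.regs = [0, 0, 0, 0]
        self.memory: Dict[int, int] = {}

    def step(self) -> None:
        # Toy behaviour: increment r0 and store it at the current pc.
        self.regs[0] += 1
        self.memory[self.pc] = self.regs[0]
        self.pc += 4

    def save(self) -> Checkpoint:
        # Copy the state so later LT execution cannot alter the checkpoint.
        return Checkpoint(self.pc, list(self.regs), dict(self.memory))


class CycleAccurateCpu(LooselyTimedCpu):
    """Same functional behaviour, plus a cycle counter."""

    CYCLES_PER_INSTR = 3  # hypothetical fixed cost per instruction

    def __init__(self) -> None:
        super().__init__()
        self.cycle = 0

    def restore(self, cp: Checkpoint) -> None:
        self.pc = cp.pc
        self.regs = list(cp.regs)
        self.memory = dict(cp.memory)
        self.cycle = 0  # micro-architectural state starts cold

    def step(self) -> None:
        super().step()
        self.cycle += self.CYCLES_PER_INSTR


def swap_to_cycle_accurate(lt: LooselyTimedCpu) -> CycleAccurateCpu:
    """Checkpoint the LT model and continue in a fresh CA model."""
    ca = CycleAccurateCpu()
    ca.restore(lt.save())
    return ca


if __name__ == "__main__":
    lt = LooselyTimedCpu()
    for _ in range(5):  # run quickly to the point of interest
        lt.step()
    ca = swap_to_cycle_accurate(lt)
    for _ in range(2):  # continue with cycle counting
        ca.step()
    print(ca.pc, ca.regs[0], ca.cycle)
```

In this sketch the checkpoint carries only architectural state (program counter, registers, memory), which is why every model has to agree on the checkpoint format, and why state such as cache contents starts cold after the swap.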
Here’s a quick comparison of what Carbon recommends versus other virtual prototype approaches:
| Approach | Fast Simulation | Best Accuracy | Fast + Accurate |
| Carbon | LT models | CA models | LT + CA models |
| Other approaches | LT models | Emulator or FPGA Prototype | – None – |
So other approaches give you either fast simulation or accuracy, but never both at the same time. The downside of an emulator is the high cost, and the downside of an FPGA prototype is limited visibility for debug.
Swapping does have some downsides, for example cache contents are not saved, so stay tuned for more improvements ahead.
The approach of using both Loosely Timed and Cycle Accurate models together in a virtual prototype for ARM IP is compelling because of the 50-200 MIPS simulation speeds. This approach accelerates software debug, firmware development, architectural exploration, performance analysis and system debug.
10 page White Paper