One great benefit of designing at the ESL level is the promise of power savings on the order of 40% to 70% compared to using an RTL approach. Since a typical SoC can contain a hierarchy of memory, this kind of power savings could be a critical factor in meeting PPA goals. To find out how an SoC designer could use such an ESL approach to power savings for a system memory subsystem, I interviewed Gene Matter of DOCEA Power by email this week. DOCEA's company headquarters is in France, and Gene works out of their San Jose office. One thing that Gene and I have in common is our Intel alumni history; I started my design career with DRAM chips, while Gene spent 23 years at Intel working on x86 processors, chipsets, USB, PCI and memory technology.
Q: Why model memory and at what level should that model be?
Memory subsystems have a huge impact on performance. In almost every processor and system performance simulator, excruciating detail goes into modeling the core instruction set clock counts, internal bus/interconnect bandwidth, buffering/posting, cache organization, main memory and storage, all to get accurate predictions of application and benchmark performance.
What struck me is the die area dedicated to high-speed cache SRAM and the type and diversity of embedded memory in modern SoCs, chipsets and processors, all of which are process technology, cell library and implementation dependent. The corresponding power models for most memories are either at the transistor level, which is way too detailed for functional simulation, or very abstract: simple bandwidth numbers, percentages of traffic types, and approximations of the power. Major components affecting power, such as temporal/address-dependent behavior (cache hit/miss, snoops, DRAM page hit/miss, burst reads, posted writes, etc.), are available from the performance simulator, but you also need to annotate and parameterize the power models of the core, interconnect, memory controllers, IO and memory to estimate power as a function of task consumption/completion. I've seen many estimates try to predict power by just using statistical data from a characterized workload, plugging in power numbers for the memory blocks, and summing up the results.
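As a concrete illustration of the annotated, event-driven style of power model Gene describes, here is a minimal Python sketch. The event names, per-event energies and background power are illustrative placeholders, not values from any data sheet:

```python
# Hypothetical activity-based DRAM power sketch: every energy-per-event
# and power value below is an illustrative placeholder, not vendor data.

# Energy per event in nanojoules (in practice derived from the DRAM data
# sheet per process/voltage/temperature).
ENERGY_NJ = {
    "page_hit_read": 1.2,
    "page_miss_read": 3.5,   # includes precharge + activate
    "page_hit_write": 1.4,
    "page_miss_write": 3.8,
    "refresh": 0.9,
}
BACKGROUND_MW = 18.0  # static/retention power (refresh logic, standby current)

def memory_power_mw(event_counts: dict[str, int], window_s: float) -> float:
    """Average power over a simulation window from annotated event counts.

    The counts would typically come from a performance simulator trace
    (e.g. VCD/CSV), which already distinguishes page hits from misses.
    """
    dynamic_nj = sum(ENERGY_NJ[evt] * n for evt, n in event_counts.items())
    dynamic_mw = dynamic_nj * 1e-9 / window_s * 1e3  # nJ over window -> mW
    return BACKGROUND_MW + dynamic_mw

# Example: made-up counts for a 1 ms window of a playback trace.
counts = {"page_hit_read": 40_000, "page_miss_read": 6_000,
          "page_hit_write": 15_000, "page_miss_write": 2_500,
          "refresh": 800}
print(f"{memory_power_mw(counts, 1e-3):.1f} mW")
```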
Q: What are some considerations of power models for memory subsystems and SoC/systems?
- You need a dynamic, realistic set of workloads or applications. You can use VCD or CSV data from a performance/functional simulator, or characterized workload data from performance analysis or software emulation. You can also co-simulate a performance/functional simulator with a power/thermal modeling and simulation tool
- You need to build the complete memory subsystem and account for the application/SW flow through the machine architecture. You can start with a simple block diagram and then add the data flow for the major transaction types that occur
- Then parameterize all the interconnects, memory controllers, IO and memory blocks with corresponding power equations, values or scripts that provide the dynamic current and the static retention current (e.g., DRAM refresh or SRAM retention), using the component data sheets plus IP and process/voltage/temperature data for the process technology used in each design
- Add the memory power states and the corresponding power in each state, plus the power-state transition events or triggers (see the sketch after this list)
- Now you can get power as a function of application behavior and solve for power as a function of time and temperature
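Here is the power-state sketch referenced above: a minimal Python model of state residencies and transition triggers, with all power and energy values assumed purely for illustration:

```python
# Minimal power-state sketch for a DRAM device. State powers and
# transition energies are illustrative placeholders, not data-sheet values.
STATE_POWER_MW = {"active": 120.0, "standby": 35.0, "self_refresh": 4.0}
TRANSITION_ENERGY_UJ = {               # energy cost of each transition event
    ("active", "standby"): 0.05,
    ("standby", "self_refresh"): 0.4,
    ("self_refresh", "standby"): 1.1,  # exit is usually the expensive edge
    ("standby", "active"): 0.05,
}

def energy_mj(timeline: list[tuple[str, float]]) -> float:
    """Energy for a sequence of (state, duration_s) residencies,
    including the transition costs between consecutive states."""
    energy_uj = 0.0
    for i, (state, dur) in enumerate(timeline):
        energy_uj += STATE_POWER_MW[state] * 1e3 * dur  # mW * s -> uJ
        if i + 1 < len(timeline):
            energy_uj += TRANSITION_ENERGY_UJ[(state, timeline[i + 1][0])]
    return energy_uj * 1e-3  # uJ -> mJ

# E.g. triggered by idle timers: active -> standby -> self-refresh and back.
timeline = [("active", 0.010), ("standby", 0.002),
            ("self_refresh", 0.500), ("standby", 0.001), ("active", 0.005)]
print(f"{energy_mj(timeline):.3f} mJ")
```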
Q: How do you create power models of memory subsystem?
One approach is to use Docea's Aceplorer tool, which provides templates of the most common IP blocks and memory. Users can also build a library of templates using our scripts to parse an IP component data sheet, silicon compiler data per process technology, memory type, and organization. We also have automated scripts to read in Excel spreadsheet data, or from an IP repository in a shared network database.
The three tools we recommend are:
- The modeling kit, a set of worksheets in Excel to build each component power model
- The assembly kit and Aceplorer Power Intelligence, a front end that includes the interconnects, clock and power intent
- Python scripts/automation to build the models, view the configuration and manage/maintain all of the models (a hypothetical example of this style of scripting is sketched below)
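To make the scripting idea concrete, here is a hedged sketch of what such automation might look like. The CSV column names (`block`, `state`, `dynamic_mw`, `leakage_mw`) and the function are hypothetical, not DOCEA's actual scripts or file formats:

```python
# Hypothetical sketch: parse a CSV export of IP power data into per-block
# model dicts. The file layout and columns are assumptions for illustration.
import csv
from collections import defaultdict

def load_power_library(path: str) -> dict:
    """Build {block: {power_state: (dynamic_mw, leakage_mw)}} from a CSV
    with columns: block, state, dynamic_mw, leakage_mw."""
    library: dict = defaultdict(dict)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            library[row["block"]][row["state"]] = (
                float(row["dynamic_mw"]), float(row["leakage_mw"]))
    return dict(library)

# lib = load_power_library("ip_power.csv")
# lib["l2_cache"]["active"] -> (dynamic_mw, leakage_mw)
```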
Q: How do you simulate power and performance for memory?
Once you have built power models for your system, you can build scenarios based on time flow charts, message sequences, or data flow diagrams that are sequential events. A simple scenario can start with a reset and initialization phase, then boot, OS/application loading into memory, then execution of tasks. The accuracy of each phase/event can be mixed, where static or average values might be used for some phases and transaction-based or even cycle-based values for others. The scenarios can also have steps, time stamps, and concurrent tasks, as well as delay and synchronization points.
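A minimal sketch of such a scenario, assuming made-up phase durations and average powers (in practice some phases would be driven by transaction- or cycle-level data rather than averages):

```python
# A scenario as a sequential list of phases with mixed accuracy: some
# phases use static/average power, others could be fed from a
# transaction-level trace. All phase powers below are illustrative.
PHASES = [
    ("reset",       0.001, 10.0),   # (name, duration_s, avg_power_mw)
    ("init",        0.050, 45.0),   # a static estimate is fine here
    ("boot",        1.200, 180.0),  # average from a characterized trace
    ("os_app_load", 0.800, 220.0),  # heavy memory traffic while loading
    ("execute",     5.000, 150.0),  # could be swapped for cycle-level data
]

total_s = sum(dur for _, dur, _ in PHASES)
total_mj = sum(dur * p for _, dur, p in PHASES)  # mW * s = mJ
print(f"energy {total_mj:.0f} mJ, avg power {total_mj / total_s:.0f} mW")
```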
Q: What type of analysis is important for memory subsystems?
Early power estimation is pretty much mandatory to meet product design targets for battery life and application-dependent performance. Many OEMs and OSVs will specify MP3 audio and H.264 or HD video playback in hours, baseband/wireless modem talk and standby time, and web-browsing and file upload/download power. So you need a library of representative workloads that can drive a dynamic analysis.
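This is the kind of back-of-the-envelope check such targets imply, assuming a hypothetical 10 Wh battery and made-up per-use-case platform powers:

```python
# Battery-life estimate per use case; all numbers are illustrative
# assumptions, not measured platform data.
BATTERY_WH = 10.0
USE_CASE_POWER_W = {"mp3_playback": 0.15, "hd_video": 0.9,
                    "web_browsing": 1.4, "modem_standby": 0.02}

for use_case, watts in USE_CASE_POWER_W.items():
    print(f"{use_case:14s} {BATTERY_WH / watts:7.1f} h")
```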
Early on, in the meta-partitioning of the system, you are evaluating tradeoffs like:
- Number of cores and whether they are homogeneous, heterogeneous or hybrid core types; then you need to see if you can thread or parallelize the code
- Frequency, voltage min, typical and max/turbo ranges
- Fixed function, DSP or offload engines
- Bandwidth and concurrency of interconnects
- And the critical part: whether you have over- or under-provisioned the memory speed, latency and working set size within your cost and power budget (a provisioning sketch follows this list)
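Here is the provisioning sketch mentioned in the last item: a toy comparison of use-case bandwidth demand against candidate DRAM configurations. The configurations, demands, power coefficients and the 70% utilization threshold are all assumptions for illustration:

```python
# Over/under-provisioning check: compare each use case's required memory
# bandwidth against a candidate DRAM configuration and estimate the power
# cost. All figures below are illustrative assumptions.
CONFIGS = {
    # name: (peak_gb_s, active_mw_per_gb_s, idle_mw)
    "lpddr4_1ch": (12.8, 28.0, 6.0),
    "lpddr4_2ch": (25.6, 28.0, 12.0),
}
DEMAND_GB_S = {"hd_video": 4.2, "gaming": 18.5, "web_browsing": 1.1}

for name, (peak, mw_per_gbs, idle) in CONFIGS.items():
    for use_case, need in DEMAND_GB_S.items():
        util = need / peak
        # 70% utilization headroom rule of thumb (an assumption)
        status = "UNDER-provisioned" if util > 0.7 else "ok"
        power = idle + need * mw_per_gbs
        print(f"{name} {use_case:12s} util {util:6.1%} "
              f"power {power:6.0f} mW  {status}")
```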
Now you need to parameterize all the components with power data. You can build power models of all the current IP you plan to re-use from existing data, or annotate/update them from a shared repository. For new IP you can build quick "what if" models from vendor or internal specs. Then you can apply process technology vendor data for the transistor types used in each IP block, the process corner, and temperature.
Finally, you want to analyze power as a function of temperature, workload and system configuration, as in the sketch below.
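A minimal sketch of that parameterization, using the classic first-order models P_dyn ≈ C·V²·f and leakage that grows roughly exponentially with temperature; the coefficients and the leakage-doubles-every-10°C rule of thumb are assumptions, not foundry data:

```python
# First-order corner/temperature parameterization sketch. All coefficients
# are illustrative placeholders, not foundry or data-sheet values.

def dynamic_mw(c_eff_nf: float, v: float, f_mhz: float) -> float:
    # C * V^2 * f: nF * V^2 * MHz = mW  (1e-9 F * 1e6 Hz = 1e-3)
    return c_eff_nf * v * v * f_mhz

def leakage_mw(leak_25c_mw: float, temp_c: float,
               doubling_c: float = 10.0) -> float:
    # Rule of thumb (an assumption): leakage roughly doubles every ~10 C.
    return leak_25c_mw * 2 ** ((temp_c - 25.0) / doubling_c)

for temp in (25, 60, 95):
    total = dynamic_mw(0.8, 1.0, 800) + leakage_mw(12.0, temp)
    print(f"{temp:3d} C: {total:7.1f} mW")
```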
Q: You’ve thought out this memory subsystem modeling quite thoroughly, how would you summarize the DOCEA approach?
Memory subsystems are increasingly complex. In many systems they are "make or break" in terms of meeting performance and cost goals. Now that many systems are also power and thermally constrained, it is imperative to have power/thermal models that you can build early and that track the design as it progresses within the form factor and cost budgets. There is also variability in memory vendor power, and many tradeoffs as to where you spend your money: on caches/die cost, on stacked/PoP or embedded DRAM, or on low-power/high-density and fast SSD/mass storage. The ability to automate the creation of power models and parameterize them from behavioral (VHDL/SystemC/SystemVerilog) to functional (RTL/gate level) to structural (fully placed-and-routed circuit and CFD) model data is key to productivity and project management.
The memory and storage hierarchy and organization are critical not just for performance and functionality, but also for achieving the most robust and energy-efficient thermal designs. The ability to model and simulate thermal, power and performance as early as possible can be a huge competitive advantage for chip designs, modules/subsystems and complete platforms.