I don’t look at the RTL power estimation topic too often these days, so I was interested to see that ANSYS still has a very strong position in this area. Qualcomm is using PowerArtist on one of the most demanding modern applications: GPU power in mobile gaming. Mobile gaming heavily loads the GPU, so any optimization in that area will affect battery life. This is a world-class test because it’s not just ‘more of the same but bigger’. Gaming benchmarks are really going to stretch the range for that ever-present challenge in power estimation: bridging the gap between system-level use-cases and RTL-level power calculations.
There’s so much complexity in modern GPUs that averaged power estimates across relatively simple directed tests fall short. These are simply not going to be good enough to drive intelligent optimization choices in RTL design. Jiaze Li from Qualcomm presented a paper at a recent ANSYS Simulation World on their more realistic approach.
Gaming Benchmarks
First, Qualcomm starts with realistic gaming loads. Jiaze mentioned Manhattan and Aztec Ruins as two popular games used for GPU benchmarking today. They extract multi-millisecond sequences from these games as their basis for testing. These are still long enough that simulation must run on an emulator. ANSYS PowerArtist uses an activity streaming interface with Mentor Graphics’ Veloce emulator to enable the efficient transfer of long activity patterns. Qualcomm uses this flow to drive power analysis with PowerArtist, to track how power is changing as the design evolves, and to optimize RTL for power reduction.
Jiaze added that the emulation flow is too cumbersome for detailed power debug. Instead they use a parallel simulation-based power flow. The tests they use here are derived from the same large gaming benchmarks. However, they are greatly reduced in size, capturing the essentials of the graphics features while still running in reasonable time on the simulator. This reduction is very much a manual task, something into which Jiaze and the team put a lot of work, but they’ve figured out a process to efficiently build these reduced tests.
Windowed Analysis
The second important point is that they divide the analysis time, by graphics features, into multiple windows. The systems team defines the windows, which are not generally equal in size. PowerArtist then calculates power estimates per window. This gives them a chunked timeline view of averages, in which they can see variations in average power as a function of feature. That, he says, gives them a lot of insight into contributors to power in any given window. It also suggests how they might best optimize not only for average power but also for some sense of peak power.
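To make the windowing idea concrete, here is a minimal sketch in Python of averaging a power timeline over unequal, feature-based windows. It is not PowerArtist's interface or Qualcomm's flow; the `Window` class, the `windowed_power` function, the window names and the sample values are all hypothetical, and the sketch assumes uniformly spaced power samples so that a plain mean equals the time average.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Window:
    """A hypothetical analysis window tied to one graphics feature."""
    name: str
    start_us: float  # inclusive
    end_us: float    # exclusive


def windowed_power(samples, windows):
    """Return {window name: (average_mw, peak_mw)}.

    samples: list of (time_us, power_mw) pairs, assumed uniformly spaced.
    windows: list of Window objects; they need not be equal in length.
    """
    results = {}
    for w in windows:
        in_window = [p for t, p in samples if w.start_us <= t < w.end_us]
        if not in_window:
            raise ValueError(f"no samples fall inside window {w.name!r}")
        results[w.name] = (mean(in_window), max(in_window))
    return results


# Made-up timeline: one power sample per microsecond.
samples = [(0, 100.0), (1, 120.0), (2, 300.0),
           (3, 320.0), (4, 310.0), (5, 90.0)]

# Made-up windows of unequal size, one per feature.
windows = [
    Window("vertex_shading", 0, 2),
    Window("texturing", 2, 5),
    Window("idle", 5, 6),
]

report = windowed_power(samples, windows)
for name, (avg_mw, peak_mw) in report.items():
    print(f"{name}: avg={avg_mw:.1f} mW, peak={peak_mw:.1f} mW")
```

Reporting a per-window peak alongside the average is one simple way to get at "some sense of peak power": a window with a modest overall contribution can still stand out if its own average is high.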
Jiaze said that the flow is running in bi-weekly production regressions at Qualcomm. They have used the flow to drive a 5% reduction in power on their most recent design. Most of the improvements came from adding clock gating and eliminating redundant data toggling. He added a very nice bonus in their use of this method: they can very concretely justify the power reductions they find. That is much better than a more general ‘we suggested a bunch of improvements and see – it got better!’
If you want to hear the talk, click HERE to go to the ANSYS Simulation World recorded event. This talk is the sixth under “Semiconductors”. You can also learn more about PowerArtist HERE.