Power in IoT edge devices gets a lot of press: how to make a device last for years on a single battery charge, largely through a “dark silicon” style of operation – waking only briefly to take a measurement and shoot off a wireless transmission before powering down again. But we tend to forget that the infrastructure supporting those devices – gateways, backhaul communication and the cloud – cannot play by the same rules. That infrastructure must handle high volumes of unpredictable traffic, where power-down strategies are impractical.
Moreover, the total power burned, and the cost of that power, is far, far higher in the infrastructure than in the edge devices. Datacenters alone are believed to consume about 3% of total energy produced worldwide. Those costs motivate infrastructure owners to lean hard on equipment suppliers to reduce power by whatever means they can. Meeting that objective generally requires much more nuanced power management, depending heavily on understanding how a wide range of realistic workloads will drive the system.
AMD's products are used in datacenters, wireless and wireline networking, network security and unified communications, so they feel that pressure across their product line. Illustrating this, they have published a white paper on how they approached power reduction in one of their server-class designs, written by a couple of technical team members based in Austin.
The objective was a retooling for lower power; since they were starting from an existing design, changes to the fundamental architecture weren't an option. Implementation-stage tweaks wouldn't return big enough savings, so that left micro-architectural fine-tuning as the only way to drive down power. While such changes are usually quite modest, the impact can be significant – AMD was able to reduce idle power by 70% and peak power by 22%. But as always in power reduction, there's no simple recipe for finding the best places to make changes. You have to try a lot of possibilities against a lot of different use-cases, then decide which are most promising for power savings while balancing impact on other factors such as area and timing.
That kind of iteration isn't possible if you measure impact through gate-level power estimation, since each RTL what-if would require a re-implementation cycle through multiple tools. AMD estimated that it would take 6-8 weeks to generate gate-level power estimates, at which point the analysis would be irrelevant to a design that had evolved far beyond the point at which the measurements were made.
A much better approach is to iterate on changes with power estimation at RTL. In absolute terms this won't be as accurate as estimation at the gate level – RTL-based estimation must estimate Vt mixes, design-ware mapping, clock trees and interconnect parasitics, all factors which are known at the gate level and on which gate-level accuracy depends. But relative accuracy between RTL-based estimates for modest changes on the same design can be much better. Moreover, for incremental changes the tool can intelligently re-compute the impact on activity factors without re-simulation, making power estimation even faster. AMD observed that these factors together trimmed analysis time to a day.
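To see why incremental re-estimation is so much cheaper, here is a minimal sketch in Python – my own abstraction with made-up module names, capacitances and activity factors, not how PowerArtist actually works internally. The idea is simply that per-module activity is cached, so when an RTL edit touches one module, only that entry is refreshed rather than re-simulating the whole design:

```python
# Toy sketch of incremental RTL power estimation (illustrative numbers only).
# Per-module switching activity (alpha) and effective capacitance in pF are
# assumed values, stood in for what a real tool would extract from simulation.
cached_activity = {"fetch": 0.31, "decode": 0.22, "lsu": 0.18, "fpu": 0.05}
cap_pf = {"fetch": 120.0, "decode": 90.0, "lsu": 150.0, "fpu": 200.0}

def estimate_power(activity, cap, vdd=0.9, f_hz=2e9):
    """Dynamic power ~ sum over modules of alpha * C * Vdd^2 * f (watts)."""
    return sum(a * cap[m] * 1e-12 * vdd ** 2 * f_hz for m, a in activity.items())

baseline = estimate_power(cached_activity, cap_pf)

# An RTL tweak gates the FPU harder; only its activity factor changes, so we
# update one cached entry instead of re-running a full simulation.
cached_activity["fpu"] = 0.02
incremental = estimate_power(cached_activity, cap_pf)

print(f"baseline:    {baseline:.4f} W")
print(f"incremental: {incremental:.4f} W")
```

In a real flow the expensive part is regenerating activity from simulation; the cheap summation above is the part that gets re-run, which is why what-if loops can close in a day rather than weeks.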
The design team chose Ansys' PowerArtist to drive their RTL power estimation. This is a product with a distinguished history: commercial RTL-based power estimation was first introduced back in the 1990s by Sente, in a product called WattWatcher. The product and the team have been through a couple of acquisitions and some name changes, but PowerArtist is that same product, greatly evolved of course. Point being, they've been doing this for a long time and have widespread industry recognition as experts in the space.
AMD provides detail on how they used PowerArtist to isolate power hogs and experiment with improved clock gating. They also found that 50% of estimated power was being burned in the clock distribution network (which PowerArtist models for exactly this reason). This is a good example of why designer judgment (guided by input from the tool) is so crucial in power reduction. You could gate clocks at the leaf level wherever feasible and still not significantly reduce power, whereas carefully planned gating higher in the clock tree, covering a smaller number of leaf cases, could make a much bigger difference.
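The arithmetic behind that judgment call can be sketched in a few lines. This toy model (my own, with invented numbers – nothing here is taken from the AMD paper) splits clock power between distribution buffers and leaf flop clock pins: leaf-level gating stops only the pins, while gating a whole subtree near the root also stops its buffers, which is where most of the power goes when the distribution network dominates:

```python
# Toy clock-tree power model (all numbers invented for illustration).
buffer_power = 15.0        # mW burned in the buffers of one clock subtree
leaf_pin_power = 0.2       # mW per leaf flop group's clock pins
n_subtrees = 8
leaves_per_subtree = 20
idle_fraction = 0.6        # fraction of time the gated logic is idle

total_clock_power = n_subtrees * (buffer_power + leaves_per_subtree * leaf_pin_power)

# Leaf-level gating everywhere: only flop clock pins stop; buffers keep toggling.
leaf_saving = n_subtrees * leaves_per_subtree * leaf_pin_power * idle_fraction

# Subtree-level gating on just 3 subtrees that are idle together: buffers stop too.
coarse_idle_subtrees = 3
coarse_saving = (coarse_idle_subtrees
                 * (buffer_power + leaves_per_subtree * leaf_pin_power)
                 * idle_fraction)

print(f"total clock power:     {total_clock_power:.1f} mW")
print(f"gating every leaf:     {leaf_saving:.1f} mW saved")
print(f"gating 3 subtrees:     {coarse_saving:.1f} mW saved")
```

With these numbers, gating three well-chosen subtrees saves more than gating every leaf in the design, because the subtree gates also shut off the distribution buffers – the same intuition the AMD team applied.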
Power reduction is a process, as almost anyone in the game will be eager to tell you. You don't do one task and then put the power tools away. You're constantly checking and improving, which again is why fast iteration in power estimation is so important. I found one chart in the AMD write-up particularly interesting – a trend chart for estimated power as the design progressed. Power-aware design teams embed this process in their regression suites so they can update trend charts like this as the design evolves. You fixed that timing problem, or you changed the queue manager from fixed length to variable length, but power spiked up again – what happened? Getting frequent updates is the best way to check and correct before problems become unfixable. PowerArtist has the tools you need to support this kind of analysis and trending.
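A regression-suite trend check of that kind can be very simple. This is a hypothetical sketch – the run data and threshold are invented, and a real flow would pull per-run estimates from the power tool's reports – but it shows the shape of the check: compare each nightly estimate against the previous one and flag any jump worth investigating:

```python
# Hypothetical nightly power estimates (mW), one per regression run.
estimates_mw = [412.0, 405.5, 401.2, 398.7, 455.3]

SPIKE_THRESHOLD = 0.05  # flag any run more than 5% above the previous one

def find_spikes(trend, threshold=SPIKE_THRESHOLD):
    """Return indices of runs whose power jumped above the threshold."""
    return [i for i in range(1, len(trend))
            if trend[i] > trend[i - 1] * (1 + threshold)]

spikes = find_spikes(estimates_mw)
for i in spikes:
    print(f"run {i}: power jumped {estimates_mw[i-1]:.1f} -> {estimates_mw[i]:.1f} mW "
          f"- investigate the checkins between these runs")
```

Wiring a check like this into nightly regressions is what turns a trend chart from a retrospective curiosity into an early-warning system: the spike is caught while the offending change is still fresh.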
You can read the AMD team’s write-up HERE.