Most of what you have read about design for low power has probably focused on mobile devices where power consumption constraints tend to outweigh performance objectives. These devices use aggressive power switching strategies, based on the reasonable assumption that parts or all of the device can be powered down at any given time and recovery times from power-down need only match reasonable human response times.
But what about the other end of the mobile or IoT ecosystem – the cloud? Servers have to deliver high performance, they face unpredictable loads, and the economics of that business push operators to maximize utilization to the greatest extent possible. As far as power-down strategies are concerned, the minimum switchable unit is typically a whole server. Some datacenters power down servers during light loading (packing virtual machines onto a subset of available servers), but this doesn’t help reduce peak power, which is where a viable datacenter wants to be operating most of the time.
Historically, processor design teams have looked at power very late in design, when they can use gate-level netlists with accurate parasitics to get within ~5% of silicon measurements. That’s in part because power has not been a primary metric for processors (reasonable was OK), and in part because processor teams do a lot of hand-tuning for performance, and power-saving techniques don’t usually help performance. The need for hand-tuning, by the way, applies even if you are using ARM cores, as long as you’re shooting for GHz performance; check out what has gone into the high-performance ARM cores provided by the foundries.
But that’s changed. In servers, low power is not about battery life. It’s about:
- Heating, which leads to increased leakage, which can lead to thermal runaway and at minimum compounds all the other problems
- Cooling costs, both for the datacenter as a whole and for the device, because heat sinks increase server costs and active cooling increases maintenance costs
- Performance problems, because a device running too hot has to slow down and that makes customers unhappy
- Reliability problems, because increased heating increases delays and voltage drops in power lines, which may tip some timing paths from passing to critical/failing
- More reliability problems, because increased heating increases resistance in power rails, which can lead to electromigration in marginally-sized rails
For all these reasons, power has become increasingly important for servers, but getting an estimate late in design obviously isn’t very helpful, especially if it can take 6-8 weeks to figure it out.
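The leakage feedback loop in the first bullet above can be made concrete with a toy model. The sketch below is a minimal illustration, not a calibrated model: it assumes a single lumped thermal resistance and leakage that doubles for every fixed rise in temperature. Every name and number in it (`steady_state_temp`, `r_th`, `doubling_c` and the rest) is a made-up assumption, not data from any real server part.

```python
def steady_state_temp(p_dyn_w, t_amb_c=25.0, r_th=0.3, leak_ref_w=10.0,
                      t_ref_c=25.0, doubling_c=20.0, t_limit_c=150.0,
                      tol_c=0.01, max_iter=1000):
    """Iterate T = T_amb + R_th * (P_dyn + P_leak(T)) toward a fixed point.

    All default values are illustrative assumptions.
    Returns (stable, temperature_c). stable is False when the temperature
    passes t_limit_c or never settles, which this toy model treats as runaway.
    """
    temp = t_amb_c
    for _ in range(max_iter):
        # Assumed leakage law: doubles every `doubling_c` degrees above t_ref_c.
        leak = leak_ref_w * 2.0 ** ((temp - t_ref_c) / doubling_c)
        new_temp = t_amb_c + r_th * (p_dyn_w + leak)
        if new_temp > t_limit_c:
            return False, new_temp
        if abs(new_temp - temp) < tol_c:
            return True, new_temp
        temp = new_temp
    return False, temp


# Compare two hypothetical dynamic power levels.
for p_dyn in (100.0, 200.0):
    stable, temp = steady_state_temp(p_dyn)
    print(f"{p_dyn:.0f} W dynamic: stable={stable}, last temperature={temp:.1f} C")
```

With these assumed numbers, the lower dynamic power settles at a steady temperature, while the higher one never finds a steady state because each step of extra heat adds more leakage than the cooling can remove. That second case is the runaway scenario.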
So what can be done if power-down strategies are off the table? Low-level clock-gating, manual or automated, is useful, but many of those methods deliver only second-order improvements, and we want first-order help. That means we need to look at macro-level power-saving options with an understanding of architecture and intended usage. And that requires detailed use-case power analysis, over lots of use-cases. AMD has just published an article on how they went about reducing power in one of their server-class designs, using ANSYS PowerArtist for power estimation at RTL, where it was still practical to refine the design microarchitecture.
Estimating power while the design is still at RTL, and being able to generate estimates in minutes, is essential to making this practical. Also important to AMD were very fast turn-around and multiple types of visualization for what-if analysis. Again, this is because the big savings won’t come from automated gating; they come from designers figuring out where and how to make improvements in line with typical usage. So isolating and understanding power hotspots, followed by fast iteration (minutes) to experiment with power-saving scenarios, is critical. (I expect ANSYS’ Big Data analytics in Seascape to further enhance this value proposition in the future.)
It’s always interesting to hear a customer’s estimates of the impact of a tool. AMD said that using PowerArtist, they estimated that in idle mode the design was using only 16% less power than in full-bandwidth mode, and that more than 50% of this power was consumed by the clock distribution network alone. Hello, opportunity for a first-order power reduction. Based on RTL analysis, they were able to isolate areas where many cones of logic could be clock-gated. They also found, thanks to what they felt was quite accurate early RTL modeling of clock power distribution in PowerArtist, that they could in many cases move gating closer to the root of the clock tree, saving significant power in the tree alone. In addition, they found that flexibly adjusting the size of the queue based on queue utilization could reduce power. Between these factors, AMD was able to reduce idle power by 70% and also saw a significant improvement in active power.
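To see why moving a gate toward the root pays off, here is a rough sketch that counts how many clock buffers keep toggling when an idle block is gated at different depths. It assumes an idealized balanced tree in which every buffer costs the same and the whole subtree below the gate is idle at the same time. The function name, fanout and depth are illustrative assumptions, not AMD’s clock tree or anything PowerArtist reports.

```python
def toggling_buffers(levels, fanout, gate_level):
    """Count clock buffers still toggling when gates sit at gate_level.

    Hypothetical balanced tree: level 0 is the root buffer and level i
    holds fanout**i buffers. A gate at gate_level stops every buffer at
    that level and below, so gate_level == levels means gating only at
    the sinks and gate_level == 0 means gating the root itself.
    """
    if not 0 <= gate_level <= levels:
        raise ValueError("gate_level must be between 0 and levels")
    return sum(fanout ** i for i in range(gate_level))


# Walk the gate from the sinks up to the root of an assumed 4-level, fanout-4 tree.
total = toggling_buffers(4, 4, 4)
for level in range(4, -1, -1):
    active = toggling_buffers(4, 4, level)
    print(f"gate at level {level}: {active} of {total} buffers toggling")
```

In this toy tree, gating only at the sinks leaves all 85 buffers toggling, while a gate two levels higher leaves just 5. Real trees are neither balanced nor uniformly idle, so how far a gate can actually move depends on which cones of logic are idle together, which is what the use-case analysis has to establish.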
Though not mentioned in the AMD article, you should also know that PowerArtist generates an RTL Power Model (RPM) which can be read directly into RedHawk or SeaHawk for power-aware power integrity, thermal and EM analysis. So ANSYS has you covered for everything you need in server power analysis – for power consumption and optimization, for integrity, for heating and for reliability. Pretty cool solution (pun intended).