Designers spend plenty of time analyzing the effects of process, voltage and temperature. But everyone knows it’s not enough to simply stop there. Operating environments are tough and have lots of limitations, especially when it comes to power consumption and thermal issues. Thermal protection and even over-voltage protection have been in chips for many years. However, there is more at stake than just preventing failures. It’s necessary to tune the operation of SoCs so they have a long life and a lower cost of operation while staying within the limits of the cooling systems used in the facilities where they are located. In-chip monitoring can help manage power consumption and thermal issues.
This was a topic at the recent TSMC OIP event. Stephen Crosher, CEO of Moortec, a provider of IP for in-chip monitoring, presented on the topic of “Challenges of N5 HPC and Hyperscaling within Data Centers.” Small savings at the chip level in power consumption and heat generation translate into meaningful results when scaled up. Stephen points out that hyperscale data centers can have on the order of millions of SoCs.
Data centers already consume 1-2% of all electricity produced globally. Chinese data centers alone over the next 5 years are expected to use as much electricity as all of Australia. Data center traffic and workloads are expected to rise by 80% over the next 3 years.
The only way to effectively manage power is to design in feedback systems that manage SoC operation to minimize power consumption. The first step in doing this is to ensure there is accurate and complete information about all three of process, voltage and temperature. With the right kind of in-chip monitoring capabilities, many things can be done inside an SoC to respond to each of these conditions.
Stephen provides examples of how tightening the voltage monitoring precision at the terminus of the supply nets from 5% to 1% can reduce supply guard banding and reduce power by ~10%. Multiplied across millions of chips, fractions of a penny per hour per chip translate into savings of millions of dollars per year. Likewise, for thermal management, more accurate sensors can prevent premature device throttling. Moortec’s ‘out-of-the-box’ high accuracy sensors can help avoid unnecessary throttling when compared to alternative +/- 5% sensors, especially considering that Moortec sensors can achieve even higher accuracy if calibration can be accommodated in production test.
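The scaling argument above can be checked with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not a calculation from the presentation. It assumes dynamic power scales with the square of supply voltage at fixed frequency, and that the supply is set to the minimum working voltage plus a guard band equal to the monitoring error. The fleet size, per-chip power and electricity price are hypothetical inputs chosen for the example.

```python
# Back-of-the-envelope model. Every input value here is an illustrative
# assumption, not a figure from the presentation.

def guard_banded_power_ratio(old_margin: float, new_margin: float) -> float:
    """Ratio of dynamic power after vs. before tightening the guard band.

    Assumes dynamic power scales with V^2 at fixed frequency and that the
    supply is set to V_min * (1 + margin).
    """
    return ((1.0 + new_margin) / (1.0 + old_margin)) ** 2


def fleet_savings_per_year(chips: int, watts_per_chip: float,
                           saving_fraction: float, usd_per_kwh: float) -> float:
    """Yearly electricity savings in USD for a fleet running around the clock."""
    hours_per_year = 24 * 365
    kwh_saved_per_chip_hour = watts_per_chip * saving_fraction / 1000.0
    return chips * kwh_saved_per_chip_hour * usd_per_kwh * hours_per_year


ratio = guard_banded_power_ratio(0.05, 0.01)
print(f"dynamic power saving from 5% -> 1% guard band: {100 * (1 - ratio):.1f}%")

# Hypothetical fleet: 1 million SoCs at 200 W each, 10% saving, $0.10 per kWh.
savings = fleet_savings_per_year(1_000_000, 200.0, 0.10, 0.10)
print(f"fleet savings: ${savings / 1e6:.1f}M per year")
```

Under this simplified V² model the guard band reduction alone yields about 7.5%, in the same range as the ~10% cited. With the hypothetical fleet above, a 10% saving works out to $0.002 per chip per hour, or roughly $17.5 million per year.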
N5 is an appealing process for high performance chips. It offers around a 15% speed improvement along with 80% greater logic density. It also reduces power consumption. However, at the same time, power density is going up. So dynamic voltage and frequency scaling will be increasingly important for managing energy consumption and thermal behavior. Stephen points out that for every watt saved on-chip, there is a commensurate reduction in facility cooling costs. Hyperscale data centers spend 40% of their operating costs on cooling, so there is even more incentive to lower server power use.
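To see how sensor accuracy interacts with dynamic voltage and frequency scaling, consider a minimal sketch of one step of a thermal DVFS policy. Everything here is hypothetical: the operating point table, the 100 °C limit, the hysteresis, and the choice to express sensor error in degrees Celsius are assumptions made for illustration, not Moortec's implementation or any particular SoC's firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: int
    voltage_v: float


# Hypothetical DVFS table, ordered from slowest to fastest.
OPP_TABLE = [
    OperatingPoint(1000, 0.65),
    OperatingPoint(1500, 0.75),
    OperatingPoint(2000, 0.85),
]


def next_opp_index(current: int, temp_reading_c: float, sensor_error_c: float,
                   limit_c: float = 100.0, hysteresis_c: float = 5.0) -> int:
    """Return the operating point index to use after one control step.

    The controller must assume the worst case, reading + sensor error, so a
    less accurate sensor forces throttling at a lower measured temperature.
    """
    worst_case_c = temp_reading_c + sensor_error_c
    if worst_case_c >= limit_c:
        return max(current - 1, 0)                   # throttle down one step
    if worst_case_c <= limit_c - hysteresis_c:
        return min(current + 1, len(OPP_TABLE) - 1)  # headroom: step up
    return current                                   # inside hysteresis band


# Same 96 C reading, two different sensor accuracies (both assumed values).
print(OPP_TABLE[next_opp_index(2, 96.0, sensor_error_c=5.0)])  # throttles to 1500 MHz
print(OPP_TABLE[next_opp_index(2, 96.0, sensor_error_c=1.0)])  # stays at 2000 MHz
```

With the same 96 °C reading, the policy throttles when the sensor may be off by 5 °C but holds full speed when the error is bounded at 1 °C, which is the premature throttling effect described earlier.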
The future of in-chip monitoring looks very interesting with telemetry facilitating reporting and analysis. Some of the benefits could include enhanced device screening, power optimization, increased performance and extended reliability. Many of the benefits can go beyond large data centers and find their way into automotive, consumer and other applications. Moortec has been developing in-chip monitoring solutions since 2010 and has ample experience on a wide range of process nodes, including the most advanced. The presentation was eye-opening in showing how chip-level optimizations affect facilities, enterprises and even worldwide economies and the environment.