Power has become a very hot (ha-ha) topic. The media has latched onto the emergence of massive AI datacenters disrupting energy pricing for consumers. Both as consumers and in industry we welcome faster and better features in our hand-held computing devices, cars, homes, industrial processes and businesses. But without further advances in the enabling technologies, those new features and higher performance will burn more and more power, putting further strain on stretched personal and business budgets and forcing us to recharge our mobile devices ever more frequently. What more can be done to rein in this relentless thirst for power?

Power is a uniquely challenging metric to manage because it is impacted by all aspects of system design and implementation, from application software down to detailed circuit implementation. Application, architecture and design teams each work to minimize power in their respective domains, but they all depend on tight power optimization in the underlying enabling technologies. Part of that enablement comes through power-optimized foundation IP (embedded memories, logic cells, I/Os and NVM), which must be carefully designed to deliver the best power without sacrificing challenging performance goals or compromising on cost/area expectations. The Synopsys Foundation IP teams, in collaboration with their system and EDA colleagues, can leverage capabilities honed over 20+ years across many technology nodes to address these goals. Synopsys has released a digest of six articles on foundation IP enablement for power optimization, highlighting some of the focus areas their organization is working on to achieve this goal. Lots of good material. In a short blog I can only call out a few notable points which caught my eye.
Low voltage operation
An important way to reduce dynamic power is to reduce voltage, since this power is proportional to voltage squared. In DVFS or ultra-low voltage systems, voltages can drop to 0.7V or lower to minimize power for relatively performance-insensitive circuitry. Many IoT devices, such as implantable medical devices, must run for years before battery replacement and operate at very low voltages to meet this need. Energy-harvesting IoT devices can go even lower, down to 0.4 volts.
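The quadratic voltage dependence is worth quantifying. A minimal sketch using the standard first-order dynamic power formula P = α·C·V²·f; the activity factor, capacitance and frequency values below are purely illustrative, not figures from the Synopsys papers:

```python
# Dynamic (switching) power: P = alpha * C * V^2 * f
# alpha: activity factor, C: switched capacitance (F),
# V: supply voltage (V), f: clock frequency (Hz)
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    return alpha * c_farads * v_volts**2 * f_hz

# Illustrative numbers: dropping a 0.9V rail to 0.7V at the same
# frequency cuts switching power by (0.7/0.9)^2, roughly 40%.
p_nominal = dynamic_power(0.1, 1e-9, 0.9, 1e9)
p_lowv = dynamic_power(0.1, 1e-9, 0.7, 1e9)
print(f"savings: {1 - p_lowv / p_nominal:.1%}")  # savings: 39.5%
```

This is only the switching component; leakage behaves differently at low voltages, which is part of why near-threshold design needs the assist techniques discussed below.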
Conventional foundation IP is not optimal in this regime. Embedded memories must be designed with multiple assist techniques to deliver target power metrics without compromising performance or area. Since these voltages sit closer to threshold levels, reliability concerns around switching errors and delay variation must be managed much more carefully. Here Synopsys Foundation IP shows 10-30% area savings and 19-37% power savings from compiler-generated memories optimized for low-voltage operation. To also support architectural optimizations, these compilers offer multiple levels of power control, from light sleep to full power off.
Logic cell libraries must equally be redesigned for this operating regime and characterized much more carefully for on-chip variation. For deep low-voltage operation, modeling methods go even further, considering higher-order moments in timing distributions.
Managing power for HPC and AI
Even with all the improvements mentioned above, one size doesn't fit all in the most demanding applications, especially at 2-3nm processes. Different design teams may choose to add custom characterization corners, use shrinks from slightly larger feature-size processes, or use cells specifically optimized for low power and area.
The Synopsys High-Performance Core (HPC) Design Kit allows designers to adapt libraries, providing tools to tune them to their unique needs, in this paper for HPC and AI goals. The kit supports a wide range of Vt options, allowing, for example, super-high-performance processors to support boost modes in DVFS where a core can be overdriven (for a short time) at a higher clock frequency. To balance power and thermal concerns, other logic blocks can be scaled back to lower voltages and clock frequencies. For cache memories, the kit also provides highly tuned instances to meet tight access-time and setup/hold requirements.
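The power/thermal tradeoff behind boost modes is easy to sketch numerically. Assuming, as a common first-order approximation (not a Synopsys model), that frequency scales roughly linearly with voltage in the DVFS operating range, dynamic power scales approximately with the cube of the scaling factor:

```python
def relative_power(v_scale, f_scale):
    """Dynamic power relative to nominal: P/P0 = (V/V0)^2 * (f/f0)."""
    return v_scale**2 * f_scale

# Overdriving a core by 15% in both voltage and frequency raises its
# dynamic power by ~52% -- sustainable only briefly before thermal
# limits force a return to nominal operation.
boost = relative_power(1.15, 1.15)      # ~1.52x nominal power

# Scaling a less critical block back by 20% in voltage and frequency
# cuts its dynamic power roughly in half.
scaled_back = relative_power(0.8, 0.8)  # ~0.51x nominal power
```

This cubic sensitivity is why a brief boost on one core is usually paid for by throttling others, exactly the balancing act described above.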
Accelerator cores, big arrays of multiply-accumulate (MAC) blocks with supporting memory, are at the heart of AI. Packing these blocks efficiently is essential to managing area and power. Pitch matching specialized logic cells to memories is important in these closely packed repetitive structures, to minimize interconnect power.
Low power AI processors
This paper has a particular focus on AI hardware in datacenters. Here big GPUs support training AI models and are notorious power hogs. But training is an infrequent activity for most AI service providers. These businesses are most concerned with inferencing, invoked when you or I ask ChatGPT or a similar model a question. Inferences are the primary, high-volume AI revenue generators (or cost centers) for service providers. Engagements today start free but quickly switch to subscription models, calibrated to the complexity of each request and generated response, the time it takes to deliver that response, and, importantly, how much power is burned in the process.
Power is a hugely important metric in datacenters, governing not only the performance, reliability and useful life of servers and ancillary equipment but also the cost of cooling to keep the datacenter running. Power expense in support of cooling is as significant as IT power expense. Many of the same power-saving techniques used in mobile devices, such as power switching and DVFS, are already commonplace in datacenter designs.
AI chips excel at simultaneous multi-threading. This is what makes them so effective for matrix-intensive AI, but it also results in higher average activity per unit area than you would commonly see in a CPU. Limiting power demand and heating therefore requires lower core voltages, perhaps 0.7 volts. Communication with external devices, however, must be handled by I/Os that bridge low internal voltages and higher external voltages. Synopsys Foundation IP libraries provide special I/O cells to support these needs.
One Synopsys feature that caught my eye for AI support is their Word All Zero (and half-word) memory. Optimized AI inference models are sparse, especially at the edge, containing many zero weights. Avoiding multiply operations when one input is zero can be a big win for both power and performance. Another cool idea, new to me, is that they provide compact latch-based memories to support activation and pooling operations.
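The win from skipping zero-weight multiplies is easy to illustrate. Here is a toy software model of a zero-skipping multiply-accumulate over a sparse weight vector; this is my own illustration of the principle, not a description of the Synopsys circuit:

```python
def mac_zero_skip(weights, activations):
    """Multiply-accumulate that skips terms with a zero weight,
    also counting how many multiplies were actually performed."""
    acc, multiplies = 0, 0
    for w, a in zip(weights, activations):
        if w == 0:      # sparse weight: skip the multiply entirely,
            continue    # so the multiplier datapath never toggles
        acc += w * a
        multiplies += 1
    return acc, multiplies

# A 75%-sparse weight vector needs only a quarter of the multiplies.
acc, mults = mac_zero_skip([0, 3, 0, 0, 2, 0, 0, 0],
                           [1, 2, 3, 4, 5, 6, 7, 8])
print(acc, mults)  # 16 2
```

In hardware the saving comes from gating the multiplier when a zero is detected, so the datapath burns no switching power on those terms; a memory that flags all-zero words lets that detection happen before the operand is even read out.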
Customizing for ultra-aggressive requirements
As extensive as these foundation IP offerings are, some design teams always want to push further. One such team, in the process of designing optical network infrastructure for Edge AI, needed their logic to run at 0.4 volts, demanding memory compilers and logic libraries to match, on a very aggressive schedule. Synopsys designed specially optimized memory compiler and logic library IP to meet these needs, on a timeline that helped that customer meet their targets.
Coming soon
Flash technologies have become essential in many applications, but standard implementations were never designed for embedded use below 28nm or in demanding IoT or AI applications. Magneto-Resistive RAM (MRAM) and Resistive RAM (RRAM) have become the go-to solutions for such use-cases, MRAM for high reliability and performance, RRAM for low cost and high density.
Synopsys already provides compiler-based options to deliver either class of memory instance. The MRAM option supports up to 128Mb with multiple feature options and low area and power footprints. RRAM compilers are currently in development.
As process advances approach the Angstrom level, the next technology challenge is foundation IP built on Gate-All-Around (GAA) transistors. There are plenty of interesting challenges here, yet Synopsys is already sampling 2nm GAA libraries with customers.
Very interesting papers, with lots of good detail which I couldn't reasonably compress into this blog. Check it out.
Also Read:
TSMC based 3D Chips: Socionext Achieves Two Successful Tape-Outs in Just Seven Months!
CISCO ASIC Success with Synopsys SLM IPs