Of course, that reduction has to persist throughout the design cycle, up to layout implementation and fabrication. Since the advent of high-density, mega-functionality SoC designs at advanced nodes, and of battery-life-critical devices operated at our fingertips, the gap between the SoC power requirement and actual SoC power has only widened. Much emphasis has been placed on power reduction techniques such as gate and interconnect capacitance reduction and voltage and frequency scaling, which have reached their limits given the performance and process variation constraints at lower nodes. Then there are effective techniques available at RTL to reduce activity, such as clock and data gating, memory gating, flop sharing and flop cloning. However, how often are these applied in the right manner? To gain maximum power reduction, they need to be guided by sequential analysis of the design across state boundaries (that is, its behavior across clock cycles), which can eliminate unnecessary computations and either reduce power consumption per operation or spread the operation over a longer time. So, how do we do it?
I learned a great deal from a webinar at the Calypto website about how to gain the maximum advantage from these techniques in a manual-cum-automated way. Combinational clock gating, which saves power in flops by eliminating ‘clock’ power in the gated flops (without any power saving in downstream logic), is very common in existing synthesis tools and is verifiable by any combinational logic equivalence checker. Data activity can be reduced by sequential data gating, by reducing the number of operators, and by operand reordering, which pushes the high-activity data operand towards the later stages of a complex operation. Significant power can be saved by the ‘flop sharing’ technique, where flops are shared between data and control paths, eliminating redundant flops. Then there is ‘flop cloning’, which reduces activity by cloning high fan-out flops and identifying specific gating conditions for each clone. Similarly, reducing memory activity can be an important source of power reduction: the memory enable can be shut off during any redundant read or write. The memory can also be put into a sleep mode (‘light sleep’, ‘deep sleep’ or ‘shut down’) depending upon the situation.
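To make the memory-enable idea concrete, here is a minimal Python sketch. It is not taken from the webinar or from any Calypto tool. It assumes one simple definition of a "redundant" read: a read of the same address as the previous read, with no write in between, on a memory whose output holds its last value while disabled. The function name `redundant_reads` and the `("r" | "w", address)` trace format are hypothetical.

```python
def redundant_reads(ops):
    """Return indices of reads whose memory enable could be shut off.

    ops is a list of (kind, address) tuples, kind being "r" or "w".
    Assumption: a read is redundant when it repeats the address of the
    previous read and no write has happened in between, and the memory
    output holds the previously read data while the enable is low.
    """
    last_read_addr = None
    redundant = []
    for i, (kind, addr) in enumerate(ops):
        if kind == "w":
            # Conservative: any write may change stored data, so forget
            # what the output currently holds.
            last_read_addr = None
        elif kind == "r":
            if addr == last_read_addr:
                redundant.append(i)
            last_read_addr = addr
    return redundant


trace = [("r", 4), ("r", 4), ("w", 9), ("r", 4), ("r", 4), ("r", 5)]
print(redundant_reads(trace))
```

In this trace only the reads at positions 1 and 4 qualify; the read at position 3 follows a write and is conservatively kept.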
As discussed above, there are very effective power saving techniques, but how do we best utilize them to gain the maximum saving in power? Above is an example of sequential clock gating, where the key is to find when a data read or write is going to be redundant and then gate the flop appropriately, thus saving power in the clock as well as in the logic. In practice, however, such conditions are not so simple to find.
Consider the above circuit; a simple pattern matching tool cannot detect such conditions. It takes mathematical, formal reasoning to find the conditions under which writes to a flop never make their way to the design output, or under which the same data value is written over and over to the same flop. In other words, a non-pattern-dependent formal approach is required to discover gating conditions.
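The first of these conditions (a write that never reaches the output) can be illustrated with a small cycle-based model in Python. This is a sketch under stated assumptions, not the circuit from the webinar: a hypothetical two-stage pipeline in which register `r2` drives the output and only captures `r1` when a one-cycle-delayed `valid` is high. Under that assumption, any value loaded into `r1` while `valid` is low can never be observed, so `r1` can be gated with `valid`.

```python
def simulate(data, valid, gate_stage1):
    """Cycle-based model of a hypothetical two-stage pipeline.

    r1 captures the input; r2 captures r1 when the delayed valid (vd)
    is high and drives the output. With gate_stage1=True, r1 is only
    clocked when valid is high in the current cycle.
    Returns (outputs per cycle, number of clock events seen by r1).
    """
    r1, r2, vd = 0, 0, 0
    outputs = []
    r1_clock_events = 0
    for d, v in zip(data, valid):
        load_r1 = bool(v) if gate_stage1 else True
        # Compute all next-state values from the current state, then
        # update together, as flops on a common clock would.
        next_r1 = d if load_r1 else r1
        next_r2 = r1 if vd else r2
        next_vd = v
        r1_clock_events += int(load_r1)
        r1, r2, vd = next_r1, next_r2, next_vd
        outputs.append(r2)
    return outputs, r1_clock_events


data = [5, 7, 7, 9, 3, 3]
valid = [1, 0, 0, 1, 0, 1]
ungated_out, ungated_clk = simulate(data, valid, gate_stage1=False)
gated_out, gated_clk = simulate(data, valid, gate_stage1=True)
print(ungated_out == gated_out, ungated_clk, gated_clk)
```

Simulating a few traces only suggests that the gated and ungated designs agree; it does not prove it. That gap is why the text calls for formal reasoning rather than pattern matching or vector-based checks.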
The Calypto Power Platform has an automatic sequential analysis and optimization capability (vectorless, or controlled by user-provided switching activity) that performs exhaustive analysis of a design to find all optimization opportunities, computes the potential power saving for each of the optimized expressions, and determines the optimal enable logic that maximizes power saving without impacting area or timing.
The Calypto RTL power flow provides very early, fast and accurate feedback on the possible power saving in a design, along with any area impact, information about complete and incomplete clock-gating expressions, and any wasted power. While a complete expression found by RTL sequential analysis can safely be used to gate a clock and save power, an incomplete expression may change design functionality and hence needs interactive analysis and correction before clock gating is implemented.
Above is an example of an incomplete expression, where the value of a signal from the previous cycle is not available and the data and control paths are optimized separately.
Similarly, there is another example of an incomplete expression, where registers appear to be in multiple clock domains. It is unsafe to use a signal from a different clock domain to create a clock-gating expression.
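The clock-domain rule lends itself to a simple structural check. The Python sketch below is purely illustrative and does not represent how any Calypto tool works internally: the function `unsafe_gating_signals`, the signal names, and the signal-to-domain map are all hypothetical. It flags every signal in a candidate gating expression whose clock domain differs from that of the register being gated, and treats signals with an unknown domain as unsafe.

```python
def unsafe_gating_signals(register_domain, expression_signals, signal_domains):
    """List signals that make a gating expression unsafe for a register.

    register_domain: clock domain name of the flop to be gated.
    expression_signals: names of signals used in the gating expression.
    signal_domains: dict mapping signal name to clock domain name.
    A signal missing from signal_domains is conservatively reported.
    """
    return sorted(
        s for s in expression_signals
        if signal_domains.get(s) != register_domain
    )


domains = {"rd_en": "clk_a", "fifo_empty": "clk_a", "dma_done": "clk_b"}
print(unsafe_gating_signals("clk_a", ["rd_en", "dma_done"], domains))
```

An empty result means every signal in the expression is in the register's own domain; a non-empty result marks the expression as one needing designer review.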
The overall flow is flexible and robust. It produces lint-clean optimized RTL (taking care of CDC and timing issues), with ECO support and equivalence checking against the original RTL through Calypto’s SLEC (Sequential Logic Equivalence Checker) tool. The automated optimization implements the complete gating expressions automatically. If the schedule permits further power optimization, designers can analyze the incomplete expressions, complete them by fixing the RTL, and iterate over the flow to gain the maximum reduction in power.
This flow, combining manual exploration with automation, was applied to a few TI designs and produced impressive results: 2-3 iterations, without any impact on the design schedule, resulted in overall power savings in the range of 26% to 52%.
Calypto has variants of specialized power estimation and reduction tools for various design needs: PowerPro CG for logic, registers and the clock tree; PowerPro MG for memory; PowerPro Adviser for IP cores where manual control over the design is needed; and PowerPro PA for RTL power estimation and analysis of results.
The challenge of power optimization in SoCs and IPs can be addressed by power-efficient RTL, and to make that RTL as power-efficient as possible, sequential analysis followed by automated and interactive optimization of the RTL is a must. Since the optimization is done at the functional level in RTL without changing the functionality of the design, the saving persists throughout the design process. More details can be obtained from the on-line webinar, very well presented by Abhishek Ranjan, Sr. Director of Engineering at Calypto.