Clock gating is one of the most basic weapons in the armoury for reducing dynamic power in a design. All modern synthesis tools can insert clock-gating cells that shut off the clock to registers whose contents are not changing. The archetypal case is a register that sometimes loads a new value (when an enable signal is asserted, for example) and otherwise recirculates the old value from its output. The recirculating path can be replaced with a clock-gating cell driven by the same enable, so that the register is clocked only when a new value is loaded; at all other times it is simply not clocked and so retains the old value.
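To make the equivalence concrete, here is a minimal behavioural sketch in Python (not RTL, and not what any synthesis tool emits). It models the same register two ways, once with a recirculating multiplexer and once with a gated clock, and checks that both hold identical values while the gated version sees far fewer clock edges. The function names, the 10% enable rate and the 8-bit data are illustrative assumptions.

```python
import random

def simulate_mux_register(data, enable, reset_value=0):
    """Register clocked every cycle; a mux recirculates q when enable is low."""
    q = reset_value
    clock_edges = 0
    trace = []
    for d, en in zip(data, enable):
        clock_edges += 1        # the flop sees every clock edge
        q = d if en else q      # mux picks new data or the old output
        trace.append(q)
    return trace, clock_edges

def simulate_gated_register(data, enable, reset_value=0):
    """Register behind a clock-gating cell; clocked only when enable is high."""
    q = reset_value
    clock_edges = 0
    trace = []
    for d, en in zip(data, enable):
        if en:                  # the gating cell passes the clock edge
            clock_edges += 1
            q = d
        trace.append(q)         # with no edge, the flop just holds its value
    return trace, clock_edges

random.seed(0)
cycles = 1000
data = [random.randrange(256) for _ in range(cycles)]      # assumed 8-bit data
enable = [random.random() < 0.1 for _ in range(cycles)]    # assumed 10% load rate

mux_trace, mux_edges = simulate_mux_register(data, enable)
gated_trace, gated_edges = simulate_gated_register(data, enable)

assert mux_trace == gated_trace   # functionally identical
print(f"clock edges seen by flop: mux={mux_edges}, gated={gated_edges}")
```

The sketch ignores the power and area of the gating cell itself, which is why gating is not automatically a win for every register.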
How well clock gating is working can be measured by clock-gating efficiency (CGE). Static CGE simply counts the percentage of registers that are gated. But not every clock gate has much effect. In the archetypal example above, there is little power saving if the register loads a new value almost every cycle, and a huge saving if a new value is almost never loaded. Dynamic CGE, the percentage of time that the clocks are actually shut off, is therefore a much better measure than static CGE.
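The difference between the two metrics is easy to see with a few made-up registers. The sketch below assumes one plausible definition of dynamic CGE, the average over all registers of the fraction of cycles in which each register's clock is off; tools may weight or aggregate this differently. The `Register` class, the register names and the duty figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Register:
    name: str
    gated: bool
    enable_duty: float  # fraction of cycles the enable is high (unused if not gated)

def static_cge(regs):
    """Percentage of registers that sit behind a clock gate."""
    return 100.0 * sum(r.gated for r in regs) / len(regs)

def dynamic_cge(regs):
    """Average percentage of cycles in which a register's clock is shut off.

    Assumed definition: an ungated register contributes zero off-time.
    """
    off_time = sum((1.0 - r.enable_duty) if r.gated else 0.0 for r in regs)
    return 100.0 * off_time / len(regs)

# Hypothetical design: three gated registers, only one of which is rarely loaded.
regs = [
    Register("cfg_reg", gated=True, enable_duty=0.01),
    Register("pipe_reg", gated=True, enable_duty=0.95),
    Register("status_reg", gated=True, enable_duty=0.90),
    Register("free_running_counter", gated=False, enable_duty=1.0),
]

print(f"static CGE:  {static_cge(regs):.1f}%")
print(f"dynamic CGE: {dynamic_cge(regs):.1f}%")
```

With these numbers, three of the four registers are gated, so static CGE looks healthy, yet two of the gates are open almost all the time and dynamic CGE comes out much lower.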
But even dynamic CGE ignores how much power is actually saved. If the enable signal shuts off a large part of the clock tree, the power saving can be large, and it is worth the effort to improve the enable signal so that it captures all the times that the clock can be suppressed. On the other hand, if an enable applies only to a small part of the design (perhaps just a single flop), there is little point in trying to optimize the enable. In fact, clock gating such a register may not even save power compared with leaving the multiplexer to recirculate the output bit.
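One rough way to put numbers on this is to weight each gate by the capacitance it controls. The sketch below uses the standard estimate that a clock net of capacitance C toggling every cycle dissipates about C·V²·f, and scales it by the fraction of cycles of interest. Everything else is an assumption for illustration: the supply voltage, the clock frequency, the gate names, the capacitances and the activity fractions are invented, and the overhead of the gating cell itself is ignored.

```python
from dataclasses import dataclass

VDD = 0.9        # supply voltage in volts (assumed)
F_CLK = 500e6    # clock frequency in hertz (assumed)

@dataclass
class ClockGate:
    name: str
    cap_farads: float        # clock-tree and clock-pin capacitance below the gate
    enabled_fraction: float  # fraction of cycles the enable is high
    useful_fraction: float   # fraction of cycles the register contents really change

def clock_power(cap_farads, active_fraction):
    """Approximate clock switching power: C * V^2 * f, scaled by activity."""
    return cap_farads * VDD ** 2 * F_CLK * active_fraction

def saved_power(gate):
    """Power already saved because the clock is off when the enable is low."""
    return clock_power(gate.cap_farads, 1.0 - gate.enabled_fraction)

def remaining_opportunity(gate):
    """Power still spent on cycles that are clocked but change nothing."""
    return clock_power(gate.cap_farads, gate.enabled_fraction - gate.useful_fraction)

# Hypothetical gates: same enable behaviour, very different capacitance.
gates = [
    ClockGate("dsp_block", 2e-12, enabled_fraction=0.60, useful_fraction=0.20),
    ClockGate("single_flop", 2e-15, enabled_fraction=0.60, useful_fraction=0.20),
    ClockGate("cfg_bank", 5e-13, enabled_fraction=0.02, useful_fraction=0.01),
]

# Rank gates by how much power a better enable could still recover.
ranked = sorted(gates, key=remaining_opportunity, reverse=True)
for g in ranked:
    print(f"{g.name:12s} saved={saved_power(g) * 1e6:9.3f} uW  "
          f"opportunity={remaining_opportunity(g) * 1e6:9.3f} uW")
```

In this example `dsp_block` and `single_flop` have identical enable behaviour, so their dynamic CGE is the same, but the ranking puts the large block far ahead because it controls a thousand times more capacitance.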
The most accurate version of this analysis requires clock-tree synthesis (CTS) to have been completed. But CTS is part of the physical design flow, and by then it is too late to go back and change the RTL to incrementally reduce power. Instead, Apache’s PowerArtist allows this analysis to be done at the RTL level using models of the clock tree and the associated interconnect capacitance. The enable efficiency can then be calculated for each clock gate, highlighting the gates that control a large amount of capacitance and are therefore candidates for additional effort to improve the enable efficiency and further reduce power.
See Will Ruby’s blog on clock gating here.