Ken Brock of Synopsys presented on how to optimize your SoC design for low power at 40nm, 28nm and 20nm nodes in a webinar today. Ken and I worked together at Silicon Compilers back in the late 1980s, the best EDA/IP company that I’ve had the pleasure to join.
The webinar made a brief mention of 14nm and FinFETs but didn’t share any details on 14nm libraries or design methodologies.
Performance, Power, Area (PPA) targets depend on the application, which ranges widely across: high-performance computing, mobile (smart phones), mobile computing (tablets) and multimedia.
How do logic libraries meet: area density, performance, power density, process variability, device scaling, routing complexities?
Standard cells can consume 50% of total SoC power. How do you optimize for battery life and stay within low-cost plastic packaging (avoiding ceramic packaging)?
Some low power design techniques:
Active back-bias is not used as much at 45nm and below.
Dynamic power is proportional to F*C*V*V (frequency times capacitance times voltage squared), so reducing frequency or voltage is a key way to reduce power. Some designs use up to five levels of Integrated Clock Gating (ICG).
Fine granularity drive strengths can reduce the gate capacitance (C).
Multi-voltage design reduces V on the non-critical blocks.
Dynamic Voltage and Frequency Scaling (DVFS) reduces both V and F.
Automatic insertion of ICG can be done with Synopsys tools, saving you time and effort.
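To make the F, C and V relationship in the list above concrete, here is a minimal Python sketch. It assumes the usual first-order model for dynamic power (activity factor times C times V squared times F); the activity factor and all of the numbers are illustrative assumptions of mine, not figures from the webinar.

```python
def dynamic_power(freq_hz, cap_farads, vdd_volts, activity=1.0):
    """First-order dynamic power in watts: activity * C * V^2 * F."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


def savings(before, after):
    """Fractional power saved going from `before` to `after`."""
    return 1.0 - after / before


# Hypothetical block: 1 nF of switched capacitance at 1.0 V and 1 GHz.
baseline = dynamic_power(1e9, 1e-9, 1.0)

# DVFS: drop both the supply (to 0.8 V) and the clock (to 500 MHz).
dvfs = dynamic_power(500e6, 1e-9, 0.8)

# Clock gating modeled crudely as a lower activity factor (assumed 0.4).
gated = dynamic_power(1e9, 1e-9, 1.0, activity=0.4)

print(f"baseline: {baseline:.2f} W")
print(f"DVFS:     {dvfs:.2f} W ({savings(baseline, dvfs):.0%} saved)")
print(f"gated:    {gated:.2f} W ({savings(baseline, gated):.0%} saved)")
```

Because voltage enters as a square, the 20% supply reduction alone accounts for a 36% saving in this toy example; halving the clock on top of that brings the total to about 68%.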
Use a power optimization kit to minimize your core leakage:
Eliminate leakage in idle blocks, use sleep mode with power gates and isolation cells.
Maximize idle block leakage reduction with quick naps.
Use multiple voltage domains with level shifters.
Use lower leakage devices in active mode with multi-VT devices.
UPF and CPF methodologies will help you reduce overall power.
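As a rough illustration of why sleep mode with power gates matters for mostly idle blocks, the sketch below computes time-averaged leakage. The function name, the 5% residual leakage while gated, and the example numbers are all hypothetical; real values depend on the library and the power switch design.

```python
def average_leakage(active_leak_mw, active_fraction, gated_residual=0.05):
    """Time-averaged leakage (mW) of a block that is power-gated when idle.

    active_leak_mw  -- leakage while the block is powered (mW)
    active_fraction -- fraction of time the block is powered, 0.0 to 1.0
    gated_residual  -- fraction of leakage that remains while gated
                       (hypothetical default, not a library figure)
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    idle_fraction = 1.0 - active_fraction
    return active_leak_mw * (active_fraction + idle_fraction * gated_residual)


# Hypothetical block leaking 10 mW that is only needed 20% of the time.
always_on = average_leakage(10.0, 1.0)
with_gating = average_leakage(10.0, 0.2)

print(f"always on:   {always_on:.1f} mW")
print(f"power gated: {with_gating:.1f} mW")
```

The less often a block is active, the closer its average leakage gets to the gated residual, which is why idle blocks are the first candidates for power gating.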
How to balance: Power, Performance, Area, Design Time?
Comparison of 40nm versus 28nm libraries as a function of frequency and area:
You get to choose the appropriate library to reach your PPA metric.
Some library and technology strategies for power reduction at several process node generations:
Long channel devices will save you power. Here are three libraries at 40nm using different channel lengths:
The bottom right on the chart is lower power, lower area.
At 28nm the power savings are similar to those of the 40nm libraries, so consider using long-channel libraries:
If you need more speed then the library can use Over Drive (at the expense of leakage).
For 20nm libraries you can choose different libraries to trade off leakage and performance:
That’s a lot of library choices for logic synthesis, so how do you get the best results with synthesis?
You could close timing first, then optimize for area and power.
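The "close timing first, then recover power" idea can be sketched as a toy leakage-recovery pass. This is not how Synopsys synthesis works internally; it is a simplified illustration that treats each cell's slack as independent (real paths share slack), and the VT names, delay penalties and relative leakage values are made-up placeholders.

```python
# Hypothetical penalties relative to the fastest (LVT) variant of a cell:
# (name, added delay in ps, relative leakage). Ordered lowest leakage first.
VT_OPTIONS = [
    ("HVT", 30.0, 0.1),
    ("SVT", 12.0, 0.4),
    ("LVT", 0.0, 1.0),
]


def recover_leakage(cell_slacks_ps):
    """Start from a timing-closed, all-LVT design and, for each cell,
    pick the lowest-leakage VT whose delay penalty fits in its slack."""
    choice = {}
    for cell, slack in cell_slacks_ps.items():
        choice[cell] = "LVT"  # keep the fast cell if nothing else fits
        for vt, extra_delay, _leak in VT_OPTIONS:
            if extra_delay <= slack:
                choice[cell] = vt
                break
    return choice


def total_leakage(choice):
    """Sum of relative leakage over all assigned cells."""
    leak = {vt: rel for vt, _delay, rel in VT_OPTIONS}
    return sum(leak[vt] for vt in choice.values())


# Hypothetical slacks (ps) after timing closure with all-LVT cells.
slacks = {"u1": 50.0, "u2": 15.0, "u3": 2.0}
assignment = recover_leakage(slacks)

print(assignment)
print(f"relative leakage: {total_leakage(assignment):.1f} (all-LVT would be 3.0)")
```

Cells with plenty of slack move to the slow, low-leakage variant, while the critical cell stays on the fast one; the same pattern applies when the choice is between channel lengths instead of VT classes.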
Here are some alternative optimization approaches for 20nm design just using VT choices in standard cell libraries:
Multi-channel library usage strategies include:
These recipes give you some logic synthesis and library approaches to help achieve your PPA goals.
The DesignWare Logic Libraries from Synopsys are available to support: multiple cell architectures, long channel support, DVFS and shutdown, metal and poly alignment with memories. Synopsys has both libraries and EDA tools (logic synthesis, Place & Route, DFM, EM) that combine to save you design time and reach closure quicker.
Q: Comment on adding variability with different cell libraries and Clock Tree Synthesis?
A: Minimize variability by using longer channel lengths on clocks. Your initial clock layer could use minimum channel length, then remaining clock layers all use longer channel lengths.
Q: How much is synthesis run time affected as I use your recommended approaches?
A: We recommend using two libraries at most in order to keep fast synthesis run times.
Q: In 40nm node how are 45nm cells generated?
A: They would be drawn at 45nm. We support 40nm and 50nm nodes.
Q: Is this 28HP or 28HPM nodes?
A: I’ve talked about generic technology, call us for specific libraries.
Q: Should the user define multi-scenario and enable it for multi-VT libraries?
A: Yes, that is a valid approach.
Q: For lower power target is there a specific library that I should focus on for CTS?
A: Depends on the frequency of your design, clock depth, etc.
Q: For ultra-high performance why not use LVT_min during optimization?
A: You could do that for remaining timing violations, but it is very leaky (about 2,000X more than the min).
Q: What is your recommendation for don’t-use lists?
A: It’s a good way to guide synthesis into doing the right thing in meeting PPA goals. Ask our AEs or consultants for best results.