Yesterday Cadence had their annual front-end summit, the theme of which was physically aware design. I was especially interested in the first couple of presentations, which covered physically aware synthesis. I joined Cadence in 1999 when they acquired Ambit Design Systems. One of the products that we had in development was called PKS, which stood for physically knowledgeable synthesis. It was common knowledge back then that wireload models were not going to work much longer and that we needed to combine placement and global routing with synthesis. PKS was started 15 years ago. So now to the present day.
The first presentation was by Jaga Shanmuga of Cisco about using the beta of the new release of RTL Compiler to design a couple of chips. One interesting datapoint is that 2% of worldwide power goes to datacenters. Cisco is one of the contributors to that and so is very focused on low power. Their other two priorities are silicon robustness and time to market: get the products out there quickly and make sure they don’t come back. A typical chip is 28nm with 50M instances, 30-80 subchips and a die of up to 24mm square (about one inch, almost the maximum possible).
The first surprise was that they continue to use wireload models for a lot of their synthesis. This is at 28nm. Wireload models are often better at the back end and a lot easier at the front end. But not always. The next level up is PLE, physical layout estimation. This is not actually physically aware (despite the name) but it generates more accurate tables than wireloads for that block.
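To make the limitation concrete, here is a minimal sketch of how a wireload model is typically consulted: wire capacitance is looked up from the fanout of the net alone, with no knowledge of where the cells are placed. The table values, the extrapolation slope and the function name are hypothetical and chosen only for illustration; they do not come from any real library or from RTL Compiler.

```python
from bisect import bisect_right

# Hypothetical wireload table: (fanout, estimated wire capacitance in pF).
WIRELOAD_TABLE = [(1, 0.002), (2, 0.004), (4, 0.009), (8, 0.020)]
# Hypothetical slope used to extrapolate beyond the last table entry.
SLOPE_PF_PER_FANOUT = 0.003


def estimate_wire_cap(fanout: int) -> float:
    """Estimate net capacitance from fanout only (no placement information)."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    last_fanout, last_cap = WIRELOAD_TABLE[-1]
    if fanout >= last_fanout:
        # Beyond the table: extrapolate linearly from the last entry.
        return last_cap + (fanout - last_fanout) * SLOPE_PF_PER_FANOUT
    fanouts = [f for f, _ in WIRELOAD_TABLE]
    # Index of the first table entry whose fanout is greater than ours.
    i = bisect_right(fanouts, fanout)
    f0, c0 = WIRELOAD_TABLE[i - 1]
    f1, c1 = WIRELOAD_TABLE[i]
    # Linear interpolation between the two neighbouring entries.
    return c0 + (c1 - c0) * (fanout - f0) / (f1 - f0)


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(f"fanout {n}: {estimate_wire_cap(n):.4f} pF")
```

Two nets with the same fanout get the same estimate here, whether they span a few microns or the whole die, which is why a purely statistical model like this can lose correlation with the back end on long wires.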
The next weapon is physically aware structuring (PAS) which is especially important for designs that are a large mux or crossbar, where there is a lot of interaction between how it makes sense to break the huge mux into gates and the physical layout of the routing.
Physically aware mapping (PAM) can produce large gains if the structure is already right, especially for blocks with lots of long wires (such as the Cisco design with over 100 memories with complex interconnections).
Ankush Sood of Cadence went into more detail about how RTL Compiler addresses physical challenges early in RTL synthesis. For him, 28nm is mainstream, 16/14nm is in heavy use especially in mobile, and 10nm is imminent. MHz/W/$ is the key decision metric. Chips are now limited more by pins and access than by core logic area, and synthesis needs to take this into account. 80-90% of the wires are local interconnect, but the global wires are the ones that cause lack of correlation with the back end.
Other new features (other than the ones Cisco already discussed):
- multi-bit cell instantiation is improved. Single-bit cells are placed and then, once timing is known, they can be grouped based on affinity etc. Power is reduced in the clock tree and also indirectly by the reduced area
- metal stack awareness. High layers are low resistance and lower layers are high resistance. Can group the layers into about 3 bins and then worry about long nets and clocks and assigning to bins. Wire topology needs to be optimized too, not just gates
- DPT (double patterning) awareness. Double patterning imposes special spacing requirements which, during physical synthesis, can be converted to padding values so that library cells get spaced out
- enhanced multi-Vt cell selection during global mapping. Reduces leakage power
- improved slew degradation estimation during timing analysis
- advanced on-chip variation support based on logic depth. More accurate than a plain common cell derating approach
- hierarchical flow support
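
As an illustration of the on-chip variation item above, here is a minimal sketch of why a derate that depends on logic depth is less pessimistic than a single flat derate: random variation tends to average out along deep paths, so the late derate factor can shrink as depth grows. All numbers and names are hypothetical; real tables come from library characterization (and are often indexed by distance as well), and this is not RTL Compiler's implementation.

```python
# Hypothetical flat late derate applied to every path regardless of depth.
FLAT_LATE_DERATE = 1.10

# Hypothetical depth-based late derate table: (minimum logic depth, factor).
# Deeper paths get a smaller factor because random variation averages out.
DEPTH_DERATE_TABLE = [(1, 1.15), (2, 1.12), (4, 1.09), (8, 1.06), (16, 1.04)]


def depth_derate(depth: int) -> float:
    """Return the factor for the largest tabulated depth not exceeding depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    factor = DEPTH_DERATE_TABLE[0][1]
    for min_depth, table_factor in DEPTH_DERATE_TABLE:
        if depth >= min_depth:
            factor = table_factor
    return factor


def derated_path_delay(cell_delays: list[float], use_depth: bool) -> float:
    """Sum nominal cell delays and apply either a flat or depth-based derate."""
    factor = depth_derate(len(cell_delays)) if use_depth else FLAT_LATE_DERATE
    return sum(cell_delays) * factor


if __name__ == "__main__":
    path = [0.1] * 10  # ten cells, 0.1 ns each (made-up values)
    print("flat derate :", round(derated_path_delay(path, use_depth=False), 3))
    print("depth derate:", round(derated_path_delay(path, use_depth=True), 3))
```

With these made-up numbers the ten-stage path is derated less by the depth table than by the flat factor, while a single-stage path would be derated more, which is the sense in which the depth-based approach tracks variation more closely.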
More details on RTL Compiler are here.