Synopsys seems to particularly excel at these events, whether in half-day tutorials at conferences or, as in this case, in a full-day on-site workshop. You might think there’s not much that can be added in this domain, other than to bring low-power newbies up to speed, but you’d be wrong. This event set the stage with surveys on needs in power management and verification (maybe this was for the newbies but good to recap), a detailed look at implementation aspects, the emerging importance of pre-RTL UPF checks, a very enlightening discussion on the scalability of UPF for large designs (hint – this is a problem) and a discussion on on-going work to attack that problem.
There were also a couple of customer presentations, one from Intel and one from HYGON. In deference to the wishes of both companies I won’t discuss their presentations. To keep this blog to a manageable size, I also won’t cover every Synopsys presentation.
Low Power Trends
Sridhar Seshadri (VP and Chief Architect at Synopsys) opened with an overview. Their customer verification surveys show low-power verification neck-and-neck with debug as the top verification concerns in 2018. Synopsys initiatives to address that need include software-driven power analysis and signoff power closure. The flows here are mostly well known: ZeBu and Virtualizer for early analysis, ZeBu and PrimePower for peak and average analysis on RTL, and PrimePower and RedHawk for power and IR-drop signoff.
Mary Ann White (Dir Marketing in Synopsys DG, also an ISO 26262 functional safety practitioner) presented results from their customer survey, showing for example that while timing closure, timing and area goals and on-schedule tapeout lead all other concerns by a 2X+ margin, power concerns follow right behind. One very interesting insight was a side-by-side comparison of mobile and automotive expectations. Each item below gives the mobile figure first, then the automotive figure:
- process – 28nm to 7nm versus 180nm to 7nm
- design size (instances) – 100M+ in both cases
- frequencies – up to 4.2GHz versus up to 77GHz
- voltages – 0.5 to 1.8V versus 1 to 60V
- temperature – 0 to 40 degrees versus -40 to 150 degrees
- expected lifetime – up to 3 years versus up to 15 years
- target field failure rate – <10% versus zero
Automotive has caught up on process and size, is ahead on frequency, spans a wider voltage range, and is unsurprisingly more demanding on temperature, lifetime and failure rates.
Progress in Implementation
Mary Ann mentioned a number of power-saving techniques available in the Synopsys implementation flow, including concurrent clock and data optimization, intelligent relocation of ICG cells closer to the driver, more multi-bit banking and de-banking support, low-power restructuring in DC NXT and in fusion between ICC II and synthesis, optimization in PT and ICC II for ultra-low voltage operation, and optimizations in power recovery at signoff by downsizing or swapping cells on Vth. Each of these is delivering meaningful improvements in dynamic and/or leakage power savings.
Viswanath Ramanathan introduced support for multiple power domains in a single voltage area with a couple of examples. I’m not going to butcher his technical explanation here; contact Synopsys for more detail.
UPF Scalability for SoC
Harsh Chilwal (PE at Synopsys) gave a fascinating and somewhat concerning presentation on the scalability of UPF. We all know that designs are getting bigger and so of course UPFs are getting bigger. What may be less apparent is how quickly UPFs at the SoC level are getting bigger and more costly to compile, at a seemingly super-linear rate even on a log scale of UPF complexity. Harsh told us that this is the nature of the beast: essentially flat Tcl (the foundation under UPF and most things EDA) has to be mapped onto structural RTL. This can only happen effectively after the design is resolved and can amount to 4X+ of total elaboration time for a VCS simulation.
Flexible though UPF is, that flexibility often fights efficiency. Loading UPF files (sometimes many nested files) for every relevant instance creates zillions of UPF objects, chewing up compile time and memory. Transitive find commands, beloved by many users for their adaptability, create huge strings which can easily overflow in a good-sized SoC and are correspondingly expensive in time and memory (blame Tcl, not UPF, for that). Path-tracing, needed again for adaptability, can be equally expensive in an SoC if not carefully bounded. These and other factors highlight the challenging tradeoffs between ease of use and practical bounded use in UPF-based applications.
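To get a feel for why transitive finds hurt, here is a minimal Python sketch of my own (not Tcl, not UPF and not anything Synopsys showed). It assumes a toy design tree with uniform fanout and a hypothetical `transitive_find` helper that joins every matching hierarchical name into one string, which is roughly what a flat Tcl result looks like.

```python
def build_hierarchy(depth, fanout, prefix="top"):
    """Return every hierarchical instance path in a uniform toy design tree."""
    paths = [prefix]
    if depth > 0:
        for i in range(fanout):
            paths.extend(build_hierarchy(depth - 1, fanout, f"{prefix}/u{i}"))
    return paths


def transitive_find(paths, scope):
    """Hypothetical stand-in for a transitive search.

    Collects every instance at or below `scope` and joins the names
    into one space-separated string.
    """
    matches = [p for p in paths if p == scope or p.startswith(scope + "/")]
    return " ".join(matches)


# Toy design: 4 levels below the top, 8 children per instance (assumed numbers).
design = build_hierarchy(depth=4, fanout=8)

# A search from the top returns the whole design as one string ...
flat_result = transitive_find(design, "top")
# ... while a search scoped two levels down returns only that subtree.
bounded_result = transitive_find(design, "top/u0/u0")

print("instances in design:     ", len(design))
print("chars, search from top:  ", len(flat_result))
print("chars, search from scope:", len(bounded_result))
```

Even in this tiny tree the top-level result string is many times larger than the scoped one, and a real SoC has far more instances and much longer names.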
Harsh suggested a number of methodology best practices to avoid or at least mitigate some of these problems, for example using wildcards rather than find_objects and using soft or hard macro attributes to identify IPs and thereby bound path-tracing. He also suggested using power models to make the UPF modular rather than flat, binding those models to the RTL and so avoiding a lot of redundancy. Finally, he talked about some forward-looking work they are doing on hierarchical compile as a way to break free of the flat UPF paradigm.
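The idea of bounding path-tracing at IP boundaries can be sketched in a few lines of Python. This is my own illustration under loud assumptions: the netlist is a made-up dictionary of cell-to-fanout connections, `trace_fanout` is a hypothetical helper, and marking `ip_a` as a macro simply stands in for whatever the real soft or hard macro attribute does in a tool.

```python
from collections import deque


def trace_fanout(netlist, start, macros=frozenset()):
    """Breadth-first trace from `start`.

    Cells listed in `macros` are reported as reached, but the trace
    does not continue through them.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in macros and node != start:
            continue  # treat the macro as a boundary
        for nxt in netlist.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Made-up connectivity: `ip_a` is an IP block with internal cells.
netlist = {
    "src": ["buf1", "ip_a"],
    "buf1": ["sink1"],
    "ip_a": ["ip_a/x1", "ip_a/x2"],
    "ip_a/x1": ["ip_a/x3"],
    "ip_a/x2": [],
    "ip_a/x3": [],
    "sink1": [],
}

unbounded = trace_fanout(netlist, "src")
bounded = trace_fanout(netlist, "src", macros={"ip_a"})

print("cells visited, unbounded:", len(unbounded))
print("cells visited, bounded:  ", len(bounded))
```

With the boundary in place the trace still sees that the path reaches the IP, but it no longer walks the IP internals, which is where the cost piles up when the IP is large.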
Kaushik De (Scientist at Synopsys) followed with an abstraction approach, a likely unavoidable tradeoff as designs and UPFs continue to grow. For this purpose, they define a signoff abstract model (SAM) which he positions as similar to a flat model, minus the things you don’t need to know (the devil is no doubt in those details); they have mechanisms to create, write and read SAM models. Kaushik also showed customer stats with significant run-time and memory improvements exploiting SAM-based flows.
The trick with hierarchical analysis is to ensure you can trust that nothing falls through the hierarchical cracks. He showed a couple of approaches they use to build confidence in the validity of the SAM-based analyses. Each compares abstracted analyses with full-flat analyses to ensure no violations are lost. I presume any disconnects are used to refine the SAM models. I understand that customers using these flows today run both analyses and the comparison on an initial run, then use hierarchical analysis for subsequent runs, perhaps adding a full-flat run at the end for security.
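The comparison step is conceptually a set difference. Here is a minimal Python sketch, assuming (my assumption, not something shown at the workshop) that each violation can be reduced to a pair of check name and hierarchical path. The check names and paths below are invented for illustration.

```python
def lost_violations(flat, hierarchical):
    """Return violations found in the full-flat run but missing from the abstracted run."""
    return sorted(set(flat) - set(hierarchical))


# Invented example data: (check name, hierarchical path).
flat_run = [
    ("missing_isolation", "top/cpu/out0"),
    ("missing_level_shifter", "top/gpu/in3"),
    ("missing_isolation", "top/io/pad7"),
]
hier_run = [
    ("missing_isolation", "top/cpu/out0"),
    ("missing_isolation", "top/io/pad7"),
]

lost = lost_violations(flat_run, hier_run)
if lost:
    print("abstraction dropped:", lost)
else:
    print("hierarchical run covers every flat violation")
```

An empty result is what builds the confidence to rely on the hierarchical run alone in later iterations.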
Machine Learning in Low Power
It’s happening everywhere else; no surprise ML should appear here also. Mary Ann first talked about ML optimization for PrimeTime power recovery, achieving run-time speedups of between 4X and 10X. I was told this is a supervised learning approach. You, the customer, first train the system, then use that training on subsequent designs.
Kaushik talked about accelerating debug using machine learning. This I thought was a very cool application since it builds on unsupervised learning to identify clusters of related problems, unlike many ML applications which rely on supervised learning to identify specific object matches. This is particularly useful in static UPF analysis which can generate hundreds of thousands of errors. But there aren’t really anywhere near that many root-cause bugs; instead each real bug spawns many symptoms. Using unsupervised learning (with no doubt a good deal of secret sauce) can massively reduce the debug effort. Kaushik showed one example, resulting from a level-shifter error, where a huge number of reported errors and warnings could be traced back to just two problems. Way easier than the traditional approach.
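To make the clustering idea concrete, here is a toy Python sketch. It is emphatically not the Synopsys method, which was not described in detail. It uses a simple leader-clustering pass with Jaccard similarity on message tokens, and the messages are invented and do not reflect any tool’s actual output format.

```python
import re


def tokens(message):
    """Split a message into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9_]+", message.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(messages, threshold=0.5):
    """Leader clustering with no labels needed.

    Each message joins the first cluster whose leader is similar
    enough; otherwise it starts a new cluster as its leader.
    """
    clusters = []  # list of (leader_tokens, members)
    for msg in messages:
        t = tokens(msg)
        for leader, members in clusters:
            if jaccard(t, leader) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((t, [msg]))
    return [members for _, members in clusters]


# Invented violation messages for illustration only.
reports = [
    "missing level shifter from PD_CPU to PD_IO on net cpu/data0",
    "missing level shifter from PD_CPU to PD_IO on net cpu/data1",
    "missing level shifter from PD_CPU to PD_IO on net cpu/data2",
    "isolation cell clamp mismatch on net gpu/valid",
    "isolation cell clamp mismatch on net gpu/ready",
]

groups = cluster(reports)
for members in groups:
    print(len(members), "reports like:", members[0])
```

Five reports collapse into two groups here. The real payoff comes when hundreds of thousands of symptoms collapse into a handful of groups, each pointing at one candidate root cause.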
You can learn more about what Synopsys is doing in low-power HERE.