Last week, Cadence hosted its annual CadenceLIVE Americas 2021 conference, featuring four keynotes and eighty-three talks on a variety of topics, delivered by Cadence, its customers, and its partners.
One of the keynotes was from Partha Ranganathan, VP and Engineering Fellow at Google. His talk was titled, “May I have More Moore Please? Thinking Outside the Traditional Hardware Box.” Glancing at the agenda ahead of the conference, the “More Moore” part of the title caught my attention. For decades, Moore’s law delivered a doubling of device performance and a corresponding reduction in cost roughly every two years. In recent years, Moore’s law has been slowing down, even as the demand for performance has been growing rapidly. With so much talk about Moore’s law slowing down, what does Partha mean by “More Moore Please?”
What he presented was indeed out-of-the-box thinking, and outside the traditional hardware box, as his title promised. The following is a summary of what I gathered from his keynote.
Partha starts off by emphasizing that in addition to Moore’s law slowing down, the demand for performance has been growing very rapidly, widening the gap between the two. To substantiate the claim, he points to YouTube video uploads, which have been growing exponentially over time; demand from emerging machine learning (ML) workloads, which has been doubling every 3.5 months; and the growth of cloud computing data traffic.
He then draws our attention to security challenges that add significant performance overhead to products, which he calls negative Moore’s law. Having showcased the growing gap between the performance supplied and the performance demanded, which traditional Moore’s law is struggling to close, he uses the rest of his talk to discuss ways of closing it. In essence, he recommends a combination of efficient hardware design, efficient use of hardware, and incremental deployment of performance enhancements over time, addressing a continuum of opportunities. Through this approach, he suggests, performance benefits exceeding those of Moore’s law could be achieved at the system level.
Efficient Hardware Design
Custom silicon accelerators are simply hardware targeted at specific types of workloads; by taking this approach, the best solution in terms of performance, cost, and power becomes achievable. But performance acceleration involves more than just silicon/hardware. It must involve the surrounding ecosystem of firmware, drivers, compilers, and debugging/tracing/monitoring tools. We should look at the entire system, including compute, memory, networking, and storage. Making acceleration usable requires system software, firmware, applications, and hardware/software co-design.
Based on Google’s experiences, successful accelerator projects were those that focused not just on acceleration efficiency (reduced cycles) but also on what exactly was being improved and what impact that would have on the entire system and the user experience.
Efficient Use of Hardware
With workloads nowadays changing so dynamically in type and duty cycle, efficient use of hardware becomes critical. He introduces the concept of disaggregating the total pool of hardware resources into compute, memory, and storage, and dynamically composing them for each workload to yield the best performance. This is where cloud computing, through its software-defined hardware, could be leveraged.
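The disaggregation idea can be sketched with a toy model (my own illustration, not Google's implementation): instead of fixed machines, each workload carves a slice of the shape it needs out of shared pools.

```python
# Toy sketch of disaggregated resources: workloads draw exactly the
# compute/memory/storage mix they need from one shared pool.
from dataclasses import dataclass

@dataclass
class Pool:
    cpus: int
    memory_gb: int
    storage_tb: int

def allocate(pool: Pool, cpus: int, memory_gb: int, storage_tb: int) -> Pool:
    """Carve a workload-shaped slice out of the shared pool."""
    assert (cpus <= pool.cpus and memory_gb <= pool.memory_gb
            and storage_tb <= pool.storage_tb), "pool exhausted"
    pool.cpus -= cpus
    pool.memory_gb -= memory_gb
    pool.storage_tb -= storage_tb
    return Pool(cpus, memory_gb, storage_tb)

cluster = Pool(cpus=1024, memory_gb=8192, storage_tb=500)

# A memory-heavy ML job and a storage-heavy batch job get very
# different "machines" from the same underlying hardware.
ml_job = allocate(cluster, cpus=256, memory_gb=4096, storage_tb=10)
batch_job = allocate(cluster, cpus=512, memory_gb=1024, storage_tb=200)
print(cluster)  # remaining capacity available for the next workload
```

With fixed servers, both jobs would have to fit the same chassis, stranding whichever resource each job does not use.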
Time Incremental Performance Deployments
Moore’s law has three variables: performance, cost, and timeline. The timeline was a 2-year cycle for doubling performance and halving cost. We have all benefitted from continuous improvement cycles in the software industry, where incremental enhancements are released on an ongoing basis through SaaS deployments. Thinking out of the box and delivering incremental enhancements in hardware acceleration over shorter intervals could be a way to close the performance supply-demand gap.
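A bit of compounding arithmetic (my own illustrative numbers, not from the talk) shows why frequent small deployments can rival one big jump: modest gains shipped monthly compound to roughly a doubling over the same two years.

```python
# Illustrative sketch: compounding small, frequent performance gains
# versus one large biennial jump.

def compounded_gain(per_release_gain: float, releases: int) -> float:
    """Total speedup after `releases` deployments, each improving
    performance by `per_release_gain` (e.g. 0.03 for 3%)."""
    return (1 + per_release_gain) ** releases

# Classic Moore's law cadence: one 2x jump every 2 years.
biennial = 2.0

# Hypothetical incremental model: a 3% gain shipped monthly
# for the same 2 years (24 releases).
monthly = compounded_gain(0.03, 24)

print(f"One biennial 2x jump: {biennial:.2f}x")
print(f"24 monthly 3% gains:  {monthly:.2f}x")
```

The incremental path also delivers value continuously along the way, rather than all at once at the end of the cycle.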
Continuum of Opportunities
Instead of focusing almost exclusively on quantum improvements to a small percentage of the whole set of opportunities, making incremental improvements across the majority of the set could yield the same or even better overall performance. For example, instead of chasing performance enhancements only in the applications, also focus on core libraries and systems infrastructure, which have a long tail.
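Amdahl's law makes the long-tail argument concrete. With illustrative numbers of my own choosing (not from the keynote), a modest speedup applied to most of the work beats a dramatic speedup on a small hot spot:

```python
# Amdahl's-law arithmetic: overall gain when only part of the
# total work is accelerated.

def overall_speedup(fraction: float, speedup: float) -> float:
    """Overall gain when `fraction` of the work is sped up by `speedup`x."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)

# A 10x improvement to a hot 5% of the workload.
big_on_small = overall_speedup(0.05, 10.0)

# A 1.3x improvement across 80% of the workload
# (core libraries, systems infrastructure).
small_on_big = overall_speedup(0.80, 1.3)

print(f"10x on 5% of work:   {big_on_small:.3f}x overall")
print(f"1.3x on 80% of work: {small_on_big:.3f}x overall")
```

The 10x hot-spot win yields under 5% overall, while the broad 1.3x improvement yields over 20%.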
Attack of the Killer Microseconds
Computer systems have traditionally been optimized for nanosecond-scale and millisecond-scale tasks, the two extremes. Nanosecond-scale events are handled at the hardware level through pipelining, out-of-order execution, pre-fetching, and the like; millisecond-scale events are handled at the system level through task scheduling, context switching, and so on. In between sit a number of microsecond-scale tasks that deserve a look for acceleration. For example, a full, fast networking hop across a data center takes on the order of one microsecond. How much would accelerating that hop yield in overall system and workload performance? In today’s computing world, such microsecond-scale tasks abound.
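To visualize the gap, here is a rough latency table in the spirit of the widely circulated "latency numbers every programmer should know" figures. The values are order-of-magnitude approximations of my own, not numbers from the keynote, and vary by system:

```python
# Approximate event latencies in nanoseconds, illustrating the
# microsecond-scale "middle" that neither hardware tricks nor
# OS-level scheduling handles well.
events_ns = {
    "L1 cache reference":    1,           # ns scale: hidden by the hardware
    "Main memory reference": 100,
    "Datacenter network hop": 1_000,      # us scale: the neglected middle
    "Flash/SSD read":        100_000,
    "Disk seek":             10_000_000,  # ms scale: amortized by the OS
}

for event, ns in events_ns.items():
    scale = "ns" if ns < 1_000 else "us" if ns < 1_000_000 else "ms"
    print(f"{event:24s} ~{ns:>12,} ns  ({scale} scale)")
```

Events in the microsecond band are too long to hide with out-of-order execution yet too short to justify a context switch, which is exactly why they merit dedicated acceleration.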
Partha closes his keynote with references to some of the projects Google is collaborating with Cadence on. Those projects include:
- Advanced-node engagements with foundries and on foundational IP
- Co-design and custom silicon projects
- New licensing models for software enablement for Google Cloud
- Multi-physics modeling for system/thermal, next-gen packaging technology and chiplets