Presenters took a trip down memory lane at DAC this year with a panel discussion on HLS (High Level Synthesis) spanning 1974 to 2020. That time period aligns with my own career: I graduated from the University of Minnesota in 1978, started chip design at Intel, then transitioned into EDA companies by 1986. Marilyn Wolf from the University of Nebraska organized the panel, but at the last minute had Yuan Xie from UCSB moderate. Using Zoom for the live panel caused a few glitches, but Yuan kept things moving when a presenter sporadically disappeared.
Bryan Bowyer from Mentor was the first to present. He noted that the earliest High Level Synthesis tools, like Monet, were only well-suited for datapath architectures. The first input languages were VHDL and Verilog, and the tools eventually moved to C++ and SystemC code. Today, with open-source class-based libraries, the HLS world is quite changed and simpler than ever to use, making it practical for processors, NoCs, bus interfaces, even RISC-V chips. Mentor’s HLS tool is Catapult.
Deming Chen, a professor at UIUC, also works at a startup called Inspirit IoT. He saw how HLS came to the rescue, offering 10X code reduction and 1,000X simulation speedups. RTL thrived during 1985-2010; however, HLS came into its own in 2000 with Forte, followed by tools from Cadence, Mentor, and even NEC by 2005, and Synopsys in 2009. Here’s a sneak peek into the approach with Inspirit IoT:
Sean Dart from Cadence showed how High Level Synthesis tools started in 1996 with DASYS RapidPath, followed by Forte Cynthesizer in 2001, Cadence C-to-Silicon in 2007, and by 2015 the Stratus HLS tool, which combined the best features of Cynthesizer and C-to-Silicon. Present-day HLS tools are being used for quite a wide range of SoCs, including AI, 5G, auto vision, WiFi, imaging, cameras, networking, GPU and more. Dart believes that HLS tools have to solve meaningful problems, include a world-class scheduler, have QoR comparable to hand-coded RTL, integrate with physical implementation tools, be easy for engineers to use, and have existing tool flow support.
The presenter from Synopsys was Pierre Paulin, who joined the company in 2013. Issues with HLS started right away, because in the 90s there was no standard input language. Application Specific Instruction set Processors (ASIP) came about around 2000, using standard cores plus extensions. By 2015 AI was being applied to HLS using Convolutional Neural Network (CNN) graphs, then TensorFlow and ONNX graphs became standards. Domain-specific accelerators for AI and neural networks have exploded since 2017, powered by HLS tools.
Wakabayashi-san from the University of Tokyo gave a history of how the CyberWorkBench tool emerged from inside NEC to become a commercial High Level Synthesis tool. NEC did a commercial chip for a SONET controller back in 1994, and by 1999 they had a cycle-accurate HLS ability and designed a NIC. C-based verification came out in 2000, and an automatic pipeline feature debuted in 2002. Visitors to DAC in 2006 heard about CyberWorkBench for the first time.
Q: What is the benefit of using HLS versus traditional RTL design?
Wakabayashi – The total quality is good with HLS.
Dart – A CDNLive user said, “Our company has zero tolerance for PPA that is worse than hand-coding.” With HLS you have enough time to actually explore alternative architectures.
Bowyer – Over the past 20 years we’ve added the best HW design experience to our HLS tools, so the area results now match hand-coded designs. For low-power designs, HLS results are better than hand-coding.
Paulin – Exploration is the key benefit of HLS.
Chen – Exploration and migration are huge benefits of HLS. Our startup is even using ML to get better HLS results.
Q: What key domains are HLS tools being used in?
Dart – We see HLS being used in AI, WiFi, 5G, modems, image processing, cameras.
Bowyer – Hardware accelerators are good HLS applications, along with DSP, AR, 5G, WiFi.
Wakabayashi – The H.265 standard and any complex algorithm are ideal for an HLS approach.
Chen – Accelerator chips do best with an HLS methodology.
Q: Today we see C, C++ and SystemC used for HLS, but what about other languages?
Bowyer – We’ve just recently standardized on SystemC, so it’s kind of early and impractical to talk about using newer languages.
Dart – I agree. It’s not just the HLS language, it’s the whole tool environment, including debug and verification, so the design and verification flows all need to be connected.
Chen – Even C code can embed assembly code, so with HLS it’s possible to go higher than C++ and SystemC. A Python driven HLS tool could happen.
Wakabayashi – Python could be used for HLS, but then you need new libraries to really be efficient. Linked lists are not handled well in HLS today, but that’s a good research area.
Chen – We’re not a big three EDA vendor, so we’re focusing on FPGA devices, doing some ML code analysis, have multi-cycle support, and are working with customers to make them happy first. We’re looking to be acquired by a bigger company.
Q: What are new research directions in HLS for those graduate students listening?
Chen – Dealing with very large designs, because HLS is kind of domain-specific and used more at block-level.
Bowyer – HLS research for AI, such as data flow at a higher level of abstraction.
Paulin – HLS should try TensorFlow as an input language, then compile that into both HW and SW.
Wakabayashi – We want intelligent, automatic exploration across the entire PPA space.
Dart – HLS could improve on scalability and power. Being able to optimize across an entire design is a good research area. Oh, and any kind of HLS training that students have in school will make for a bright career right now.
In my IC design career I’ve witnessed transistor-level full-custom design, gate-level design, RTL design, and more recently HLS design. The shift towards HLS design methodology is growing steadily, because it works while providing outstanding productivity, and there’s a robust commercial marketplace with multiple vendors to choose from. The bigger the EDA vendor, the more likely that they’ve integrated HLS with physical implementation tools, something quite important for the smallest node designs.
If you registered for DAC and have access to the archived videos, look for this panel presentation, which runs for 1 hour and 13 minutes.