
A New Class of Accelerator Debuts

by Bernard Murphy on 07-22-2024 at 6:00 am

Chimera GPNPU Block diagram

I generally like to start my blogs with an application-centric viewpoint; what end-application is going to become faster, lower power or whatever because of this innovation? But sometimes an announcement defies such an easy classification because it is broadly useful. That’s the case for a recent release from Quadric, based on an architecture which seems to carve out a new approach to acceleration. This is able to serve a wide range of applications, from signal processing to GenAI with depth in performance, up to 864 TOPs per their announcement.

The core technology

Quadric’s roots are in AI acceleration, so let’s start there. By now we are all familiar with the basic needs for AI processing: a scalar engine to handle regular calculations, a vector engine to handle things like dot-products, and a tensor engine to handle linear algebra. And that’s how most accelerators work – 3 dedicated engines coupled in various creative ways. The Quadric Chimera approach is a little different. The core processing element is built around a common pipeline for all instruction types. Only at the compute step does it branch to an ALU for scalar operations or a vector/matrix unit for vector/tensor operations.

Both signal processing and AI demand heavy parallelism to meet acceptable throughput rates, handled through wide-word processing, lots of MACs and multi-core implementations. The same is true for the latest Quadric architecture, but again in a slightly different way. Their new cores are built around systolic arrays of processing elements, each supporting the same common pipeline, each with its own scalar ALU, bank of MACs and local register memory.
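The idea of one pipeline whose compute step alone branches between scalar and vector/matrix hardware can be sketched as a toy model. This is only an illustration under stated assumptions: the class name, the three opcodes, and the register layout are hypothetical and are not taken from Quadric's actual instruction set or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    """Toy PE (hypothetical): local registers, one scalar ALU, one MAC bank."""
    regs: dict = field(default_factory=dict)

    def execute(self, op, dst, a, b):
        # Shared front end: every instruction type fetches operands the same way.
        x, y = self.regs[a], self.regs[b]
        # Only the compute step branches on the kind of operation.
        if op == "add":        # scalar path (ALU)
            self.regs[dst] = x + y
        elif op == "dot":      # vector path (MAC bank)
            self.regs[dst] = sum(p * q for p, q in zip(x, y))
        elif op == "matvec":   # matrix path (same MAC bank, row by row)
            self.regs[dst] = [sum(p * q for p, q in zip(row, y)) for row in x]
        else:
            raise ValueError(f"unknown op: {op}")


def run(program, regs):
    """Run a mixed scalar/vector/matrix program on a single toy PE."""
    pe = ProcessingElement(regs=dict(regs))
    for instr in program:
        pe.execute(*instr)
    return pe.regs


# A stream that flips between matrix, vector and scalar work without
# handing data to a different engine.
program = [
    ("matvec", "t", "m", "v"),
    ("dot", "d", "t", "v"),
    ("add", "out", "d", "s"),
]
result = run(program, {"m": [[1, 2], [3, 4]], "v": [1, 1], "s": 10})
print(result["out"])
```

The point of the sketch is that intermediate results stay in the PE's local registers between operation types, which is the property the article contrasts with multi-engine designs.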

This structure, rather than a separate accelerator for each operator class, has two implications for product developers. First, it simplifies software development. The software is still highly parallel, to be sure, but it abstracts away a level of complexity found in multi-engine accelerator architectures, where operations must be steered to the appropriate engines.

Second, the nature of parallelism in transformer-based AI models (LLMs or ViT for example) is much more complex than in earlier-generation ResNet-class models, which process through a sequence of layers. In contrast, transformer graphs flip back and forth between matrix, vector and scalar operations. In disaggregated hardware architectures, traffic must similarly alternate between engines, with inevitable performance overhead. In the Quadric approach, any engine can handle a stream of scalar, vector and tensor operations locally. Of course there will be overhead in traffic between PE cores, but this applies to all parallel systems.

Steve Roddy (VP Marketing for Quadric) tells me that in a virtual benchmark against a mainstream competitor, Quadric’s QC-Ultra IP delivered 2X more inferences/second/TOPs for a lower off-chip DDR bandwidth and at less than half the cycles/second of the competing solution. Quadric are now offering 3 platforms for the mainstream NPU market segment: QC Nano at 1-7 TOPs, QC Perform at 4-28 TOPs, and QC Ultra at 16-128 TOPs. That high end is already good enough to meet AI PC needs. Automotive users want more, especially for SAE-3 to SAE-5 applications. For this segment Quadric is targeting their QC-Multicore solution at up to 864 TOPs.

All these platforms are supported by the proven Chimera SDK. Steve had an interesting point here also. AI accelerator ventures will commonly mention their “model zoos”. These are standard AI models adapted through tuning to run on their architectures, much like function libraries in the conventional processor space. As with those libraries, model zoo libraries must be optimized to take full advantage of their architectures. By implication a new model requires the same level of tuning, a concern for new customers who must depend on the AI developer to handle that porting for them each time they add or refine a model.

In contrast, Steve says Quadric already hosts hundreds of models on their site which simply compile without changes onto their platforms (you can still tune quantization to meet your specific needs). It’s not a model zoo, but simply a demonstration that their SDK is already mature enough to directly map a wide class of models without modification. And he notes that if your model needs an operator outside the ONNX set they already support, you can simply define that operator in C++, just as you would for say an NVIDIA accelerator.

Applications and growth

Quadric is a young company, shipping their first IP just over a year ago. Since then, they can already boast a handful of wins, especially in automotive. Customer names of course are secret, but DENSO is an investor of record. Other customer wins are in domains that reinforce the general-purpose value of the platform, in traditional camera functions, perhaps also in femtocell basebands (for MIMO processing). These two cases may or may not need AI support, but they do heavily lean on the DSP value of the platform.

This DSP capability is itself pretty interesting. Each PE can handle a mix of scalar and vector operations – up to 32b integer or 16b float – and these can be paralleled across up to 1024 PEs in a QC Ultra. So you can serve your immediate signal processing needs with high-end DSP word widths and add transformer-grade functionality to your engine later.

Sounds like a new breed of accelerator engine to me. You can learn more HERE.

Also Read:

2024 Outlook with Steve Roddy of Quadric

Fast Path to Baby Llama BringUp at the Edge

Vision Transformers Challenge Accelerator Architectures


Podcast EP236: Why Comprehensive Development Support for AI/ML is Important with Clay Johnson

by Daniel Nenni on 07-19-2024 at 10:00 am

Dan is joined by Clay Johnson, CEO of CacheQ. Clay has decades of executive experience in computing, FPGAs and development flows, including serving as Vice President of the Xilinx Spartan Business Unit which was acquired by AMD.

Clay discusses the changes occurring in system design to leverage AI/ML and technologies such as large language models. Clay points out that enabling these changes doesn’t end with the development of a new chip that performs AI algorithms faster.

Rather, the availability of a comprehensive development environment to integrate new technologies into existing systems becomes the key enabler to progress. Clay describes several examples of this trend.

CacheQ’s heterogeneous development platform enables easy development, deployment and orchestration of applications across multiple cores and heterogeneous distributed compute architectures. This results in significant increases in application performance and a dramatic reduction in development time.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Orr Danon of Hailo

by Daniel Nenni on 07-19-2024 at 6:00 am

Orr Danon CEO Hailo

Orr Danon is the CEO and Co-Founder of Hailo. Prior to founding Hailo, Orr spent over a decade working at a leading IDF Technological Unit. During this time he led some of the largest and most complex interdisciplinary projects in the Israeli intelligence community. For the projects he developed and managed, Danon received the Israel Defense Award from the president of Israel, and the Creative Thinking Award from the Head of the Military Intelligence. Danon holds a B.Sc. in Physics & Mathematics from the Hebrew University as part of the “Talpiot” program and an M.Sc. in Electrical Engineering (cum laude) from Tel Aviv University.

Tell us about your company?
Hailo is an edge AI-focused chipmaker. We develop specialized AI processors that enable high performance machine learning applications on edge devices such as NVRs, cameras, personal computers, vehicles, robots and more.

Hailo’s current key offerings include the Hailo-8 AI accelerator, which allows edge devices to run deep learning applications at full scale more efficiently, effectively, and sustainably; the Hailo-15 vision processor, which can be placed directly into the next generation of intelligent cameras; and the Hailo-10 GenAI accelerator, which empowers users to operate Generative AI locally and minimize reliance on cloud-based platforms.

What problems are you solving?
The Hailo AI processors bring data-center class performance to edge devices, enabling processing of advanced deep learning models in real-time and high accuracy, at a very low power consumption and attractive cost. Users can now run sophisticated AI tasks such as object detection, image enhancement, and content creation on edge devices without compromising on cost – solving previous issues with AI at the edge.

What application areas are your strongest?
We see a number of key application areas, including security, automotive, personal computers, and industrial automation.

Hailo is already serving more than 300 customers in these market segments.

Earlier in the year we announced that our Hailo-8 AI accelerator has been chosen alongside the Renesas R-Car V4H SoC to power the iMotion iDC High domain controller, advancing the future of autonomous driving. A Chinese automaker is expected to begin mass production with the domain controller in the second half of this year.

Additionally, we announced in June that Raspberry Pi had selected Hailo to provide AI accelerators for the Raspberry Pi AI Kit, the computing company’s AI-enabled add-on for Raspberry Pi 5. The partnership will empower both professional and enthusiast creators to elevate their projects and solutions in home automation, security, robotics and beyond, with advanced AI capabilities.

What keeps your customers up at night?
Our customers are concerned with ensuring high quality machine learning and AI services independently of network connectivity, and they’re concerned with their AI empowerment offering a strong performance-to-cost ratio and performance-to-power consumption ratio.

Another aspect which customers are always concerned about is the software tools which we as a silicon company provide. AI is a rapidly developing field, and the ability to respond fast to the dynamic market environment in which our customers operate depends heavily on the quality of the software toolchain, its documentation and of course the support we provide to them.

What does the competitive landscape look like and how do you differentiate?
Hailo is the only chipmaker who designed a processor specifically for running AI applications on edge devices, taking into consideration factors like cost, size, power consumption and memory access. Other AI processors, such as GPUs, were not designed to run edge AI applications, and are therefore more costly and consume more power.

Additionally, Hailo is the only chipmaker who is offering a full range of AI processors at the single-digit Watt range – from accelerators that operate as co-processors that handle the AI models only, to full blown camera SOCs that handle both vision processing and AI video enhancement and analytics, all with a single, robust software suite that allows developers to use the same applications on different platforms.

What new features/technology are you working on?
We recently announced a $120M extended Series C fundraising round, which will be used for continued research and development, and the Hailo-10 generative AI accelerators, which unlock the power of GenAI on edge devices such as personal computers, smart vehicles, and commercial robots. Hailo-10 allows users to completely own their GenAI experiences, making them an integral part of their daily routine.

How do customers normally engage with your company?
To support the thousands of AI developers using Hailo devices, and to accommodate the growing Hailo community, we recently introduced an online developer community featuring tutorials, FAQs, and other resources to foster innovation among creators and developers. Registered members will have the opportunity to engage with a team of Hailo experts and connect with each other to share code, experiences, resources, knowledge, and more.

Visit https://hailo.ai/ for more information about our products, solutions and latest case studies or contact us here.

Also Read:

CEO Interview: David Heard of Infinera

CEO Interview: Dr. Matthew Putman of Nanotronics

CEO Interview: Dieter Therssen of Sigasi


Has ASML Reached the Great Wall of China

by Claus Aasholm on 07-19-2024 at 6:00 am

ASML Holdings 2024

Is it time to abandon the ASML stock?

The first tool company to report Q2-24 results is ASML, and the lithography leader delivered a result above the guidance of EUR5.95B. Revenue of EUR6.242B is 4.9% above guidance and 18% above last quarter’s result of EUR5.29B.
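The percentages quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the three revenue figures stated in the text; the helper name `pct_above` is introduced here for illustration.

```python
# Figures from the text, in EUR billions.
guidance = 5.95
revenue = 6.242
previous_quarter = 5.29


def pct_above(value, baseline):
    """Percentage by which `value` exceeds `baseline`."""
    return (value / baseline - 1) * 100


vs_guidance = round(pct_above(revenue, guidance), 1)
vs_previous = round(pct_above(revenue, previous_quarter), 1)
print(f"vs guidance: {vs_guidance}%, vs last quarter: {vs_previous}%")
```

Both results agree with the 4.9% and 18% quoted in the paragraph.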

Both operating profit and gross profit grew but not to the level of the end of last year. ASML management calls 2024 a transition year in investor communications, indicating a stronger 2025.

Tool revenue increased after a significant dip. Service Revenue is much more resilient than tool revenue, as it is dependent on the installed base of tools.

Almost all of the tool revenue growth came from memory tool sales, indicating that the memory companies are finally ready to make substantial investments in new capacity, which is much needed after the shift to HBM production.

From a product perspective, the short-term trend of EUV revenue decline continued while the immersion product sales were solid.

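Immersion is a technique that passes the exposure light through water between the lens and the wafer. The higher refractive index of water raises the effective numerical aperture of the optics, allowing better resolution at the same light wavelength.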
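The resolution benefit of immersion can be illustrated with the Rayleigh criterion, resolution = k1 × wavelength / NA. The sketch below is not from the article: the 193 nm ArF wavelength and the numerical apertures of 0.93 (dry) and 1.35 (water immersion) are typical published values for deep-UV scanners, and the process factor k1 = 0.30 is an assumption chosen only for illustration.

```python
def rayleigh_resolution_nm(k1, wavelength_nm, numerical_aperture):
    """Smallest printable half-pitch under the Rayleigh criterion: k1 * lambda / NA."""
    return k1 * wavelength_nm / numerical_aperture


K1 = 0.30            # assumed process factor, illustrative only
WAVELENGTH_NM = 193  # ArF deep-UV wavelength
NA_DRY = 0.93        # representative dry ArF scanner (assumption)
NA_IMMERSION = 1.35  # representative water-immersion ArF scanner (assumption)

dry = rayleigh_resolution_nm(K1, WAVELENGTH_NM, NA_DRY)
wet = rayleigh_resolution_nm(K1, WAVELENGTH_NM, NA_IMMERSION)
print(f"dry: {dry:.1f} nm, immersion: {wet:.1f} nm")
```

With the wavelength held fixed, the only change between the two cases is the numerical aperture, which is why immersion improves resolution without a new light source.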

Given the Chips Act and other subsidies, the ASML result is somewhat counter-intuitive as EUV is used for 3-7nm leading-edge manufacturing nodes, and immersion is used for 7-14nm. Given the US attempt to become a leading-edge manufacturing location, it could be expected that leading-edge tools would dominate revenue. This indicates that the new factories are not yet in the tooling phase.

The other significant consumer of leading-edge tools is TSMC, which reported its Q2-24 results right after ASML.

 

Although Capex spending was up, it was still just slightly above the maintenance investment level, that is, the investment needed to offset the deterioration of the existing manufacturing assets. TSMC is likely waiting for ASML’s High-NA tool to be available. ASML has confirmed they shipped one of these babies last quarter and installed another in Veldhoven on the joint IMEC/ASML manufacturing line. The tool is priced north of $350M, and ASML is trying to reach a production capacity of 20 systems annually during the 24/25 timeline.

Despite beating the guidance and reasonable growth, the ASML share price plunged in the stock market. Are the markets losing confidence in the Lithography leader?

What about China?

The key reason for the decline is that the ASML result coincided with news that further export limitations are in the works.

Since the signing of the Chips Act, tool sales to China have exploded. While this could be expected, it seems like the US administration’s patience has run out.

The Chinese companies have not had access to the EUV systems since 2019, and the latest embargo, which began on September 23, banned sales of the immersion systems. This makes 80% of ASML’s products (from a revenue perspective) unavailable for Chinese customers.

As ASML has been allowed to ship the backlog, the effect has been delayed, and China still accounted for 49% of all tool sales in Q2-24.

This, however, is about to end abruptly as the Chinese backlog has been depleted.

The ASML backlog now reflects the embargo revenue view, and from now on, the Chinese revenue will fall to 20% of the total from the current level of 49%.

The potential new embargo would impact ASML’s service revenue, which is currently 24% of total revenue. Under a potential new embargo, ASML could lose the ability to service its Chinese customers, which is incredibly important for keeping the tools alive and productive. As the Chinese manufacturing base could deteriorate fast, this could create new opportunities for ASML as mature node capacity would grow outside China.

The longer-term view

With the likely dip in China business and a potential embargo impacting service revenue, investors are starting to panic and run away from ASML. It is worth noting that this is an amazing company founded on a philosophy of long-term cooperation with its suppliers and other stakeholders. Constant innovation drives higher productivity, and tool pricing is increasing at an alarming (for customers) rate.

While each tool increases productivity, it is still a hefty price if you want to be at the bleeding edge of Semiconductor manufacturing.

The current ASML manufacturing plan will enable the company to deliver a 20B$+ quarter (at current pricing) at the end of 2026. This is not a given or a forecast and can be changed according to industry development. However, it is a very strong indication that the company has faith in the long-term future of the current strategy.

Our research is focused on the business results and not on investment advice. However, if you have faith in the long-term plan of ASML, it might be too early to dump ASML shares.

Also Read:

Will Semiconductor earnings live up to the Investor hype?

What if China doesn’t want TSMC’s factories but wants to take them out?

Blank Wafer Suppliers are not Totally Blank


Blue Cheetah Advancing Chiplet Interconnectivity #61DAC

by Daniel Payne on 07-18-2024 at 10:00 am


At #61DAC, I love it when an exhibitor booth uses a descriptive tagline to explain what they do, like when the Blue Cheetah booth displayed “Advancing Chiplet Interconnectivity”. Immediately, I knew that they were an IP provider focusing on chiplets. I learned that what sets them apart is how customizable their IP is to support specific physical and system bandwidth requirements, how the interconnect IP is configured for cost-sensitive or high-performance cases, how the energy and performance are optimized from 32 Gb/s down to 8 Gb/s and lower, that it is process-ready at nodes from 16nm to 3nm, and finally that it has been silicon-proven with reference board designs. I sat down with John Lupienski, VP Product Engineering at Blue Cheetah, to better understand what they were all about. John’s background covers roles at Cadence, Broadcom, and Motorola.

Blue Cheetah at #61DAC

Chiplet designers can opt for an industry-standard interconnect, such as UCIe or BOW, or something custom; Blue Cheetah supports either approach. Blue Cheetah is active in the emerging chiplet standards and participates in both organizations. Smaller IO core area, lower energy per bit, and tailor-fit designs are compelling reasons to talk with this IP vendor. The company can customize its IP links for each unique application and deliver solutions using advanced process technologies across multiple foundries, supporting both standard and advanced packaging technologies. Its IP has been used in tape-outs for chiplet interconnects ranging from 16nm down to the 4nm node.

During DAC, Baya Systems and Blue Cheetah announced their combined chiplet-optimized Network on Chip (NoC) and Physical Layer (PHY) interconnect IP offerings, making it easier and less risky to design with chiplets. Tenstorrent announced in February that it uses the Blue Cheetah die-to-die interconnect IP for its AI and RISC-V products. Tenstorrent recently announced that it also uses Baya Systems’ NoC fabric IP.

The demonstration at the booth showed test packages integrating 12nm chiplets (availability announced in May 2023) with channel lengths spanning 2mm up to 25mm. Blue Cheetah’s customers develop products for a wide variety of end markets; in addition to Tenstorrent, publicly known examples of Blue Cheetah’s customers and partners include DreamBig Semiconductor, FLC, and Ventana Microsystems.

Blue Cheetah test chip, various channel lengths

The architecture of the interconnect IP is modular, making it quicker to port to newer process nodes. John mentioned that packaging for chiplets requires an engineer to perform SI/PI analysis, as customers often use an OSAT to assemble, and each chiplet can be fabricated at different nodes, so you really want interconnect IP that has been silicon-proven. To help get you started with chiplets, they offer reference boards and software to speed up the learning curve.

Summary

SoCs have been around for decades, while the trend of using chiplets has just started in the last several years. Blue Cheetah is a trailblazer in the industry and has solidified its position with high-speed, low-latency, power-efficient D2D BlueLynx™ interface products. The company’s standards-based and customizable IP solutions are available now in 16nm, 12nm, 7nm, 6nm, 5nm, 4nm, 3nm, and below across multiple semiconductor foundries.

You can follow up with John directly or contact the company on its website for more info. The company appears at many events throughout the year, including DAC, Chiplet Summit, ISSCC, OCP Global Summit, SemIsrael Expo, and foundry events.



The China Syndrome- The Meltdown Starts- Trump Trounces Taiwan- Chips Clipped

by Robert Maire on 07-18-2024 at 8:00 am

China Syndrome
  • The chip industry got a double tap of both China & Taiwan concerns
  • Bloomberg reported the potential for draconian China chip restrictions
  • Trump threw Taiwan under the bus demanding “protection money”
  • Over-inflated chip stocks had a “rapid unscheduled disassembly”

US looking to further restrict ASML & Tokyo Electron

It has been reported by Bloomberg that the US is going to crack down further on chip equipment sales.

Unfortunately the main targets appear to be non US semiconductor equipment companies such as ASML & Tokyo Electron rather than US equipment companies which sell a similar percentage of their wares to China.

Link to article on China restrictions

The US government is obviously punishing foreign firms more than US firms, such as AMAT, LRCX & KLAC, that are doing the same thing. Perhaps not wanting to hurt US companies…..or perhaps the government is finally realizing their efforts haven’t worked and will finally crack down on US based sales to China.

We mentioned in our note last week the tens of millions of dollars being spent lobbying the government on behalf of US equipment companies…..maybe it’s not enough or the government is finally realizing they need to do more.

Foreign Direct Product rule

….says that the US can restrict foreign companies, like ASML & Tokyo Electron from selling and servicing equipment that contains US technology.

Foreign Direct Product rule link

ASML famously bought Cymer, a US company in San Diego for their DUV & EUV sources.

Most investors don’t know that Cymer had a lot of “star wars” defense industry technology involving high power lasers and that ASML had to get permission from US defense related officials in order to acquire Cymer. Any agreements ASML made in order to achieve permission were never publicly released, but we would imagine the US government retained some sort of influence.

The government is likely concerned about high power laser technology as well as chip technology.

Tokyo Electron does a lot of R&D in the US (as does ASML), so we are sure their products contain US technology in many places….it’s impossible to avoid.

That giant “sucking sound”

We had mentioned in our note last week that US equipment companies would be “sucking major wind” if they lost the 40% plus of their sales which go to China.

But it’s much worse than it appears on the surface. US chip equipment companies charge Chinese companies a whole lot more than TSMC or Samsung, so the margins are much higher on that 40% plus than on the 50%+ of non-China sales.

We would not be surprised if closer to 60% or more of profitability comes from China sales. Thus losing China sales has an oversized impact on the bottom line.
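The relationship between sales share, margin premium, and profit share can be worked through with a short sketch. The numbers below are illustrative assumptions built on the round 40% and 60% figures in the text; the function names are hypothetical, and no actual company margins are implied.

```python
def profit_share(sales_share, margin_ratio):
    """Share of total profit from a segment holding `sales_share` of sales,
    when its margin is `margin_ratio` times the margin on all other sales."""
    segment = sales_share * margin_ratio
    rest = 1 - sales_share
    return segment / (segment + rest)


def required_margin_ratio(sales_share, target_profit_share):
    """Margin premium needed for the segment to reach `target_profit_share`."""
    return (target_profit_share / (1 - target_profit_share)) * (
        (1 - sales_share) / sales_share
    )


# How much richer must China margins be for 40% of sales to yield 60% of profit?
ratio = required_margin_ratio(0.40, 0.60)
print(f"required margin premium: {ratio:.2f}x")
print(f"resulting profit share: {profit_share(0.40, ratio):.0%}")
```

Under these assumptions, China margins would need to be a little over twice the margin on other sales for 40% of revenue to produce 60% of profit, which shows how sensitive the bottom line is to that segment.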

US semiconductor equipment companies could actually lose money for the first time in many years if China sales were curtailed enough…..it could get very ugly very fast….

The Mafia “Don” wants “protection money” from Taiwan

Having been born and raised in New York we were very familiar with local establishments paying “protection money” to organized crime types to prevent something bad from happening…….

You can imagine the phone call from the US to Taiwan….” nice little island you got there, you wouldn’t want anything bad to happen to it, would you?”, “cut us in for 20 percent of the action on those chip things you make….”

This scenario is not as far fetched as it would sound as Donald Trump today suggested that the US might not defend Taiwan if they didn’t pay the US for that “protection”….so much for helping out friends and allies….obviously Ukraine will get a similar message.

This statement threw gasoline on an already raging China restriction issue that had the chip stocks in turmoil already.

If the US restricts China sales and China blockades Taiwan at Trump’s invitation, equipment sales at the number one and number two markets are at risk……a very bad day…..

The Stocks

…were obviously crushed today on this double whammy of news.

It’s not like the stocks were at low valuations to begin with. We have pointed out time and again that the stocks were overheated and over extended. We certainly think AI is the greatest thing in technology ever, but a lot of unrelated chip and chip equipment names got run up in the tsunami.

We will likely see a near term valuation reset across many names in the semi space.

Final valuations and impacts will not truly be known until the US actually publicly states what’s going on and how bad the damage will be. Until then it will be a guessing game, but only a matter of guessing how bad the impact will be, as it’s all negative.

Initially it will be ASML & TEL but we think this time US companies will likely finally feel some pain as well…..we just don’t know how much it will hurt……

About Semiconductor Advisors LLC

Semiconductor Advisors is an RIA (a Registered Investment Advisor),
specializing in technology companies with particular emphasis on semiconductor and semiconductor equipment companies.
We have been covering the space longer and been involved with more transactions than any other financial professional in the space.
We provide research, consulting and advisory services on strategic and financial matters to both industry participants as well as investors.
We offer expert, intelligent, balanced research and advice. Our opinions are very direct and honest and offer an unbiased view as compared to other sources.

Also Read:

SEMICON West- Jubilant huge crowds- HBM & AI everywhere – CHIPS Act & IMEC

KLAC- Past bottom of cycle- up from here- early positive signs-packaging upside

LRCX- Mediocre, flattish, long, U shaped bottom- No recovery in sight yet-2025?


Evolution of Prototyping in EDA

by Daniel Nenni on 07-18-2024 at 6:00 am


As AI and 5G technologies burgeon, the rise of interconnected devices is reshaping everyday life and driving innovation across industries. This rapid evolution accelerates the transformation of the chip industry, placing higher demands on SoC design. Moore’s Law indicates that while chip sizes shrink, the number of transistors increases rapidly. It is hard to imagine achieving such highly integrated, large-scale designs without advanced EDA tools.

Tape-out is a critical and high-risk phase in chip design. Even a minor error can lead to significant financial losses and missed market opportunities. Logic or functional errors account for nearly 50% of tape-out failures, with design errors comprising 50%-70% of these functional defects. Therefore, verification of SoC design is crucial to successful tape-out. SoC verification is highly complex, taking up about 70% of the entire cycle. To accelerate time-to-market, system software development and pre-tape-out verification must be conducted concurrently, highlighting the significant advantages of prototyping.
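Combining the two percentages above gives a rough share of all tape-out failures attributable to design errors. This is a back-of-the-envelope sketch that treats "nearly 50%" as exactly 0.50, which is a simplifying assumption.

```python
functional_share = 0.50             # logic/functional errors among tape-out failures ("nearly 50%")
design_low, design_high = 0.50, 0.70  # design errors among those functional defects

# Design errors as a share of all tape-out failures.
low = functional_share * design_low
high = functional_share * design_high
print(f"design errors: roughly {low:.0%} to {high:.0%} of tape-out failures")
```

In other words, on these figures roughly a quarter to a third of failed tape-outs trace back to design errors that verification is meant to catch.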

For large-scale SoC designs, traditional software simulations often fall short due to the slow execution speed. Consequently, prototyping and hardware simulations have emerged as the primary verification methods, with high-performance prototyping taking the lead. Prototyping, particularly FPGA-based, can be thousands to millions of times faster than software simulations. It is more cost-effective and faster than hardware simulations, making it indispensable for verifying complex SoCs. However, manually built prototyping platforms are difficult to maintain and scale in multi-FPGA and complex design environments. This method is time-consuming and prone to errors, leading to increased risks of project delays and cost overruns. Commercial prototyping solutions have thus emerged to address these challenges.

The Birth of Commercial Prototyping

In 1992, Aptix, the pioneer in the prototyping area, launched the System Explorer system, utilizing FPGAs and custom interconnect chips to achieve commercial prototyping. In subsequent years, projects such as Transmogrifier-1 from the University of Toronto, AnyBoard from North Carolina State University, Protozone from Stanford University, and BORG from the University of California, Santa Cruz, explored ways to implement HDL chip designs on prototyping boards. Although these projects were not ready for large-scale commercialization, Aptix’s success sparked other vendors’ interest in this field. Despite later being absorbed in mergers, Aptix’s pioneering contributions to chip verification methodology remain historically significant.

In 2003, Toshio Nakama founded S2C in San Jose, California, after departing from Aptix. At DAC 2005, S2C unveiled its first prototyping product, the IP Porter, and soon launched the commercially successful Prodigy series. This marked a new era for the company, positioning S2C as a leader in rapid SoC prototyping solutions. Concurrently, the Dini Group in the US released its first commercial FPGA prototyping system, the DN250k10, based on six Xilinx XC4085 FPGAs, providing a flexible and cost-effective solution for design teams. Around the same period, Sweden’s HARDI Electronics AB launched its first FPGA-based prototyping system, HAPS, using Xilinx Virtex FPGAs.

Rapid Growth Driven by Competition

In 2008, Synopsys entered the prototyping market by acquiring Synplicity for $227 million, marking the start of a rapidly growing and competitive era for prototyping. Synopsys spent nearly four years integrating the technology, eventually releasing the HAPS-70 series, a fully automated prototyping product. This acquisition significantly grew the prototyping market, previously dominated by software and hardware simulation tools.

Cadence soon followed suit. Historically focused on designing its own FPGA boards, Cadence faced challenges until it acquired Taray in March 2010. Taray’s pioneering routing-aware pin assignment technology optimized FPGA design with the circuit board, aiding in the development of a robust prototyping platform. Cadence later collaborated with the Dini Group to develop the Protium prototyping product. However, Dini Group was acquired by Synopsys on December 5th, 2019. Today, Cadence focuses on streamlining the integration between its prototyping and hardware simulation products, ensuring seamless connectivity.

Siemens EDA (formerly Mentor Graphics, acquired in 2016) had a turbulent history in prototyping. In the late 1990s, Siemens EDA licensed emulation technology from Aptix but faced several challenges. To enhance its timing-driven and multi-FPGA partitioning capabilities, Siemens EDA acquired Auspy and Flexras Technologies, the latter known for its “Wasga” automatic partitioning software. In June 2021, Siemens EDA further strengthened its prototyping portfolio by acquiring PRO DESIGN’s proFPGA product series.

The entry of these major companies, along with providers like S2C, facilitated the shift from software and hardware simulation to automated prototyping solutions, enhancing the efficiency and accuracy of SoC designs, and paving the way for further innovations in the entire EDA industry.

Major Challenges and Solutions in Prototyping

The emergence of innovative prototyping solutions has driven increased complexity in SoC design and heightened demands for rigorous prototyping. These solutions require specialized expertise to manage design partitioning, mapping, interface and communications with external environments, debugging, and performance optimization. Consequently, prototyping has become a high-barrier field with only a few EDA companies maintaining a leading position. Some companies even rely on continuous mergers to strengthen their market presence.

As a leader in prototyping, S2C addresses challenges in multi-FPGA RTL logic partitioning, interconnect topology, IO allocation, and high-speed interfaces with timing-driven RTL partitioning algorithms and built-in incremental compilation algorithms. S2C continually updates its hardware configurations to support more FPGAs and offer higher-performance connectors, keeping its technology at the industry’s forefront.

With more than 20 years of industry experience and a relentless commitment to innovation, S2C equips clients with the trusted tools necessary to stay ahead in a competitive market. Its comprehensive solutions accelerate time-to-market, offering unparalleled speed, accuracy, and reliability.

Also Read:

S2C Prototyping Solutions at the 2024 Design Automation Conference

Accelerate SoC Design: DIY, FPGA Boards & Commercial Prototyping Solutions (I)

Accelerate SoC Design: Addressing Modern Prototyping Challenges with S2C’s Comprehensive Solutions (II)

S2C and Sirius Wireless Collaborate on Wi-Fi 7 RF IP Verification System


How Sarcina Revolutionizes Advanced Packaging #61DAC

by Mike Gianfagna on 07-17-2024 at 10:00 am

DAC Roundup – How Sarcina Revolutionizes Advanced Packaging

#61DAC was buzzing with discussion of chiplet-based, heterogeneous design. This new design approach opens new opportunities for applications such as AI, autonomous driving and even quantum computing. A critical enabler for all this to work is reliable, cost-effective advanced packaging, and that is the topic of this post. Sarcina Technology is a company focused on delivering reliable, cost-effective advanced packaging through a palette of advanced engineering services. You can learn more about this unique company on SemiWiki here. Let’s see how Sarcina revolutionizes advanced packaging at #61DAC.

The Keysight Connection

As DAC becomes the chips to systems conference, Keysight Technologies becomes more relevant at the show. In its own words, Keysight is your innovation partner, delivering market-leading design, emulation, and test environments that help you develop and deploy faster, with less risk, throughout the product life cycle. Keysight had a partner theater in its booth at #61DAC, and Sarcina’s CEO, Larry Zu was there to explain how Sarcina revolutionizes advanced packaging with Keysight’s help.

Larry’s presentation focused on Bump Pitch Transformer package design. A Bump Pitch Transformer for high I/O interconnect density between adjacent dice is a silicon bridge technology that replaces expensive silicon TSV interposers with more cost-effective re-distribution layers (RDL). It’s ideal for homogeneous and heterogeneous chiplet integration targeting high-performance computing (HPC) devices for AI, data center, microprocessor, and networking applications.

FOCoS-B from ASE

Larry explained that Fan-Out Chip-on-Substrate with Silicon Bridge (FOCoS-B) is the latest Bump Pitch Transformer technology. Armed with production-level package design rules and the know-how to design advanced FOCoS-B packages, Sarcina Technology is ready to design this package for customers anywhere in the world. Manufacturing is now available through OSAT companies such as ASE and SPIL, breaking a logjam of innovation created by previous proprietary technologies and essentially democratizing the 2.5D era. The technology transforms silicon micro-bump pitch (~45 microns) to C4 bump pitch (130-150 microns).
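To get a feel for what that pitch transformation means in interconnect density, here is a rough back-of-the-envelope sketch in Python. It assumes bumps sit on a simple square grid, which is an illustrative simplification and not a Sarcina or ASE design rule; the function name is hypothetical.

```python
def bumps_per_mm2(pitch_um: float) -> float:
    """Bumps per square millimetre, assuming a square grid at the given pitch."""
    per_mm = 1000.0 / pitch_um  # bumps along one millimetre of edge
    return per_mm ** 2


micro = bumps_per_mm2(45.0)  # micro-bump pitch quoted in the talk

for c4_pitch in (130.0, 150.0):  # C4 pitch range quoted in the talk
    c4 = bumps_per_mm2(c4_pitch)
    print(f"C4 pitch {c4_pitch:.0f} um: {c4:.1f} bumps/mm^2, "
          f"micro-bump side is {micro / c4:.1f}x denser")
```

Under that square-grid assumption the micro-bump side carries roughly 8 to 11 times more connections per unit area than the C4 side, which is the gap the transformer has to bridge.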

Sarcina Technology provides a choice of either silicon bridge (graphic 1a) or chip-last technology (graphic 1b) depending on I/O interconnect density between adjacent dice. FOCoS-B is suitable for higher density and FOCoS-CL (Chip Last) is suitable for lower density.  See figure below.

Bump Pitch Transformer Options

While all these advances are exciting, Larry pointed out the complexities design teams will face to reliably and accurately achieve the goal. Challenges include:

  • Shrinking bump pitch to increase interconnect density
  • Homogeneous & heterogeneous chiplet integration
  • Die-to-die interconnection, communication, & protocol standardization
  • Bump pitch transformer from microbump pitch to C4 bump pitch
  • Fab or assembly house availability
  • Design expertise and factory engineering collaboration
  • Cost effective EDA tools
  • Bump pitch transformer (interposer) yield and its assembly yield
  • Cost effective testing

Getting all these right requires substantial engineering talent and a highly accurate simulation and analysis tool flow. This is where Keysight’s PathWave Advanced Design System (ADS) and Memory Designer help quite a bit. Larry explained that the results of these tools have been tuned to be highly accurate. If the simulations look good, one can tape out with confidence.

The Bump Pitch Transformer (BPT) services offered by Sarcina include:

  • BPT (interposer) design, O/S test pattern insertion, fabrication & BPT wafer sort
  • PKG substrate design, PI/SI + thermal system simulation, & substrate fabrication
  • Package assembly, final test, and production services

Larry also reviewed the substantial experience and accomplishments the Sarcina team has in this area.  These include:

  • Successfully designed, fabricated, & tested a 2.5D Si interposer package
  • Si interposer design, O/S test pattern insertion, fab, & interposer wafer sort
  • Package substrate design, PI/SI, thermal simulation, & substrate fabrication
  • Assembly, final test, and production services
  • 47.5 mm x 47.5 mm HFCBGA with 2019 BGA balls
  • 1 ASIC + 2 HBMs on a silicon interposer
  • 12 substrate layers
  • 320 Watts
  • 32 lanes of 25 Gbps SerDes
  • 16 lanes of 16 Gbps PCIe-4 

To Learn More

A comprehensive case study is available from Keysight that discusses how Sarcina Delivers Right-First-Time Packages Using ADS For Chip-Package-Board Simulation. You can get a broad overview of Sarcina’s services here. And you can learn more about advanced packaging here. And that’s how Sarcina revolutionizes advanced packaging.


Accelerating Analog Signoff with Parasitics

by Bernard Murphy on 07-17-2024 at 6:00 am

Quantus Insight min

An under-appreciated but critical component in signing off the final stage of chip design for manufacture is timing closure – aligning accurate timing based on final physical implementation with the product specification. Between advanced manufacturing processes and growing design sizes, the most important factors determining timing – interconnect resistance and capacitance parasitics – have become more and more difficult to estimate accurately before final layout. Which is a problem since unexpected variances may require expensive rework at a time when the product schedule calls for speedy transition to manufacturing handoff.

Digital design flows have largely compensated for such variances through further automation and improved virtual modeling of likely interconnect topologies. Following pre-signoff analyses, designers can be reasonably sure that necessary post-layout ECOs, if any, should be relatively limited and easy to fix. Not so in analog (also mixed signal and RF). According to Hao Ji (VP R&D at Cadence, particularly responsible for parasitic extraction) and Akshat Shah (Sr. PE Group Director for the Virtuoso Platform at Cadence), analog design lacks the heavy automation common in digital flows and still depends heavily on handcrafted layouts guided by hand-estimated parasitics. As a result, while earlier generation analog post-layout sims might also have converged quickly, now they never work first time.

This challenge for analog closure is not a minor issue. Surveys indicate that drivers for silicon respins are now dominated by analog issues, which is not surprising since analog now plays a growing role in almost all large chip designs: in PCIe, DDR/HBM and Ethernet interfaces, and in sensing. These are all domains on which modern system advances critically depend.

Differences between estimated and real extracted parasitics can commonly take 6-8 weeks to diagnose and resolve, at a time when the pressure to tape out is most intense. To squeeze this time, solutions must accommodate the unique challenges of analog, helping to accelerate expert designer insight and suggested fixes before starting long-cycle layout changes and re-simulation. That’s what the Cadence Quantus™ Insight Solution offers.

What makes analog post-layout signoff so difficult?

Part of the problem is simply the cycle time to make and re-verify changes: updating the schematic/layout, re-verifying through LVS, then re-verifying through SPICE (or other) simulation. This is a much slower cycle than hand-tweaking a digital netlist and re-running logic simulation, especially since even the fastest modern circuit simulators run 2-3 orders of magnitude slower than a (software) logic simulator.

A circuit designer suggests constraints to guide a layout designer, such as “make this connection no more than 10 ohms”. For the layout designer this advice is an approximate guideline since the complexity of routing through multiple layers and vias makes both resistance and capacitance hard to estimate. The layout designer will do their best, a very short route perhaps, but it won’t match the constraint precisely.

More generally, resistances in an advanced process have become critically important and are very hard to estimate in complex topology nets traversing through multiple layers of interconnect and vias. Field solvers are needed to determine accurate values so it shouldn’t be surprising that back-of-the-envelope calculations may be quite far off. Common fixes to address high resistance paths are known, for example arrayed vias or parallel routing paths. But first you need to know where such fixes might be needed.
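As a rough illustration of why arrayed vias help, here is a first-order estimate in Python. Every number in it (sheet resistances, dimensions, the 8 ohm single-via value, the 10 ohm budget) is a made-up placeholder rather than data from any real process, and the function names are hypothetical. A real flow would rely on a field solver, as noted above.

```python
def wire_resistance(sheet_ohm_per_sq: float, length_um: float, width_um: float) -> float:
    """First-order wire resistance: sheet resistance times number of squares."""
    return sheet_ohm_per_sq * length_um / width_um


def via_array_resistance(single_via_ohm: float, count: int) -> float:
    """Identical vias in parallel share the current equally."""
    return single_via_ohm / count


def path_resistance(via_count: int) -> float:
    # Hypothetical two-layer route: a narrow lower-metal stub, a via stack,
    # then a wider upper-metal run.
    lower = wire_resistance(0.2, length_um=10.0, width_um=0.5)
    upper = wire_resistance(0.05, length_um=20.0, width_um=1.0)
    vias = via_array_resistance(8.0, via_count)
    return lower + vias + upper


BUDGET_OHM = 10.0  # hypothetical constraint from the circuit designer
for n in (1, 4):
    total = path_resistance(n)
    status = "meets" if total <= BUDGET_OHM else "exceeds"
    print(f"{n} via(s): {total:.1f} ohm, {status} the {BUDGET_OHM:.0f} ohm budget")
```

With these placeholder values a single via dominates the path and breaks the budget, while a four-via array brings the route back under it. The hard part in practice is knowing which segment to look at, which is the point the article makes next.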

Differential pairs present another challenge, in some ways easier to manage in layout because the layout designer knows that such connections must be exactly symmetric. But what isn’t easy to account for is capacitance contributions from neighboring wires, or from metal fills which don’t respect symmetry. Those effects aren’t going to be clear until post-layout extraction.

Diagnosing and determining optimal solutions to such problems depends on the judgement of expert designers. For now at least this judgement can’t be automated away, however automation can simplify debug and trial fixes. This is where the Quantus Insight Solution can help.

The path to diagnosis and what-if experiments

According to Hao, a DSPF extracted from a circuit of 100k instances can run to tens of millions of parasitic elements. The DSPF is an ASCII file, and at this scale it is completely unmanageable for manual analysis. Quantus Insight Solution bridges this gap by acting as a visual interactive DSPF debugger, coupled with Virtuoso schematic and layout views.

Suppose you know roughly where to look for a problem. You can zoom in on schematic or layout views to see overlaid resistance, capacitance or estimated (Elmore) delay values for point-to-point connections or layer-wise splits of a net. These are visualized as sorted tables (of R, C or delay), highest values first together with a heatmap representation of corresponding parasitic nets. From this view it’s often quite easy to see why the implemented layout deviates from the spec, for example in an R segment perhaps exceeding the constraint for the whole net, also to see which layer (or via) contributed significantly to that problem.
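The sorted-table idea can be sketched in a few lines of Python. This is not how Quantus Insight Solution works internally; it only illustrates ranking parasitics and splitting them by layer. The sample text is a heavily simplified, DSPF-like fragment, and the `$layer=` annotation is a hypothetical stand-in for whatever layer tagging a real extractor emits.

```python
from collections import defaultdict

# Simplified DSPF-like fragment (illustrative only, not real extractor output).
SAMPLE = """\
*|NET out 3.1e-15
R1 out:1 out:2 2.5 $layer=M1
R2 out:2 out:3 14.0 $layer=VIA1
R3 out:3 out:4 1.2 $layer=M2
C1 out:2 0 1.1e-15 $layer=M1
C2 out:3 0 2.0e-15 $layer=M2
"""


def parse_elements(text):
    """Collect R and C elements as dicts; comment and header lines are skipped."""
    elements = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or tokens[0][0] not in "RC":
            continue
        layer = "unknown"
        for token in tokens[4:]:
            if token.startswith("$layer="):  # hypothetical annotation
                layer = token.split("=", 1)[1]
        elements.append({"kind": tokens[0][0], "name": tokens[0],
                         "value": float(tokens[3]), "layer": layer})
    return elements


def ranked(elements, kind):
    """Elements of one kind ('R' or 'C'), highest value first."""
    chosen = [e for e in elements if e["kind"] == kind]
    return sorted(chosen, key=lambda e: e["value"], reverse=True)


def total_by_layer(elements, kind):
    """Layer-wise split of total R or C."""
    totals = defaultdict(float)
    for e in elements:
        if e["kind"] == kind:
            totals[e["layer"]] += e["value"]
    return dict(totals)


elements = parse_elements(SAMPLE)
for e in ranked(elements, "R"):
    print(f'{e["name"]:>3} {e["value"]:6.1f} ohm  {e["layer"]}')
print(total_by_layer(elements, "R"))
```

In this toy net the via segment alone exceeds a 10 ohm constraint, which is the kind of standout a sorted table or heatmap makes obvious.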

Sometimes the root cause for an error won’t be quite so obvious; say a differential pair fails to match but there are no standout culprits in either R or C. This is where Elmore delay estimates can be helpful, to highlight cumulative deviations in matching the pair.
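To see how small per-segment deviations accumulate, here is a minimal Elmore delay sketch in Python for a simple RC ladder. The two arms of the differential pair and all R and C values are invented for illustration; real nets are trees rather than ladders, and the tool's own estimate may be computed differently.

```python
def elmore_delay(segments):
    """Elmore delay to the far end of an RC ladder.

    segments: list of (R_ohm, C_farad) pairs ordered from driver to receiver,
    with each capacitor attached at the far end of its resistor.
    Each resistor contributes R times all capacitance downstream of it.
    """
    downstream_c = sum(c for _, c in segments)
    delay = 0.0
    for r, c in segments:
        delay += r * downstream_c
        downstream_c -= c  # this capacitor is upstream of the next resistor
    return delay


# Hypothetical differential pair: the negative arm is only slightly worse
# in every segment, so no single R or C stands out.
pos_arm = [(5.0, 2.0e-15)] * 3
neg_arm = [(5.2, 2.1e-15)] * 3

d_pos = elmore_delay(pos_arm)
d_neg = elmore_delay(neg_arm)
mismatch_pct = 100.0 * (d_neg - d_pos) / d_pos
print(f"pos {d_pos * 1e15:.1f} fs, neg {d_neg * 1e15:.1f} fs, "
      f"mismatch {mismatch_pct:.1f}%")
```

Here each resistor differs by 4% and each capacitor by 5%, yet the delay mismatch is about 9%, because every R is multiplied by all the capacitance it has to charge.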

If you don’t have a starting point, you can compare with the constraints set to guide layout, or you can compare 2 DSPFs, perhaps one working and one not working, to quickly isolate problem areas. For example if a capacitance is much higher than expected you can see exactly which layer contributed to that excess.
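A comparison of two extractions can be pictured as a per-layer diff. The Python sketch below assumes the per-layer capacitance totals have already been pulled out of each DSPF; the layer names and values (in fF) are hypothetical.

```python
def layer_deltas(reference, candidate):
    """Per-layer change (candidate minus reference), largest magnitude first."""
    layers = set(reference) | set(candidate)
    deltas = {layer: candidate.get(layer, 0.0) - reference.get(layer, 0.0)
              for layer in layers}
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-layer capacitance totals for one net, in fF.
working = {"M1": 1.2, "M2": 2.0, "M3": 0.8}
failing = {"M1": 1.3, "M2": 4.5, "M3": 0.8, "FILL": 0.6}

for layer, delta in layer_deltas(working, failing):
    print(f"{layer:>4}: {delta:+.2f} fF")
```

Sorting by magnitude puts the layer responsible for most of the excess at the top, including layers such as metal fill that appear in only one of the two extractions.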

Then you can run a what-if analysis on changes you think may correct a problem. For example, you might change a resistance component and ask to recompute effective resistance to get a first order sense of whether that resolves the issue. If that fix looks good, Quantus Insight Solution can quickly generate a new DSPF on which you can re-run (SPICE) simulation to further verify suitability of the fix. In this way you can iterate relatively quickly through open issues. Only when satisfied do you need to go back to the layout designer to implement your suggested fixes.
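The what-if step on resistance can be approximated with plain nodal analysis. The sketch below, in Python, computes the effective resistance between two nodes of a small resistor network by injecting 1 A at one node and grounding the other. The network, node names and values are hypothetical, and this is a generic textbook method rather than the tool's actual algorithm.

```python
def effective_resistance(resistors, src, dst):
    """Effective resistance between src and dst.

    resistors: list of (node_a, node_b, ohms). Grounds dst, injects 1 A at
    src, solves the conductance equations, and returns the voltage at src.
    """
    nodes = sorted({n for a, b, _ in resistors for n in (a, b)} - {dst})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    g = [[0.0] * size for _ in range(size)]
    for a, b, ohms in resistors:
        cond = 1.0 / ohms
        for p, q in ((a, b), (b, a)):
            if p in index:
                g[index[p]][index[p]] += cond
                if q in index:
                    g[index[p]][index[q]] -= cond
    rhs = [0.0] * size
    rhs[index[src]] = 1.0
    # Gaussian elimination with partial pivoting (rows only, so columns
    # still map to the same nodes).
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(g[r][col]))
        g[col], g[pivot] = g[pivot], g[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = g[row][col] / g[col][col]
            for k in range(col, size):
                g[row][k] -= factor * g[col][k]
            rhs[row] -= factor * rhs[col]
    volts = [0.0] * size
    for row in reversed(range(size)):
        acc = rhs[row] - sum(g[row][k] * volts[k] for k in range(row + 1, size))
        volts[row] = acc / g[row][row]
    return volts[index[src]]


# Hypothetical net: metal segment, single via, metal segment.
net = [("a", "b", 2.0), ("b", "c", 12.0), ("c", "d", 1.0)]
print(f"before: {effective_resistance(net, 'a', 'd'):.1f} ohm")

# What-if: add a second, identical via in parallel with the first.
trial = net + [("b", "c", 12.0)]
print(f"after:  {effective_resistance(trial, 'a', 'd'):.1f} ohm")
```

A quick recompute like this gives only a first-order answer. As the article says, the regenerated DSPF still goes back through SPICE to confirm the fix.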

You can learn more HERE.


Scientific Analog XMODEL #61DAC

by Daniel Payne on 07-16-2024 at 10:00 am

Scientific Analog 61dac min

Transistor-level circuit designers have long used SPICE for circuit simulation, mostly because it is silicon accurate and helps them to predict the function, timing, power, waveforms, slopes and delays in a cell before fabrication. RTL designers use digital simulators that have a huge capacity but are lacking analog modeling. So, how would you simulate an AMS chip design quickly and accurately?

At #61DAC I met with Jaeha Kim, CEO and Founder of Scientific Analog to learn how their approach allows an SoC design and verification team to model Analog circuits in SystemVerilog and UVM by using a plugin called XMODEL. The secret sauce with XMODEL is how it models analog functionality inside of an event-driven simulator.

Charles Dancak, Jaeha Kim, Rafael Betaourt

If you tried to digitize all the points in a continuous analog waveform, simulation would be too slow. Instead, XMODEL uses equations to define the analog waveforms, and equations simulate quickly and accurately. XMODEL propagates analog outputs by using Laplace transforms in the s-domain, so the simulator only has to compute something once per output equation.
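The general idea of carrying an equation instead of sampled points can be illustrated with a toy example in Python. It is only a sketch of the concept for a first-order low-pass filter, H(s) = 1 / (1 + s·tau), driven by a piecewise-constant input. The class and method names are hypothetical, and nothing here reflects XMODEL's actual implementation or API.

```python
import math


class FirstOrderLowPass:
    """Closed-form response of H(s) = 1 / (1 + s*tau) to a stepwise input.

    Between input events the output is
        y(t) = u + (y0 - u) * exp(-(t - t0) / tau)
    so state is updated only when the input changes, not at every time step.
    """

    def __init__(self, tau):
        self.tau = tau
        self.t0 = 0.0  # time of the last input event
        self.y0 = 0.0  # output value at that event
        self.u = 0.0   # input level since that event

    def value_at(self, t):
        """Evaluate the current output equation at time t >= t0."""
        return self.u + (self.y0 - self.u) * math.exp(-(t - self.t0) / self.tau)

    def on_input_event(self, t, new_u):
        """Input changed at time t: start a new output equation."""
        self.y0 = self.value_at(t)
        self.t0 = t
        self.u = new_u


rc = FirstOrderLowPass(tau=1e-9)
rc.on_input_event(0.0, 1.0)   # input steps to 1 at t = 0
print(f"{rc.value_at(1e-9):.4f}")
rc.on_input_event(2e-9, 0.0)  # input returns to 0 at t = 2 ns
print(f"{rc.value_at(3e-9):.4f}")
```

Two input events produce two output equations, and the waveform can be evaluated at any time in between without stepping through intermediate samples.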

I asked if XMODEL could be used to simulate something like a PLL, a typical AMS block that simulates very slowly in SPICE. Yes, with XMODEL a PLL can be simulated in SystemVerilog as dozens of pre-built primitives. Engineers would start with their SPICE or Spectre netlist and run the tool to automatically create a SystemVerilog netlist with XMODEL primitives. There are 220 XMODEL primitives to work with, and engineers may add their own XMODEL primitives, or ask Scientific Analog to add a new primitive. You won’t have to learn about writing something new and arcane like Real Number Modeling (RNM) with this XMODEL approach. Digital designers can run full chip AMS designs without being analog modeling experts by using XMODEL in their SystemVerilog and UVM simulations.

Engineers can also visually connect together XMODEL primitives with GLISTER by drawing schematics in Cadence Virtuoso, all without having to code anything. This approach allows you to check out the function of your analog idea before doing any detailed implementation work.

Chiplet design is growing in popularity, and there’s an archived webinar at their site on how to do UCIe PHY modeling and simulation with XMODEL. It covers an overview of UCIe, an introduction to XMODEL, electrical layer modeling of transmit clock paths, and simulation results. Visitors can even download the slides and model package to get a better idea of how UCIe can be modeled using SystemVerilog.

More than 40 companies and universities are using XMODEL for their AMS design and verification, including Samsung Electronics, Samsung Foundry, SK Hynix, Chinese Academy of Sciences, and Sung Kyun Kwan University. Scientific Analog is a member of the Si2 OpenAccess Coalition and Accellera.

Verification engineers with big digital and little analog will greatly benefit from using XMODEL, and design engineers can quickly test out their new ideas before implementation begins. The XMODEL approach is much easier to learn and implement compared to using Verilog-A or Verilog-AMS. Analog designers don’t have to be afraid of learning SystemVerilog and UVM, because with XMODEL and MODELZEN they can quickly create something for their analog cells that can be used by their digital co-workers.

Another powerful use for XMODEL is in silicon photonics, where signals are processed at the highest frequencies and simulating with equations is faster than simulating digitized points. XMODEL has primitives designed for silicon photonics engineers. In fact, you can simulate the combination of photonics, analog and digital circuits with XMODEL primitives today in SystemVerilog.

Scientific Analog was at #61DAC in June, DVCon US in February, DVCon Europe in October 2023, and they sponsored ASP-DAC in January 2023.

Summary

My visit with Scientific Analog was a fruitful one, as I learned how their unique technology called XMODEL allows analog modeling and simulation inside of SystemVerilog and UVM. The company has been around since 2015 and was founded in Palo Alto, CA, and now has distributors in Japan, China and North America. If you’re a design team doing AMS photonic designs, then I would give Scientific Analog a closer look for modeling, simulation and verification tasks.

Related Blogs