Can Correlation Between Simulation and Measurement be Achieved for Advanced Designs?
by Mike Gianfagna on 03-18-2024 at 6:00 am

“What you simulate is what you get.” This is the holy grail of many forms of system design. Achieving a high level of accuracy between predicted and actual performance can cut design time way down, resulting in better cost margins, time to market and overall success rates. Achieving a high degree of confidence in predicted performance is not an easy task. Depending on the type of design being done, there are many processes and methods that must be executed flawlessly to achieve the desired result. There was a panel devoted to this topic at the recent DesignCon in Santa Clara, CA. Experts looked at the problem from several different perspectives. Read on to learn more – can correlation between simulation and measurement be achieved for advanced designs?

About the Panel

The DesignCon panel was entitled Extreme Confidence Simulation for 400-800G Signal Integrity Design. It was organized by Wild River Technology, a supplier of products and services for advanced signal integrity design, and Samtec also participated. These two companies focus on products and services specifically targeted at advanced signal integrity design. The balance of the panel included companies that focus on design methodology/tools and advanced product development, so all points of view were represented. Here is a summary of who participated – all have impressive credentials.

I will summarize the comments from Samtec and Wild River Technology on correlation for advanced designs next, since these two points of view are wholly focused on correlation accuracy rather than product design or design methodology.

Al Neves, Founder & Chief Technology Officer, Wild River Technology

Al has over 39 years of experience in the design and application development of semiconductor products and capital equipment, focused on jitter and signal integrity analysis. He has been successfully involved with numerous business development and startup activities over the last 17 years. Al focuses on measurement-based model development, ultra-high signal integrity serial link characterization test fixtures, high-speed test fixture design, and platforms for material identification and measurement-simulation to 110 GHz.

Scott McMorrow, Strategic Technologist, Samtec

Scott currently serves as a Strategic Technologist for Samtec, Inc. As a consultant for many years, Scott has helped many companies develop high performance products, while training signal integrity engineers. He is a frequent author and spokesperson for Samtec.

Gary Lytle, Product Management Director, Cadence

Gary leads product strategy, positioning, sales enablement and demand generation for Cadence electromagnetic simulation technologies. He has held many positions in the RF and simulation industry, including Technical Director with ANSYS, Inc., Lead Antenna Design Engineer with Dielectric Communications, Combat Systems Engineer with General Dynamics and Engineering Manager with Amphenol.

Cathy Liu, Distinguished Engineer, Broadcom

Cathy Ye Liu currently heads up Broadcom SerDes architecture and modeling group. Since 2002, she has been working on high-speed transceiver solutions. Previously she has developed read channel and mobile digital TV receiver solutions.

Jim Weaver, Senior Design & Signal Integrity Engineer, Arista Networks

Jim is responsible for design and analysis of large switches for cloud computing and high bit rate serial links. Jim has over 40 years of experience in system design, including 20 years of signal integrity experience, and is heavily involved with IEEE 802.3dj electrical specification work.

Todd Westerhoff, High-Speed Design Product Marketing at Siemens EDA

Todd Westerhoff moderated the panel. He has over 42 years of experience in electronic system modeling and simulation, including 25 years of signal integrity experience. Prior to joining Siemens EDA, he held senior technical and management positions at SiSoft, Cisco and Cadence. He also worked as an independent signal integrity consultant developing analysis methodologies for major systems and IC manufacturers.

The focus of the panel was defined this way:

What’s the point in running detailed simulations if the PCB test vehicle you fabricate and assemble performs differently than you had predicted? This panel will discuss issues associated with achieving tight and repeatable correlation between simulation and measurement for structures such as vias, connector launches, transmission lines, etc. and the channels that contain them.

This correlation allows us to perform what we call “Extreme Confidence Simulation”. A wide set of simulation topics will be addressed that are focused on the epic signal integrity challenges presented by 400-800G communication.

Key Takeaways – Samtec

Scott provided his views and experience on correlation for advanced designs, beginning with the observation that, in order to correlate measurements to simulation, it is necessary to understand the limits of the methods. We assume our simulations are correct given correct modeling inputs. Further, we assume our measurements are correct given the best measurement methods. But are they?

Scott pointed out that there is a statistical probability of error in both the simulations and the measurements that has nothing to do with correct modeling of materials. Therefore, we need to understand these errors to improve our measurement-to-model correlation.

Scott then dove into significant detail to discuss HFSS simulation maximum delta S criteria, HFSS simulation convergence criteria, high frequency phase accuracy, transmission uncertainty, Mcal insertion loss error, and Mcal delay error.

Scott concluded his talk with a summary of what’s needed to understand the limits of measurement. For simulation modeling, understanding the convergence controls needed to achieve the necessary level of correlation is mandatory. He pointed out that for all but metrology-grade VNA measurements, phase (delay) error is low enough that delay is accurate to within several hundred femtoseconds, which is fortunate for material identification problems.

But below 10 GHz, he warned of incorrect phase creeping in, altering the starting point for material identification, and creating time domain causality issues. At low frequencies, he suggested using a separate method to validate the low frequency and DC characteristics of the material, where the accuracy is higher.

A final comment from Scott: Separate correlation to individual structures so that accuracy can be preserved in both simulation and measurement.
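
To make the insertion-loss and delay comparison concrete, here is a minimal sketch of the kind of check Scott describes. The data is synthetic, standing in for HFSS and VNA exports; the 50 ps line and the 150 fs error baked into the fake measurement are arbitrary illustration values, not Samtec numbers.

```python
import numpy as np

# Synthetic stand-ins: in practice these traces would be S21 from an
# HFSS sweep and from a VNA measurement (e.g., Touchstone files).
f = np.linspace(1e9, 110e9, 1001)                    # 1-110 GHz sweep
tau = 50e-12                                         # 50 ps ideal line
s21_sim  = 0.98 * np.exp(-2j * np.pi * f * tau)
s21_meas = 0.97 * np.exp(-2j * np.pi * f * (tau + 150e-15))  # +150 fs error

def group_delay(f, s):
    """Group delay tau = -d(phase)/d(omega) from the unwrapped phase."""
    return -np.gradient(np.unwrap(np.angle(s)), 2 * np.pi * f)

il_err_db  = 20 * np.log10(np.abs(s21_meas) / np.abs(s21_sim))
tau_err_fs = (group_delay(f, s21_meas) - group_delay(f, s21_sim)) * 1e15

print(f"max |IL error|  : {np.max(np.abs(il_err_db)):.3f} dB")
print(f"mean delay error: {np.mean(tau_err_fs):.0f} fs")     # ~150 fs here
```

On real data the traces would be read from Touchstone files, and the pass/fail limits would come from the correlation budget rather than from this script.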

Key Takeaways – Wild River Technology

Al took a direct approach to the topic, pointing out that EDA tools are not standards. “There is nothing ‘golden’ about them (sorry). Believing EDA tools are standards can corrupt the path to high-speed design confidence.” He went on to explain that the path to simulation-to-measurement confidence is a hard road that takes a lot of work, and it’s uncompromising.

The hard work is EDA calibration/benchmarking and building systematic approaches using advanced test fixtures (material ID, verification of models, etc.). The bottom line is that all EDA tools have issues, and it is our job to identify and work around them.

Al then spent some time on the importance of calibration and metrics. He explained that better calibration is required for simulation-measurement correlation. For example, sliding-load cal performance is required for good sim-to-measurement correspondence. He felt the industry is over-reliant on easy-to-use ECAL and has neglected good mechanical cals. Al coined the phrase “EDA Metrics Matter.” His concluding points were:

  • Mindset matters
  • You cannot ignore Maxwell
  • The world of >70 GHz is not in good shape for signal integrity
  • Metrics will be very useful

Summary, and Next Steps

Scott and Al delivered similar messages at this panel: understanding how to calibrate results and factor in all sources of error, including an understanding of the materials being used, is important.

Samtec offers a vast library of information on calibration and measurement accuracy. You can explore Samtec’s technical library here. I’m a fan of the gEEk spEEk webinars. You can explore the extreme signal integrity products and services offered by Wild River Technology here. So, can correlation between simulation and measurement be achieved for advanced designs? With the right approach and the right partners, I believe it can.


Measuring Local EUV Resist Blur with Machine Learning
by Fred Chen on 03-17-2024 at 10:00 am

Resist blur remains a relatively unexplored topic in lithography. Blur has the effect of reducing the difference between the maximum and minimum doses in the local region containing the feature. Blur is particularly important for EUV lithography, which is prone to stochastic fluctuations and is also driven by secondary electron migration, a significant source of blur [1].

While optical sources of blur, such as defocus, flare, and EUV dipole image fading [2], can be considered as independent of wafer location, non-optical sources, such as from electron migration or acid diffusion, can have a locally varying behavior. It is therefore important to have some way to characterize and/or monitor the local blur in a patterned EUV resist.

The most straightforward way is to have a resist pattern that covers the whole exposure field with adequate resolution-scale sampling. A practical choice for a 0.33 NA EUV system would be a 20 nm half-pitch hole or pillar array, which gives equal sampling in the x and y directions. It is also practically at the resolution limit for contact/via patterning due to stochastic variations [3,4]. As shown in the example of Figure 1, a large enough blur, e.g., 20 nm, is enough for the contact to go missing. Such a large blur may result from local resist inhomogeneities as well as an occasionally large electron range.

Figure 1. 20 nm half-pitch via pattern, at 20 mJ/cm2 absorbed dose (averaged over 40 nm x 40 nm cell), with different values of blur. Quadrupole illumination is used with a darkfield mask. Secondary electron quantum yield = 2. A Gaussian was fit to the half-pitch via.
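
To see how blur eats into the dose swing, here is a small sketch that convolves an idealized 20 nm half-pitch dose image with a Gaussian. The sinusoidal image model, grid, and blur values are illustration-only assumptions, not the quadrupole image model behind Figure 1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Idealized relative-dose image for a 20 nm half-pitch hole array
# (40 nm pitch) on a 1 nm grid; values are normalized, not mJ/cm2.
pitch, grid = 40.0, 1.0                                   # nm
x = np.arange(0, 10 * pitch, grid)
X, Y = np.meshgrid(x, x)
dose = 0.5 * (1 + 0.5 * (np.cos(2 * np.pi * X / pitch)
                         + np.cos(2 * np.pi * Y / pitch)))

for blur in (2.0, 5.0, 10.0, 20.0):                       # Gaussian sigma, nm
    blurred = gaussian_filter(dose, sigma=blur / grid, mode='wrap')
    print(f"blur sigma {blur:4.1f} nm: dose swing "
          f"{blurred.max() - blurred.min():.3f} (unblurred "
          f"{dose.max() - dose.min():.3f})")
```

As the dose swing collapses toward the stochastic noise floor, the contact can fail to print, which is the missing-contact behavior described above.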

One can envisage that machine learning methods [5] can be used to match via appearance to the most likely blur at a given location, allowing a blur map to be generated for the whole exposure field. It should also be noted that the rare large local blur scenario is consistent with the rare occurrence of stochastic defects [6]. Thus, studying local blur is important for a basic understanding not just of the resist but also of the origin of stochastic defects.
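
As one hypothetical way to set up such a matching step, the toy example below trains a regressor to recover the blur value from synthetic via images. The image model, noise level, and choice of a random forest are all illustration assumptions, not the specific methods of reference [5].

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def via_patch(blur_nm, size=40):
    """Synthetic 40 nm x 40 nm via image: ideal opening + blur + noise."""
    x = np.arange(size) - size / 2
    X, Y = np.meshgrid(x, x)
    ideal = (np.hypot(X, Y) < 10).astype(float)     # 20 nm via opening
    img = gaussian_filter(ideal, sigma=blur_nm)
    return img + rng.normal(0, 0.02, img.shape)     # crude stochastic noise

blurs = rng.uniform(2, 20, 500)                     # nm, ground truth
X = np.array([via_patch(b).ravel() for b in blurs])
Xtr, Xte, ytr, yte = train_test_split(X, blurs, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)
print(f"mean |blur error| = {np.abs(model.predict(Xte) - yte).mean():.2f} nm")
```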

References

[1] P. Theofanis et al., Proc. SPIE 11323, 113230I (2020).

[2] J-H. Franke, T. A. Brunner, and E. Hendrickx, J. Micro/Nanopattern. Mater. Metrol. 21, 030501 (2022).

[3] W. Gao et al., Proc. SPIE 11323, 113231L (2020).

[4] F. Chen, “Via Shape Stochastic Variation in EUV Lithography,” https://www.youtube.com/watch?v=Cj1gfDV7-GE

[5] C. Bishop, Pattern Recognition and Machine Learning, https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/

[6] F. Chen, “EUV Stochastic Defects from Secondary Electron Blur Increasing with Dose,” https://www.youtube.com/watch?v=Q169SHHRvXE, “Modeling EUV Stochastic Defects with Secondary Electron Blur,” https://www.linkedin.com/pulse/modeling-euv-stochastic-defects-secondary-electron-blur-chen

This article first appeared in LinkedIn Pulse: Measuring Local EUV Resist Blur with Machine Learning

Also Read:

Pinning Down an EUV Resist’s Resolution vs. Throughput

Application-Specific Lithography: Avoiding

Non-EUV Exposures in EUV Lithography Systems Provide the Floor for Stochastic Defects in EUV Lithography

Stochastic Defects and Image Imbalance in 6-Track Cells


Podcast EP212: A View of the RISC-V Landscape with Synopsys’ Matt Gutierrez
by Daniel Nenni on 03-15-2024 at 10:00 am

Dan is joined by Matt Gutierrez. Matt joined Synopsys in 2000 and is currently Sr. Director of Marketing for Processor & Security IP and Tools. His current responsibilities include the worldwide marketing of ARC Processors and Subsystems, Security IP, and tools for the development of application-specific instruction set processors. Prior to joining Synopsys, Matt held various technical and management positions with companies such as Cypress Semiconductor, Fujitsu Limited, and The Silicon Group. Matt has over 25 years of experience in the semiconductor, computer systems, and EDA industries.

Matt provides an overview of what’s happening in custom processors and the impact of the RISC-V ISA. Matt also discusses what Synopsys is doing to enable application-specific processor design, including the recent announcement of its ARC-V processor IP.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


CEO Interview: Patrick T. Bowen of Neurophos
by Daniel Nenni on 03-15-2024 at 6:00 am

Patrick T. Bowen

Patrick is an entrepreneur with a background in physics and metamaterials. Patrick sets the vision for the future of the Neurophos architecture and directs his team in research and development, particularly in metamaterials design. He has a Master’s degree in Micro-Nano Systems from ETH Zurich and a PhD in Electrical Engineering from Duke University, under Prof. David Smith. After graduation, Patrick cofounded Metacept with Prof. Smith; Metacept is the world’s foremost metamaterials commercialization center and consulting firm.

Tell us about Neurophos. What problems are you solving?
We say we exist to bring the computational power of the human brain to artificial intelligence. Back in 2009 it was discovered that GPUs are much better at recognizing cats on the internet than CPUs are, but GPUs are not the answer to the future of AI workloads. Just as GPUs were better than CPUs for neural networks, there could be architectures that are better than GPUs by orders of magnitude. Neurophos is what comes next for AI after GPUs.

AI large language models in general have been limited because we haven’t had enough compute power to fully realize their potential. People have focused primarily on the training side of it, just because you had to train something useful before you could even think about deploying it. Those efforts have highlighted the incredible power of large AI models, and with that proof people are starting to focus on how to deploy AI at scale. The power of those AI models means we have millions of users who will use them every day. How much energy does it cost per user? How much does the compute cost per inference? If it’s not cheap enough per inference, that can be a very limiting thing for businesses that want to deploy AI.

Energy efficiency is also a big problem to solve. If you have a server that burns, say, 6 kilowatts, and you want to go 100 times faster but do nothing about the fundamental energy efficiency, then that 6 kilowatt server suddenly becomes a 600 kilowatt server. At some point you hit a wall; you’re simply burning too much power and you can’t suck the heat out of the chips fast enough. And of course there are climate-change issues layered on top of that. How much energy is being consumed by AI? How much additional energy are we wasting just trying to keep data centers cool? So, someone needs to first solve the energy efficiency problem, and then you can go fast enough for the demands of the applications.

People have proposed using optical compute for AI for nearly as long as AI has existed. There are a lot of ideas that we work on today that are also old ideas from the 80s. For example, the original equations for the famous “metamaterials invisibility cloak”, and other things like the negative index of refraction, can be traced back to Russian physicists in the 60s and 80s. Even though the concept existed earlier, it was really reinvented by David Smith and Sir John Pendry.

Similarly, systolic arrays, which are typically what people mean when they say “tensor processor”, are an old idea from the late 70s. Quantum computing is an old idea from the 80s that we resurrected today. Optical processing is also an old idea from the 80s, but at that time we didn’t have the technology to implement it. So with Neurophos, we went back to reinventing the optical transistor, creating from the ground up the underlying hardware that’s necessary to implement the fancy optical computing ideas from long ago.

What will make customers switch from using a GPU from Nvidia, to using your technology?
So, the number one thing that I think most customers care about really is that dollars-per-inference metric, because that’s the thing that really makes or breaks their business model. We are addressing that metric with a solution that truly can increase the speed of compute by 100x relative to a state-of-the-art GPU, all within the same power envelope.

The environmental concern is also something that people care about, and we are providing a very real solution to significantly mitigate energy consumption directly at one of its most significant sources: datacenters.

If you sit back and think about how this scales… someone has to deliver a solution here, whether it’s us or someone else. Bandwidth in chip packaging is roughly proportional to the square root of the area and power consumption in chip packaging is generally proportional to the area. This has led to all sorts of contorted ways in which we’re trying to create and package systems.

Packaging is one of the things that’s really been revolutionary for AI in general. Initially it was about cost and being able to mix chiplets from different technology nodes, and most of all, about memory access speed and bandwidth because you could integrate with DRAM chips. But now you’re just putting more and more chips in there!

Using the analog compute approach brings power consumption for compute down to the square root of area instead of proportional to area. Now compute bandwidth and power consumption scale the same way; you are bringing them into balance.
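
A toy calculation makes the scaling argument visible. The exponents below simply restate Patrick’s claims (bandwidth ~ sqrt(area), digital compute power ~ area, analog in-memory compute power ~ sqrt(area)); they are not measured data.

```python
# Normalized units; exponents restate the interview's scaling claims.
for area in (1, 4, 16, 64):
    bandwidth = area ** 0.5     # package I/O bandwidth ~ sqrt(area)
    p_digital = area            # digital compute power ~ area
    p_analog  = area ** 0.5     # analog in-memory compute power ~ sqrt(area)
    print(f"area x{area:2d}: bandwidth x{bandwidth:4.1f}, "
          f"digital power x{p_digital:4.1f}, analog power x{p_analog:4.1f}")
```

At 64x the area, digital power has grown 64x while bandwidth has grown only 8x; the analog scaling keeps the two in lockstep, which is the balance being described.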

We believe we’ve developed the only approach to date for analog in-memory compute that can actually scale to high enough compute densities to bring these scaling laws into play.

How can customers engage with Neurophos today? 
We are creating a development partner program and providing a software model of our hardware that allows people to directly load PyTorch code and compile it. That provides the customer with throughput and latency metrics, how many inferences per second, etc. It also provides data back to us on any bottlenecks for throughput in the system, so we can make sure we’re architecting the overall system in a way that really matters for customers’ workloads.
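
For flavor, the sketch below collects the kind of throughput and latency numbers such a software model might report, using a plain PyTorch timing loop on a stock network. Neurophos’ actual tooling and its API are not public, so nothing here represents them.

```python
import time
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()   # stand-in workload
batch = torch.randn(8, 3, 224, 224)

with torch.no_grad():
    for _ in range(3):                         # warm-up passes
        model(batch)
    n, t0 = 20, time.perf_counter()
    for _ in range(n):
        model(batch)
    dt = time.perf_counter() - t0

print(f"latency   : {1e3 * dt / n:.1f} ms/batch")
print(f"throughput: {n * batch.shape[0] / dt:.1f} inferences/s")
```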

What new features/technology are you working on?
Academics have for a long time dreamt about what they might do if they had a metasurface like we’re building at Neurophos, and there are a lot of theoretical papers out there… but no one’s ever actually built one. We’re the first ones to do it. In my mind most of the interesting applications are really for dynamic surfaces, not static ones, and there is other work going on at Metacept, Duke, and at sister companies like Lumotive that I, and I think the world, will be pretty excited about.

Why have you joined the SC Incubator, and what are Neurophos’ goals in working with the organization over the next 24 months?

Silicon Catalyst has become a prestigious accelerator for semiconductor startups, with a high bar for admission. We are excited to have them as a partner. Hardware startups have a big disadvantage relative to software startups because of their higher demo/prototype cost and engineering cycle time, and this is even more true for semiconductor startups, where the EDA tools, mask costs, and the sheer scale of the engineering teams can be prohibitively expensive for a seed-stage company. Silicon Catalyst has formed a pretty incredible ecosystem of partners that provides significant help in reducing startups’ development costs and accelerating their time to market.

Also Read:

A Candid Chat with Sean Redmond About ChipStart in the UK

CEO Interview: Jay Dawani of Lemurian Labs

Seven Silicon Catalyst Companies to Exhibit at CES, the Most Powerful Tech Event in the World


Checking and Fixing Antenna Effects in IC Layouts
by Daniel Payne on 03-14-2024 at 10:00 am

IC layouts go through extensive design rule checking to ensure correctness before being accepted for fabrication at a foundry or IDM. There’s something called the antenna effect that happens during chip manufacturing, where plasma-induced damage (PID) can lower the reliability of MOSFET devices. Layout designers run Design Rule Checks (DRC) to find areas that violate PID rules and then make edits to pass all checks.

A traditional antenna design rule measures the ratio of metal (or via) layer area to MOSFET gate area, and if the area ratio is too large the layout must be fixed by adding a protection diode.

Planar CMOS cross-section – antenna DRC
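
As a toy illustration of such a rule, the check below flags a net whose accumulated metal-to-gate area ratio exceeds a limit. The 400:1 figure is a made-up placeholder, since real limits and diode rules come from the foundry’s PDK.

```python
def antenna_violation(metal_area_um2: float, gate_area_um2: float,
                      max_ratio: float = 400.0,
                      has_protection_diode: bool = False) -> bool:
    """Flag a single-layer antenna rule violation: too much metal area
    collecting charge per unit of connected gate area, with no diode
    to bleed the charge off. max_ratio is a placeholder, not a PDK value."""
    if has_protection_diode or gate_area_um2 <= 0:
        return False
    return metal_area_um2 / gate_area_um2 > max_ratio

print(antenna_violation(metal_area_um2=9000.0, gate_area_um2=20.0))  # True
print(antenna_violation(9000.0, 20.0, has_protection_diode=True))    # False
```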

One IC layout scenario that a traditional DRC for antenna effects cannot handle is AMS designs that have multiple power domains, using multiple isolated P-type wells as shown below. A new approach called path-based verification is required for the following four scenarios:

  • A risk connection with a PID issue
  • Imbalanced area ratios between metal layers and well layers from two isolated wells
  • Complex connectivity paths
  • Unintentional protection diodes

These four layout scenarios can only be detected by an EDA tool that knows about devices, connectivity and electrical paths during the area calculations for metal and MOSFET gate layers. This is where the Calibre PERC tool from Siemens EDA comes in, as it can perform the complex path-based checks to identify PID areas, find electrostatic discharge (ESD) issues, and locate other paths that your design group is looking for. Here’s the PID flow for using Calibre PERC:

PID flow using Calibre PERC

Using this flow on an IC layout and looking at the results in the Calibre RVE results viewer showed that a PID violation was found, because a risk connection was established at the metal1 level, but the protection connection didn’t happen until the metal2 level.

PID violation at metal2 layer

The next PID violation was identified from imbalanced area ratios of a metal layer and the N-buried layer (nbl). The area highlighted in purple is the victim device.

Imbalanced area PID issue

To get complete PID coverage, your design team will have to use both the traditional DRC-based antenna checks and the path-based checks. Run DRC-type checks early in the design stages as a preventative step. As more metal connections in a layout are completed and paths across isolated P-type wells are formed, it’s time to add path-based verification, providing complete coverage.

In this early IC layout it’s time to run traditional DRC-based antenna checks to confirm the layout passes PID validation.

Prevent PID issues before all metal connections completed

As more metal paths are added to the IC layout, it’s time to use the path-based tool, because it properly understands both the risk connection and the protection connection.

Run Calibre PERC path-based checks

Summary

IC layouts must meet rigorous design rules to pass the reliability and yield requirements set by the foundry or fab process being used. Traditional DRC-based antenna design rules can still be used for early-stage layout, but as more metal layers are added to complete the interconnects, path-based checking with Calibre PERC becomes necessary.

As the paths across isolated P-wells are established, the path-based flow of Calibre PERC can be used to check the IC layouts at IP, block/module and even full-chip levels for signoff. So it’s recommended to use both flows together to meet the reliability and yield goals.

Read the Technical Paper at Siemens online.



Arteris is Unleashing Innovation by Breaking Down the Memory Wall
by Mike Gianfagna on 03-14-2024 at 6:00 am

(Chart courtesy of Arteris)

There is a lot of discussion about removing barriers to innovation these days. Semiconductor systems are at the heart of unlocking many forms of technical innovation, if only we could address issues such as the slowing of Moore’s Law, reduction of power consumption, enhancement of security and reliability and so on. But there is another rather substantial barrier that is the topic of this post. It is the dramatic difference between processor and memory performance. While systems of CPUs and GPUs are delivering incredible levels of performance, the memories that manage critical data for these systems are lagging substantially. This is the memory wall problem, and I would like to examine how Arteris is unleashing innovation by breaking down the memory wall.

What is the Memory Wall?

The graphic at the top of this post illustrates the memory wall problem. You can see the steady increase in performance of single-threaded CPUs depicted by the blue line. The green line shows the exponential increase in performance being added by clusters of GPUs. The performance increase of GPUs vs. CPUs is estimated to be 100X in 10 years – a mind-boggling statistic. As a side note, you can see that the transistor counts for both CPUs and GPUs cluster around a similar straight line. GPU performance is delivered by doing fewer tasks much faster as opposed to throwing more transistors at the problem.

Many systems today are a combination of a number of CPUs doing broad management tasks with large numbers of GPUs doing specific tasks, often related to AI. The combination delivers the amazing throughput we see in many products. There is a dark side to this harmonious architecture that is depicted at the bottom of the chart. Here, we see the performance data for the various memory technologies that deliver all the information for these systems to process. As you can see, delivered performance is substantially lower than the CPUs and GPUs that rely on these memory systems.

This is the memory wall problem. Let’s explore the unique way Arteris is solving this problem.

The Arteris Approach – A Highly Configurable Cache Coherent NoC

 A well-accepted approach to dealing with slower memory access speed is to pre-fetch the required data and store it in a local cache. Accessing data this way is far faster – a few CPU cycles vs. over 100 CPU cycles. It’s a great approach, but it can be daunting to implement all the software and hardware required to access memory from the cache and ensure the right data is in the right place at the right time, and consistent across all caches. Systems that effectively deliver this solution are called cache coherent, and achieving this goal is not easy. A software-only coherency implementation, for example, can consume as much as ~25% of all CPU cycles in the system, and is very hard to debug. SoC designers often choose cache coherent NoC hardware solutions instead, which are transparent to the software running on the system.
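
To illustrate what such hardware must keep consistent, here is a deliberately tiny snooping sketch in the spirit of an MSI protocol. Ncore’s actual protocol support is CHI/ACE-based and far more sophisticated than this toy.

```python
# Each cache tracks a line as Modified, Shared, or (absent =) Invalid.
# A write by one agent invalidates every other agent's copy; a read
# demotes a remote Modified copy to Shared. Data movement is omitted.
class Cache:
    def __init__(self, name):
        self.name, self.lines = name, {}        # addr -> 'M' or 'S'

    def read(self, addr, peers):
        if addr not in self.lines:
            for p in peers:                     # snoop: demote any writer
                if p.lines.get(addr) == 'M':
                    p.lines[addr] = 'S'
            self.lines[addr] = 'S'
        return self.lines[addr]

    def write(self, addr, peers):
        for p in peers:                         # snoop: invalidate sharers
            p.lines.pop(addr, None)
        self.lines[addr] = 'M'

cpu, gpu = Cache('cpu'), Cache('gpu')
cpu.read(0x40, [gpu])                 # cpu holds the line Shared
gpu.write(0x40, [cpu])                # gpu takes it Modified, cpu invalidated
print(cpu.lines.get(0x40, 'I'), gpu.lines.get(0x40))   # -> I M
```

Keeping states like these coherent in hardware, across many agents, is what frees software from the bookkeeping that can otherwise eat ~25% of CPU cycles.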

Andy Nightingale

Recently, I had an opportunity to speak with Andy Nightingale, vice president of product management & marketing at Arteris. Andy did a great job explaining the challenges of implementing cache coherent systems and the unique solution Arteris has developed to cope with these challenges.

It turns out development of a reliable and power efficient cache coherent architecture touches many hardware and software aspects of system design. Getting it all to work reliably, efficiently and hit the required PPA goals can be quite difficult. Andy estimated that all this work could require 50 engineering years per project. That’s a lot of time and cost.

The good news is that Arteris has substantial skills in this area and the company has created a complete cache coherent architecture into one of its network-on-chip (NoC) products. Andy described Ncore, a complete cache coherent NoC offered by Arteris. Management of memory access fits well in the overall network-on-chip architecture that Arteris is known for. Ncore manages the cache coherent part of the SoC transparently to software – freeing the system designer to focus on the higher-level challenges associated with getting the CPU and all those GPUs to perform the task at hand.

Andy ran down a list of Ncore capabilities that was substantial:

  • Productive: Connect multiple processing elements, including Arm and RISC-V, for maximum engineering productivity and time-to-market acceleration, saving 50+ person-years per project.
  • Configurable: Scalable from heterogeneous to mesh topologies, supporting CHI-E, CHI-B, and ACE coherent interfaces, as well as ACE-Lite IO coherent interfaces. Ncore also enables AXI non-coherent agents to act as IO coherent agents.
  • Ecosystem Integration: Pre-validated with the latest Arm v9 automotive cores, delivering on a previously announced partnership with Arm.
  • Safe: Supporting ASIL B to ASIL D requirements for automotive safety applications, and ISO 26262 certified.
  • Efficient: Smaller die area, lower power, and higher performance by design, compared with other commercial alternatives.
  • Markets: Suitable for Automotive, Industrial, Enterprise Computing, Consumer and IoT SoC solutions.

Andy detailed some of the benefits achieved on a consumer SoC design. These included streamlined chip floorplanning thanks to the highly distributed architecture, promoting efficient resource utilization. The Arteris high-performance interconnect with a high-bandwidth, low-latency fabric ensured seamless data transfer and boosted overall system performance.

Digging a bit deeper, Ncore also provides real-time visibility into the interconnect fabric with transaction-level tracing, performance monitoring, and error detection and correction. All these features facilitate easy debugging and superior product quality. The comprehensive ecosystem support and compatibility with industry-standard interfaces like AMBA, also facilitate easier integration with third-party components and EDA tools.

This was a very useful discussion. It appears that Arteris has dramatically reduced the overhead for implementation of cache coherent architectures.

To Learn More

I mentioned some specifics about the work Arteris is doing with Arm. Don’t think that’s the only partner the company is working with. Arteris has been called the Switzerland of system IP. The company also has significant work with the RISC-V community as detailed in the SemiWiki post here.

Arteris recently announced expansion of its Ncore product. You can read how Arteris expands Ncore cache coherent interconnect IP to accelerate leading-edge electronics designs here. In the release, Leonid Smolyansky, Ph.D., SVP of SoC Architecture, Security & Safety at Mobileye, offered these comments:

“We have worked with Arteris network-on-chip technology since 2010, using it in our advanced autonomous driving and driver-assistance technologies. We are excited that Arteris has brought its significant engineering prowess to help solve the problems of fault tolerance and reliable SoC design.”

There is also a short (a little over one-minute) video that explains the challenges that Ncore addresses. I found the video quite informative. 

If you need improved performance for your next design, you should definitely take a close look at the cache coherent solutions offered by Arteris. You can learn more about Ncore here. And that’s how Arteris is unleashing innovation by breaking down the memory wall.


2024 Outlook with Elad Alon of Blue Cheetah Analog Design
by Daniel Nenni on 03-13-2024 at 10:00 am

Elad Alon

We have been working with Blue Cheetah Analog Design for three years now with great success. With new process nodes coming faster than ever before and chiplets being pushed to the forefront of technology, die-to-die interconnect traffic on SemiWiki has never been greater, and chiplets is one of our top search terms.

Tell us a little bit about yourself and your company. 
I am the CEO and co-founder of Blue Cheetah Analog Design. I am also an Adjunct Professor of Electrical Engineering and Computer Sciences at UC Berkeley, where I was previously a Professor and co-director of the Berkeley Wireless Research Center (BWRC). I’ve held founding, consulting, or visiting positions at Locix, Lion Semiconductor (acquired by Cirrus Logic), Wilocity (acquired by Qualcomm), Cadence, Xilinx, Sun Labs, Intel, AMD, Rambus, Hewlett Packard, and IBM Research, where I worked on digital, analog, and mixed-signal integrated circuits for computing, high-speed communications, and test and measurement. According to Lance Leventhal at the Chiplet Summit, I have 280 published articles and 75+ patents. I have to admit I’m not sure about those numbers, but I do have a lot of experience with integrated circuit design – and particularly in analog / mixed-signal circuits – which is proving invaluable in the era of chiplets.  This is the last I’ll say about myself directly – in the rest of this interview, I’ll be telling the story of Blue Cheetah and our vision for chiplets and the overall semiconductor market.

What was the most exciting high point of 2023 for your company?
We announced silicon success on our die-to-die interconnect IP and picked up many exciting design wins. We’ve publicly disclosed DreamBig, Ventana, and FLC as our customers, and most recently, we announced our design win with Tenstorrent. We will announce more design wins soon. To our knowledge, most of the emerging chiplet product companies are using Blue Cheetah die-to-die interconnect, as are a number of large corporations.

What was the biggest challenge your company faced in 2023?
Thanks to the amazing support (not only financially) from our investors – particularly from our founding investors Sehat Sutardja and Weili Dai, as well as NEA (which led our Series B round in 2022) – along with our unique product offering (customized die-to-die interconnect IP), I’m happy to say that funding and filling the sales funnel have not been our biggest challenges. Keeping up with demand, on the other hand, is definitely keeping us on our toes; I always like to tell the members of my team that this is a very good challenge to have the opportunity to address.  The tremendous momentum building around chiplets drives the demand for Blue Cheetah’s solutions, so in some senses, the challenge is in scaling up along with that ongoing revolution.

How is your company’s work addressing this biggest challenge?
In the bigger picture, hardware and silicon designers look to chiplets as a key enabler for ever more capable and cost-efficient systems. Chiplets are well established amongst large players that control all components/aspects of a design (i.e., single vendor), and the allure of a “plug and play” chiplet market has garnered significant attention and investment from the industry.  Although a number of technical and business hurdles need to be overcome before that vision fully comes to fruition, the large majority of the benefits of that vision can be realized immediately.  Specifically, small groups of companies with aligned product strategies and (typically) complementary expertise are forming multi-vendor ecosystems.  Within these ecosystems, the companies can coordinate on the functionality, requirements, and interfaces of each chiplet (and, of course, the die-to-die interconnects that glue them together) to meet the needs of a specific product and/or product family. Blue Cheetah’s solutions support all three of these use cases (single-vendor, multi-vendor ecosystem, and plug-and-play), and many of our customers/partners are pioneers of the multi-vendor ecosystem approach.

What do you think the biggest growth area for 2024 will be, and why?
Indeed, the semiconductor market is in the middle of a major resurgence in recognition, investment, and (averaged over the last ~3 years) growth. AI has played an enormous role in this resurgence. Still, the basis is broader than that – consider, for example, that today, 7 out of the top 10 companies, as ranked by market capitalization, design, incorporate, and/or sell their own semiconductors. (If you look at the top 10 tech companies by market cap, it goes to 9 out of 10, with the 10th being a semi manufacturing equipment supplier.) The capabilities/cost structure of a company’s chips directly drives the user experience/value of the company’s products/services, and the companies delivering those products are in the best position to know what silicon capabilities/cost structure have the highest impact. This hopefully makes it clear why specialization and customization are major themes; they have been for ~5+ years already and will continue to be in 2024 (and beyond).

How is your company’s work addressing this growth?
Chiplets are, in principle, the ideal vehicle to achieve the goals of specialization and customization with favorable manufacturing and design cost structures.  Ideally, a company can focus on its differentiating technologies while incorporating leading solutions to the remaining components of the product via (possibly other vendors’) chiplets and IP. At the same time, each chiplet can be targeted to the specific manufacturing technology / die size with the best cost/yield characteristics for that function.  Of course, all of these chiplets need to communicate with each other, and that is where Blue Cheetah is focused.  Blue Cheetah is unique in offering die-to-die interconnect solutions with the extensive customizability and configurability needed to meet the needs of the full range of chiplet products.  We also support the most comprehensive set of process technologies – we have already implemented our IP in 7 different nodes, including 5nm and below.

What conferences did you attend in 2023, and how was the traffic?
2023 was an action-packed year for us in terms of conferences – I believe someone from the Blue Cheetah team was at a conference once every month or at most two – and in-person attendance is definitely up (approaching or exceeding pre-COVID levels).  For example, we were at the Chiplet Summit, ISSCC, DAC, OCP Global Summit, and several foundry events. Our silicon demo generated a lot of interest, and we were very happy with the engagement from the people and partners who visited our booth.

Will you attend conferences in 2024? Same or more?
2024 is looking to be even more action-packed – both in terms of conferences (we’ve already been at CES and the Chiplet Summit) and more broadly.  With the global drive to establish and rejuvenate local semiconductor capabilities, we plan to expand to additional international venues this year to further foster relationships across a broad industry base.

Also Read:

Chiplet ecosystems enable multi-vendor designs

Die-to-Die Interconnects using Bunch of Wires (BoW)

Analog Design Acceleration for Chiplet Interface IP

Blue Cheetah Technology Catalyzes Chiplet Ecosystem


Automotive Electronics Trends are Shaping System Design Constraints
by Bernard Murphy on 03-13-2024 at 6:00 am

Electronics in car

Something is brewing in automotive electronics. Within a one-month window most of the product announcements and pitches to which I am being invited are on automotive topics. Automotive markets have long been one of the primary targets for suppliers to system designers, but this level of alignment in announcements seems more than a coincidence. Pulin Desai (Group Director, Product Marketing and Biz Dev for Tensilica Vision and AI DSPs at Cadence) helped me understand some of the motivation. He sees two key drivers in automotive: demand for a consolidated view of software development across a diverse set of platforms (central, zonal and edge) and demand to limit hardware development cost. The Cadence Tensilica group has expanded its product line to meet these trends, particularly in support of advanced perception around the car and in the cabin.

Managing “smarter everywhere” versus software complexity and cost

Intelligence is touching everything in the car. Emerging trends include intelligent backup cameras and rear-view mirrors, now with object recognition to detect a child or a dog behind the car; driver monitoring systems (DMS) inside the cabin, sensing when the driver is not paying attention; and occupancy monitoring systems (OMS), detecting if you left a child in the car when you leave. Interestingly, both DMS and OMS use a combination of video and radar monitoring.

For perception outside the car, we’re already familiar with forward-facing video paired with object detection to sense collision risks and for lane keeping. These now routine safety features are being coupled with 4D imaging radar (4DR), adding a rich and less weather-sensitive complement to video perception. Video plus 4DR will offer new levels of capability and safety such as adaptive cruise control, essential to further advances in ADAS and autonomy.

Combined, these are all appealing features, but OEMs/Tier1s are eventually stuck with integrating everything in software to manage control through the cockpit screen and other controls, for navigating, playing high quality audio, saving power, handling safety, activating AC, etc. Making that integration manageable demands a unified software-defined approach across all these distributed functions.

On the hardware front, while there is a lot of buzz around AI suppliers, AI parts alone are far from enough to build what the OEMs need: signal processing for object recognition before AI, specialized accelerators to handle 4D point clouds, and efficient, cost-effective inferencing for the non-generative AI tasks. Add to that all the other necessary compute functions (CPU cluster, memory management, communication, etc.), all with the performance and low power delivered by the most advanced semiconductor processes, integrated in a single-package solution to keep costs down.

How do OEMs square this circle? Pulin says he sees the leaders each pushing their own common differentiated platform architectures, hosting distributed compute around the car, across model lines, and into the future. While also reducing cost and keeping car prices reasonably in line with our expectations. Very interesting.

Stepping up to the trend

Think about vision+radar applications as an illustrative example. However partitioned, the system starts with a camera for image capture followed by a complex chain of image signal processing functions (DSP-based) – before any AI operations can begin. The radar starts with an antenna (maybe 16×16 receive/transmit) followed by an equally complex chain of signal processing to generate a 4D radar point cloud, again before AI-based recognition.

Radar signal processing uses complex Fourier analysis, an algorithm manageable on a DSP for 1D or 2D radar with relatively low field of view and resolution. However, 4D imaging radar has a higher field of view and higher resolution, generating huge amounts of data per frame for which additional hardware assistance is needed to achieve acceptable frame rates.
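
For a feel of that processing chain, the sketch below runs the classic range and Doppler FFT pair on a simulated chirp data cube. The parameters are arbitrary illustration values; a real 4D imaging radar adds azimuth/elevation processing across a large antenna array, which is where the data volume, and the need for hardware assistance, explodes.

```python
import numpy as np

n_samples, n_chirps = 256, 128              # fast time x slow time
fs = 10e6                                    # ADC sample rate, Hz
t = np.arange(n_samples) / fs

# One simulated target: beat frequency maps to range, a small phase
# ramp across chirps maps to radial velocity (Doppler).
beat_hz, doppler_rad_per_chirp = 1.2e6, 0.05
cube = np.array([np.exp(2j * np.pi * beat_hz * t
                        + 1j * doppler_rad_per_chirp * k)
                 for k in range(n_chirps)])

# FFT over fast time gives range bins; FFT over slow time gives Doppler.
rd_map = np.fft.fftshift(np.fft.fft2(cube), axes=0)
dopp_bin, range_bin = np.unravel_index(np.argmax(np.abs(rd_map)),
                                       rd_map.shape)
print(f"target peak: Doppler bin {dopp_bin}, range bin {range_bin}")
```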

Cadence has just announced their Tensilica Vision 331 and Vision 341 DSPs to support signal processing for computer vision and radar imaging together with entry-level AI (such as driver face-id) and sensor fusion. Both cores deliver improved power, performance and area for high-end multi-sensor processing. They also offer a radar boost mode through instruction set optimizations, delivering up to 4X performance improvement for Vision 331 and up to 6X performance improvement for Vision 341. Plugging in an optional Vision 4DR accelerator for big radar data cubes delivers a further 4X in performance for the 341 core (together with a 6X area advantage) and up to 7X performance boost for the 331 core. The DSP cores also extend AI performance up to 80 TOPS when paired with a single (Cadence) Neo NPU.

How do such options help OEMs reduce chip types while supporting scalability? Before we even get to chiplets and multi-die solutions, consider a single-chip heterogeneous IP-based design. Routine stuff for those of us in the SoC design world. Unfortunately, it seems media saturation coverage on a certain GPU for AI has led some folks outside the chip world to believe that GPUs are the necessary, maybe even sufficient minimum for any automation. Not realizing that outside of datacenters, heterogeneous design is king (just look at smartphones).

Heterogeneous design (image above) builds around a mix of multiple core functions: multiple DSPs to handle the signal processing for perception-heavy applications representative of the bulk of new and emerging auto electronics; domain-specific accelerators to ensure high frame-rate throughput; compact and cost-effective neural accelerators (not GPUs) to handle the bulk of convolutional recognition tasks; a GPU if needed for high-end (transformer) AI models; plus communication IPs and CPUs for control and memory management.

This general approach to system-on-chip (SoC) is already well proven in many proprietary and off-the-shelf chips, including those from automotive semiconductor suppliers. It’s not a big leap to imagine that OEMs could design (or outsource design for) their own unique system architectures following a similar approach. Providing them with one multi-purpose design whose cost they can amortize across most applications in the car, excepting perhaps central compute (big, expensive devices) and drivetrain MCUs (tiny, very low cost devices).

What about a consolidated view of software?

For an OEM software developer, if almost every chip in a car is an instance of their common platform chip, this problem is largely solved. Of course, the underlying IPs should share common, standards-based APIs: OpenCL for both DSPs and GPUs, Simulink for connection to MATLAB, together with the usual AI model interfaces (ONNX, TensorFlow and PyTorch). That is exactly what the whole Tensilica IP line supports, together with the NeuroWeave software compiler toolchain to map from standard trained networks (TensorFlow, MXNet, PyTorch, Caffe2 and Jax) to the target device.
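
As a concrete example of that hand-off, a trained PyTorch network can be exported through the standard ONNX interface. The tiny model below is hypothetical, and since NeuroWeave’s own invocation is not public, only the standard framework side is shown.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a small perception head.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4),
).eval()

dummy = torch.randn(1, 3, 224, 224)          # one camera frame
torch.onnx.export(model, dummy, "perception.onnx",
                  input_names=["frame"], output_names=["scores"])
# perception.onnx can now be handed to a vendor toolchain for mapping
# onto DSP/NPU targets.
```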

I can’t help thinking that the auto OEMs must be drawing inspiration from Amazon AWS success with their Graviton platform, now distributed widely across their datacenters and a unified focus for innovation in following generations (a trend also happening in other hyperscalers). Common heterogenous architecture auto platforms will be quite different in some ways, but the long-term value of a common platform is not so different.

You can learn more about the latest Tensilica release HERE.


2024 Outlook with Jim Cantele of Altair
by Daniel Nenni on 03-12-2024 at 10:00 am

Jim Cantele

Jim Cantele, global SVP of sales and technology at Altair, is an electronics industry veteran with deep knowledge of EDA software and services. Before joining Altair during the acquisition of Runtime Design Automation in 2017, Jim held executive-level management positions at a number of leading EDA and semiconductor companies, including Cadence, Siemens (Mentor Graphics), Samsung Semiconductor, and Microchip (Supertex). We worked with Runtime Design Automation before the acquisition and have known Jim for many years.

Tell us a little bit about yourself and your company.
Altair is a global leader in computational intelligence that provides software and cloud solutions in simulation, high-performance computing (HPC), data analytics, and AI. Altair enables organizations across all industries to compete more effectively and drive smarter decisions in an increasingly connected world — all while creating a greener, more sustainable future.

As global SVP of sales and technology at Altair, I’ve been fortunate to be at the center of many exciting innovations across the company, acquisitions of complementary technologies that have broadened our solutions portfolio, and partnerships with future-facing organizations to collaborate on today’s accelerated-growth areas including AI and quantum computing. After spending time at large enterprises including Cadence and Mentor Graphics, as well as startups such as One Spin Solutions, Celestry, and Runtime Design Automation — which was acquired by Altair in 2017 — I can appreciate that Altair offers the best of both worlds: It’s not only a solid, established company but also an agile organization that encourages diversity, creativity, and innovative thinking.

What was the most exciting high point of 2023 for your company?
It’s hard to pick just one! Last year was packed with new solutions. We announced Liquid Scheduling to scale distributed computing with our Altair® PBS Professional® workload manager, as well as Altair® InsightPro™ for streamlined HPC and cloud reporting. For the EDA space we introduced an integrated solution built with our Altair® Accelerator™ scheduler combined with Altair® NavOps® for dynamic cloud scaling, and we also created a dashboard that combines Accelerator with our data analytics solutions. Nothing is as exciting as launching an innovation that will move computing forward and make it easier for users to design, engineer, and make new discoveries.

What was the biggest challenge your company faced in 2023?
Many of our recent challenges have focused on convergence. The lines between on-premises vs. cloud, HPC vs. high-throughput computing, and similar traditionally divergent technologies are blurring, and every tech company needs to deliver solutions that incorporate many different elements that, in today’s world, commonly include machine learning (ML), deep learning (DL), and AI. Many of our customers’ product integrations require combinations of data analytics, simulation, and HPC technologies to be successful, and they all have different challenges on top of various resource and budget constraints. Our challenge at Altair is to pull all the right technologies together and deliver them in an effective, easy-to-use package.

How is your company’s work addressing this biggest challenge?
We’re developing and delivering advanced AI-driven solutions that leverage, among other technologies, our Altair® RapidMiner® data analytics and AI platform. RapidMiner enables users to extract and transform their data, build data and machine learning workflows, and process and display real-time data. We’ve found that RapidMiner is very useful in combination with other Altair solutions, and work is ongoing to fuse synergistic technologies like these to benefit users on a whole new level. We’ll unveil additional solutions with new and combined capabilities in 2024.

What do you think the biggest growth area for 2024 will be, and why?
We have three focus areas: data analytics, simulation, and HPC, each with different major growth areas. For semiconductor companies, we’re seeing an increase in demand for detailed HPC job requirements at runtime, both on-premises and in the cloud. They need advanced information on workflow, memory, core count, and license requirements. We’re also focused on both the immediate uses for and the future potential of AI, which is becoming increasingly pervasive in today’s world. Quantum computing is still new and flawed, with problems scaling to real-world workloads, but it’s a big future growth area too.
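
To ground what per-job requirements look like in practice, here is a sketch that writes out a PBS Professional job file declaring cores, memory, and walltime up front. The select/ncpus/mem/walltime requests use standard PBS Pro syntax; the "eda_lic" entry is a hypothetical site-defined custom resource for license tracking, not a built-in PBS name.

```python
import pathlib
import textwrap

# A minimal PBS Professional job script, generated and saved to disk.
job = textwrap.dedent("""\
    #!/bin/bash
    #PBS -N corner_sim
    #PBS -l select=1:ncpus=16:mem=64gb
    #PBS -l walltime=04:00:00
    #PBS -l eda_lic=1
    cd "$PBS_O_WORKDIR"
    ./run_simulation.sh
    """)

pathlib.Path("corner_sim.pbs").write_text(job)
print("submit with: qsub corner_sim.pbs")
```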

How is your company’s work addressing this growth?
Our product integrations with AI and ML via RapidMiner are addressing the market and setting organizations up with a path to modernization and automation. The analytic power of the RapidMiner platform, combined with Altair simulation and HPC tools, is quickly delivering results in this area. There will be more to come throughout 2024.

What conferences did you attend in 2023 and how was the traffic?
In 2023 I attended some informative TSMC semiconductor events, as well as the Design Automation Conference in San Francisco. DAC was extremely well-attended with lots of networking opportunities and visits with old friends and colleagues, plus an exciting view into the latest technologies from vendors throughout the semiconductor and EDA industry. Everyone seemed eager to participate and engage. Multiple Altair Technology Conferences (ATCs) were an excellent opportunity to discuss the latest trends, technologies, and innovations powered by the convergence of computational science and AI. Altair also participates in the SC supercomputing conference each fall, plus several smaller conferences throughout the year.

Will you attend conferences in 2024? Same or more?
We’ve already started the year off with a bang: This January was Altair’s first time at CES (formerly the Consumer Electronics Show) in Las Vegas. CES was quite an event, packed with tech enthusiasts and rolling along with a high energy level and enthusiastic crowd — a fascinating and informative experience. I’m looking forward to attending DAC again this summer and SC24 in Atlanta this fall, plus worldwide GSA and TSMC events. I’ll rarely pass up an opportunity to connect with colleagues and customers and learn more about everyone’s latest advances.

Additional questions or final comments?
Today’s tech world is an exciting place to be, with immense growth and even greater growth potential. At Altair, I’m proud to be part of the technological evolution that’s converging to move computing, business, and knowledge forward. It’s the technology our future will be built on.

Also Read:

Altair’s Jim Cantele Predicts the Future of Chip Design

How to Enable High-Performance VLSI Engineering Environments

Optimizing Return on Investment (ROI) of Emulator Resources


No! TSMC does not Make 90% of Advanced Silicon
by Scotten Jones on 03-11-2024 at 2:00 pm

Throughout the debate on fab incentives and the CHIPS Act, I keep seeing comments like: TSMC makes >90% of all advanced silicon, or sometimes, Taiwan makes >90% of all advanced silicon. This kind of ill-defined and grossly inaccurate statement drives me crazy. I just saw someone make the same claim in the SemiWiki forums and decided it was time to comment on it.

Let’s start by defining what an advanced semiconductor is. Since the specific claim is about TSMC, let’s start with the TSMC definition: TSMC breaks out 7nm and below as advanced. This is a good break point in logic because Samsung and TSMC 7nm both have densities of approximately 100 million transistors per square millimeter (MTx/mm2). Intel 10nm also has approximately 100 MTx/mm2; therefore we can count Samsung and TSMC 7nm and below, and Intel 10nm and below.

That all works for logic, but this whole discussion ignores other advanced semiconductors. I would argue that there are three truly leading-edge advanced semiconductors in the world today where state-of-the-art equipment is being pushed to the limits of what is achievable: 3DNAND, DRAM, and logic. In each case there are three or more of the world’s largest semiconductor companies pushing the technology as far and as fast as humanly possible. Yes, the challenges are different: 3DNAND has relatively easy lithography requirements, but its deposition and etching requirements are absolutely at the edge of what is achievable. DRAM has a mixture of lithography, materials and high-aspect-ratio challenges. Logic has the most EUV layers and process steps, but they are all equally difficult to produce with good yield.

Including 3DNAND and DRAM means we need “advanced semiconductor” limits for these two processes as well. When 7nm was first being introduced for logic, 3DNAND was at the 96/92 layer generation and DRAM was at 1y. We will use those as the limits for advanced semiconductors.

In order to complete this analysis without spending man-days that I don’t have to spare, I simply added up the worldwide installed capacity for 3DNAND at 96/92 layers and greater, DRAM at 1y and smaller, and logic at 7nm (i10nm) and smaller. Furthermore, I broke logic out into TSMC and other.
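
To be transparent about the bookkeeping (and only the bookkeeping), the sketch below shows the aggregation. The capacity figures are round placeholders, not the installed-capacity dataset behind Figures 1 and 2.

```python
# Placeholder capacities (thousand wafer starts/month), chosen only to
# show the arithmetic -- NOT the dataset used for Figures 1 and 2.
capacity_kwspm = {
    "3DNAND, 96/92L and above":  40,
    "DRAM, 1y and below":        35,
    "Logic 7nm-class, TSMC":     15,
    "Logic 7nm-class, other":    10,
}
total = sum(capacity_kwspm.values())
for category, cap in capacity_kwspm.items():
    print(f"{category:27s} {100 * cap / total:5.1f}% of advanced capacity")

logic = {k: v for k, v in capacity_kwspm.items() if k.startswith("Logic")}
tsmc = 100 * logic["Logic 7nm-class, TSMC"] / sum(logic.values())
print(f"TSMC share of advanced logic: {tsmc:.1f}%")
```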

Figure 1 illustrates the worldwide installed capacity in percentage broken out by those categories.

Figure 1. Worldwide Advanced Silicon Installed Capacity by Category.

From Figure 1 it can be seen that TSMC represents only 12% of worldwide “advanced silicon”, way off the 90% number being thrown around. Utilization could change these numbers somewhat, and I haven’t included it due to time constraints, but I don’t think it would change the picture much, and as the memory sector recovers it will become a non-issue.

I also looked at this a second way, which is just worldwide advanced logic; see Figure 2.

Figure 2. Worldwide Advanced Logic Installed Capacity by Category.

From Figure 2 we can see that even when we look only at advanced logic, TSMC is at 64%, not “90%”.

The only way we would get to 90% is if we defined “advanced silicon” as 3nm logic. This would require a good definition of what 3nm logic is. On a density basis, TSMC’s is the only 3nm logic process in the world; Samsung’s and Intel’s are really 5nm processes on a density basis, although Intel i3 is, in my estimation, the highest-performing process available.

In conclusion, TSMC actually only makes up 12% of worldwide Advanced Silicon and only 64% of Advanced Logic. This is not to minimize the importance of TSMC to the global electronics supply chain, but when debating things as important as the worldwide semiconductor supply chain we should at least get the numbers right.

Also Read:

ISS 2024 – Logic 2034 – Technology, Economics, and Sustainability

How Disruptive will Chiplets be for Intel and TSMC?

2024 Big Race is TSMC N2 and Intel 18A