
The Great AI Silicon Shortage

user nl

Well-known member

The Compute Shortage

Token demand is skyrocketing, and the need for AI compute continues to accelerate. Improving model capabilities, combined with the rapid emergence of agentic workflows, have driven a surge in user adoption and aggregate token demand. Anthropic added a staggering $6B of ARR in the month of February alone, driven by broad adoption of its agentic coding platform, Claude Code; if Anthropic had more compute, it would have added more. Despite a huge AI infrastructure buildout over the past few years, available compute is scarce. On-demand GPU prices continue to rise, even for Hopper GPUs, which are almost two generations old.

From our own experience, we have reached out to every neocloud we know asking whether they have small clusters available, but everything is already firmly locked up. This tight supply environment explains the sharp reset in hyperscaler capex plans. Consensus estimates have moved materially higher across the board, with Google standing out as the most extreme example: its 2026 capex expectations have roughly doubled versus prior estimates, primarily driven by datacenter and server spend.

 
I find the two charts on TSMC N3 wafer capacity and N3 wafers shipped curious: they show N3 wafer capacity decreasing QoQ from Q4 2024 through Q3 2025, yet N3 wafers shipped somehow increases every quarter.


That aside, I found it insightful that switching DRAM output from conventional memory to HBM has a 3:1 to 4:1 impact on wafers used per bit.
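That 3:1/4:1 ratio implies a sharp wafer-demand multiplier as bit demand shifts to HBM. A back-of-envelope sketch: the penalty ratios come from the charts discussed above, while the HBM bit-share values are hypothetical inputs for illustration.

```python
# Illustrative only: the 3:1 and 4:1 penalties are from the thread;
# the HBM bit-shares are assumed values, not data from the article.
def wafer_demand(hbm_share: float, penalty: float) -> float:
    """Total wafer demand, normalized so all-conventional-DRAM = 1.0."""
    return (1 - hbm_share) * 1.0 + hbm_share * penalty

for penalty in (3.0, 4.0):
    for share in (0.1, 0.2):
        d = wafer_demand(share, penalty)
        print(f"{share:.0%} of bits on HBM at {penalty:.0f}:1 -> {d:.2f}x wafers")
```

Even a modest shift of bit demand to HBM inflates total wafer consumption well before HBM dominates the mix.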
 
The capex forecast tells it all. That is why we have a shortage.

Reminder... HBM isn't the only thing in short supply: DDR5 prices and margins have exploded. Most of the memory companies' revenue increase is coming from DDR5 price increases, not just HBM.
 
There is also today a very long YouTube interview (Dwarkesh Podcast) with the report's main author, Dylan Patel, on this topic.
Dylan Patel seems to be suggesting that the availability of ASML EUV tools is becoming the next bottleneck for the AI buildout (starts around 34 minutes):


Mar 13, 2026 Dwarkesh Podcast

Dylan Patel, founder of SemiAnalysis, provides a deep dive into the 3 big bottlenecks to scaling AI compute: logic, memory, and power. And walks through the economics of labs, hyperscalers, foundries, and fab equipment manufacturers. Learned a ton about every single level of the stack. Enjoy!

 
Here is a 500 word AI summary:

The Global Race for AI Compute: Infrastructure, Semiconductors, and the Bottlenecks Ahead (≈500 words)

The rapid expansion of artificial intelligence is driving an unprecedented surge in global demand for computing infrastructure. As discussed in a recent conversation between Dwarkesh Patel and Dylan Patel, the scale of investment in AI infrastructure has reached historic levels. Major technology companies—including Amazon, Meta, Google, and Microsoft—are projected to spend hundreds of billions of dollars on capital expenditures related to AI data centers, chips, and power infrastructure. These investments highlight a central reality of the modern AI race: the primary constraint is no longer software innovation alone, but the physical infrastructure required to run large-scale AI systems.

One of the key insights from the discussion is that AI compute capacity scales on timelines much longer than software development cycles. Large technology firms are not simply purchasing servers or GPUs for immediate use. Instead, a significant portion of their capital expenditures is allocated toward long-term infrastructure projects, such as building data centers, securing power generation capacity, and pre-ordering semiconductor manufacturing capacity years in advance. For example, companies often place deposits on gas turbines or long-term power purchasing agreements several years before the corresponding compute infrastructure becomes operational. As a result, the massive spending figures seen today reflect investments that will come online gradually throughout the decade.

The compute demands of leading AI laboratories further illustrate the scale of the challenge. Companies such as OpenAI and Anthropic already operate clusters measured in gigawatts of power consumption. A single gigawatt-scale AI data center can require tens of billions of dollars in infrastructure and hardware investment. As AI models grow larger and more widely deployed, these labs must continuously expand their compute capacity not only to train new models but also to serve inference workloads for millions of users. Consequently, much of the capital raised by AI labs is dedicated to securing long-term compute access rather than immediate operational costs.
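The "tens of billions of dollars" per gigawatt claim is easy to sanity-check. In the sketch below, both the per-accelerator power draw and the all-in cost per accelerator are assumed round numbers, not figures from the podcast.

```python
# Rough sanity check on "a gigawatt-scale AI data center can require
# tens of billions of dollars". All inputs are assumptions.
watts_per_accelerator = 1_400        # chip + server + cooling/network overhead
cost_per_accelerator_usd = 50_000    # hardware plus its share of the buildout

accelerators = 1e9 / watts_per_accelerator           # accelerators per GW
total_usd = accelerators * cost_per_accelerator_usd  # all-in capex per GW
print(f"~{accelerators:,.0f} accelerators, ~${total_usd / 1e9:.0f}B per GW")
```

Under these assumptions a single gigawatt absorbs on the order of 700K accelerators and $30B+ of capex, consistent with the scale described in the summary.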

However, expanding compute infrastructure is constrained by several bottlenecks across the semiconductor supply chain. The most critical components include advanced logic chips, high-bandwidth memory (HBM), and the manufacturing equipment used to produce them. Companies such as Nvidia dominate the market for AI accelerators, while the fabrication of advanced chips is concentrated at manufacturers like TSMC. Even further upstream, the production of lithography equipment by ASML ultimately determines the maximum number of advanced chips that can be produced globally.

The supply chain complexity is immense. For instance, producing a gigawatt of cutting-edge AI chips requires tens of thousands of advanced semiconductor wafers and millions of lithography process steps. Each step depends on specialized equipment with long production lead times, making rapid scaling extremely difficult. As a result, even if data centers and power generation can expand quickly, the semiconductor manufacturing ecosystem may still limit overall compute growth.
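The "tens of thousands of advanced semiconductor wafers" per gigawatt figure can likewise be sanity-checked. The chip power and good-dies-per-wafer values below are assumptions for illustration, not numbers from the podcast.

```python
# Order-of-magnitude check on "tens of thousands of wafers per GW".
# Every input here is an assumed round number.
watts_per_chip = 1_000       # accelerator power draw, incl. overhead
good_chips_per_wafer = 30    # reticle-sized dies per 300mm wafer after yield loss

chips_for_1gw = 1e9 / watts_per_chip
wafers = chips_for_1gw / good_chips_per_wafer
print(f"~{chips_for_1gw:,.0f} chips -> ~{wafers:,.0f} wafers per GW")
```

With roughly a million chips per gigawatt and a few dozen good dies per wafer, the wafer count lands in the tens of thousands, matching the text's claim.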

In addition to chip manufacturing, memory production has emerged as another major constraint. High-bandwidth memory, which enables AI accelerators to process massive datasets efficiently, is significantly more resource-intensive to manufacture than conventional memory. As AI demand rises, memory manufacturers are redirecting production capacity away from consumer electronics and toward AI hardware, potentially increasing prices for devices such as smartphones and laptops.

Ultimately, the expansion of AI infrastructure depends on a complex interplay between technological innovation, supply chain capacity, and global economic investment. While software breakthroughs remain essential, the next phase of AI development will increasingly be determined by the ability to scale physical infrastructure—from semiconductor fabs to power generation—to support the immense computational demands of advanced AI systems.
 

Nice general AI-summary of the 2-hour podcast. Still, I found it very interesting to listen to all the reasoning, arguments and details Dylan Patel gave (and sometimes not gave, as you have to pay for them!) about all the supply line issues and manufacturing issues in the semi Foundry world, in relation to scaling AI. Dylan seems to love numbers!

One of them is that Apple seems to be becoming a much less important customer for TSMC relative to the AI/HPC customers, and with that diminished position it may get fewer privileges with TSMC regarding capacity. The phone market seems to be preparing for a significant reduction in units shipped over the coming two years, especially in the low and medium segments. And the price of the iPhone will go up.

Also interesting is how he discusses Elon's plans for building his own fabs at "1 million wspm", and that Patel doesn't believe AI datacentres will move to the sky in the foreseeable future (the next decade).

Let's see when the new fab shells are ready in 2028/2029, whether ASML can/will ship some 100 EUV tools (or more) around 2030, and whether China will have its first EUV alpha tool by 2030. Interesting times and predictions.

 

Let me ask you this: how do you view Dylan Patel's credibility as a semiconductor industry expert?
 