Analysis: AI will require $2T in annual revenue to support $500B in planned CapEx

What is the replacement for CUDA? Isn't CUDA part of Nvidia's moat?
You can think of CUDA roughly as an instruction set (that includes GPU, TPU, other AI, and memory-management operations), a set of hardware architectures that execute that instruction set, plus layers of software (like compilers) and libraries on top that enable all kinds of accelerated parallel computing. CUDA enables acceleration libraries for everything from gene sequencing to managing massive data tables to inverse lithography (resolution enhancement) to graphics with ray tracing and frame generation, in addition to AI. CUDA keeps expanding, adding new acceleration features and data representations, like the new 4-bit data representation, plus new applications. It's a moat because it enables so many uses / libraries (broad) and because it keeps optimizing for AI (deep). Other companies doing AI inference have built out their architectures and software sufficiently to support the major open AI model frameworks (TensorFlow, PyTorch) and inference servers (SGLang, vLLM). AMD has ROCm, which is not as complete or as broadly supported, but it is sufficient to support the model frameworks and model serving.
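(Illustration, not from the original post: a minimal PyTorch sketch of why the framework layer matters here. The same model code runs on an NVIDIA CUDA build or an AMD ROCm build of PyTorch; the ROCm build reuses the torch.cuda API, so only the installed wheel differs. The layer sizes below are arbitrary.)

import torch

def pick_device() -> torch.device:
    # torch.cuda.is_available() reports True on both CUDA and ROCm builds
    # when a supported accelerator is present; otherwise fall back to CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = pick_device()

# The model and tensor code are identical regardless of which vendor backend runs them.
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)
with torch.no_grad():
    y = model(x)
print(f"ran on {device}, output shape {tuple(y.shape)}")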

In my mind, the new moats ahead are in building out efficient data centers: ultrafast communication and coordinated memory usage, so that an entire data center with racks and racks of AI chips can be managed as a single resource, with clusters of chips configured to behave as a single model processor and lots of specialized model-serving capabilities. I think this is where things are going.
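(To make the "cluster of chips as a single model processor" idea concrete, here is a rough sketch, not from the original post, using vLLM's tensor parallelism: one serving process shards a single model across several GPUs in a node and presents them as one inference endpoint. The model name and GPU count are placeholders.)

from vllm import LLM, SamplingParams

# Shard one model across 8 GPUs in the node; vLLM handles the collective
# communication so the shards behave like a single model processor.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model name
    tensor_parallel_size=8,                     # placeholder GPU count
)

params = SamplingParams(max_tokens=64, temperature=0.7)
outputs = llm.generate(["Why does data-center-scale scheduling matter for inference?"], params)
print(outputs[0].outputs[0].text)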

 
People have been saying this for some years now, and it seems to make obvious sense. But that hasn't stopped Nvidia's dominance. Or is that just on the training side? So why hasn't that happened? Is it simply the ecosystem (CUDA etc.) constraints?

Jensen Huang responded to a question during a forum’s Q&A session about whether ASICs will replace GPUs in the AI movement. He explained that AI software and hardware are advancing so quickly that ASICs cannot keep up with the flexibility offered by GPUs.
 
Jensen Huang responded to a question during a forum’s Q&A session about whether ASICs will replace GPUs in the AI movement. He explained that AI software and hardware are advancing so quickly that ASICs cannot keep up with the flexibility offered by GPUs.
That seems plausible short to medium term; I guess long-term Intel and Nvidia could design and fab ASICs.
 
Jensen Huang responded to a question during a forum’s Q&A session about whether ASICs will replace GPUs in the AI movement. He explained that AI software and hardware are advancing so quickly that ASICs cannot keep up with the flexibility offered by GPUs.
What else would he say? When the field of customers narrows down to half a dozen, each of them can attain economies of scale a lot more easily.

Just like the Anthropic CEO saying AI will make 25% of the workforce obsolete, or Zuckerberg saying AI is now doing 30% of the coding (he didn't say whether it's the first 30% or the last 30%! 😄). Maybe all true, probably all true, but still, these folks also have a bit of a vested interest!
 
That seems plausible short to medium term; I guess long-term Intel and Nvidia could design and fab ASICs.
I would be interested in whether any of you see the Groq, Etched, or Cerebras "chips" as ASICs. Or what about the Rubin CPX chip that is specifically designed for just the long-context prefill part of inference? I see all of those as ASICs, albeit programmable ASICs.


ps: I use the term "chip" very loosely because all of them are actually multi-die systems packed with memory.
 
A guy from Meta keynoted at OIP today. He has been there for 14 years building datacenters. I will try to get my notes into a blog, but the most memorable thing he said was that by the time they finish a multi-billion-dollar datacenter, it is already obsolete and there are no upgrades; they have to build another due to changing power and cooling requirements. Sounds a lot like the bitcoin mining craze in the 2010s. TSMC made a bunch off that one as well, but AI is much larger in scale.
 
by the time they finish a multi-billion-dollar datacenter, it is already obsolete and there are no upgrades; they have to build another due to changing power and cooling requirements.
Also sounds like fabs. Data centers used to be able to do density retrofits, but no longer. I wonder if hyperscalers will figure out how to leverage fully depreciated data centers. Not quite as expensive as a leading edge fab just yet, but close.

 
Also sounds like fabs. Data centers used to be able to do density retrofits, but no longer. I wonder if hyperscalers will figure out how to leverage fully depreciated data centers. Not quite as expensive as a leading edge fab just yet, but close.


The other interesting thing he did was an overlay of their latest mega datacenter over Manhattan. Seriously, it is the size of Manhattan. Crazy. It costs $50B, spans 4M sq ft, and will go online in 2026. AI datacenters are truly an arms race.
 
Sam Altman’s AI empire will devour as much power as New York City and San Diego combined. Experts say it’s ‘scary’
This week, OpenAI announced a plan with Nvidia to build AI data centers consuming up to 10 gigawatts of power, with additional projects totaling 17 gigawatts already in motion. That’s roughly equivalent to powering New York City—which uses 10 gigawatts in the summer—and San Diego during the intense heat wave of 2024, when more than five gigawatts were used. Or, as one expert put it, it’s close to the total electricity demand of Switzerland and Portugal combined.

 
Sam Altman’s AI empire will devour as much power as New York City and San Diego combined. Experts say it’s ‘scary’
This week, OpenAI announced a plan with Nvidia to build AI data centers consuming up to 10 gigawatts of power, with additional projects totaling 17 gigawatts already in motion. That’s roughly equivalent to powering New York City—which uses 10 gigawatts in the summer—and San Diego during the intense heat wave of 2024, when more than five gigawatts were used. Or, as one expert put it, it’s close to the total electricity demand of Switzerland and Portugal combined.


Although large AI data centers do consume a lot of electricity, I think this Fortune article by Eva Roytburg is misleading. The OpenAI/Nvidia AI data centers will have a combined peak demand of 10 gigawatts, not continuous usage. No data center, whether existing or future, operates at peak demand all the time or even reaches it frequently. If it did, it would essentially trigger repeated overload shutdowns.

In addition, this figure represents multiple data centers across the United States, not a single site. And without considering the capacity of the power grid, the analysis is much less meaningful. For example (a rough comparison follows the list below):
1. Texas: The state's total generation capacity is over 145 gigawatts, while its highest peak demand was 85.5 gigawatts in 2023. Both numbers are expected to grow in the coming years.

2. New York: The 2025 summer peak demand is forecast at 31.47 gigawatts, while the state's grid capacity stands at 40.94 gigawatts.
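(A quick back-of-the-envelope check, my own sketch using only the figures cited above, comparing the announced 10 GW plus 17 GW of peak AI demand against the headroom in those two grids.)

# Rough comparison using only the numbers quoted in this thread.
announced_gw = 10.0 + 17.0  # OpenAI/Nvidia plan plus other projects in motion

grids = {
    # name: (generation capacity in GW, peak demand in GW)
    "Texas": (145.0, 85.5),
    "New York": (40.94, 31.47),
}

print(f"Total announced AI peak demand: {announced_gw:.1f} GW")
for name, (capacity, peak) in grids.items():
    headroom = capacity - peak
    share = announced_gw / headroom
    print(f"{name}: capacity {capacity:.2f} GW, peak {peak:.2f} GW, "
          f"headroom {headroom:.2f} GW -> announced load is {share:.2f}x of that headroom")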
 
I would be interested in whether any of you see the Groq, Etched, or Cerebras "chips" as ASICs. Or what about the Rubin CPX chip that is specifically designed for just the long-context prefill part of inference? I see all of those as ASICs, albeit programmable ASICs.


ps: I use the term "chip" very loosely because all of them are actually multi-die systems packed with memory.
The Cerebras WSE-3 is not multi-die; it is a single, wafer-scale die. It is a 900,000+ core processor. The cores are considered Turing-complete, though I don't remember Cerebras ever saying that. It is and it isn't an ASIC, I suppose.

The Groq LPU (Language Processing Unit) is a specialized processor with characteristics of computational memory, and a highly integrated compiler that creates custom code to define the processing flow through the LPU. These types of compilers are often sarcastically called "magic compilers", because they make or break the effectiveness of the processor design. The technique is also used by VLIW processors. In the LPU, the claim is that the compiler defines the processing flow through the LPU functional units. I've seen no mention of whether or not the processors are Turing-complete.

Nvidia CUDA cores are said to be Turing-complete. (I've never seen a detailed analysis of that.) Tensor cores and Ray Tracing cores are apparently not Turing-complete, which is no surprise.

The Etched Sohu is called an ASIC by Etched, and I agree with them. It is designed just for running transformer neural networks.

This whole discussion of processors versus ASICs was started by someone in the press (I don't remember who) who wanted to differentiate the Google and AWS AI chips from Nvidia GPUs. IMHO, it was an ignorant and dumb classification of chip functionality and design by someone who had no idea what he or she was talking about. What a surprise. I suggest we drop it.
 
Unlikely. Huang just gave a non-answer to a silly question.
From what I have seen with local models at least, they do keep adding new instructions to keep up with new thinking on how to make models run fast and efficiently. Bandwidth needs keep changing too. I think once the early curve of "how to AI" settles, the business case for ASICs gets a lot stronger vs. GPUs (whose high volume keeps costs down).
 
I think once the early curve of "how to AI" settles, the business case for ASICs gets a lot stronger vs. GPUs (whose high volume keeps costs down).
If by "how to AI" you mean which strategies and methods to use for AI (e.g., LLMs and transformers), I agree; we are very early in finding the most appropriate processing models. (At least I hope so!) Do I believe that classic accelerator ASICs with application-specialized circuit designs will emerge for AI? (Beyond matrix math blocks and vector processing.) I'm still in the "I doubt it" stage, but at least I'm open-minded enough to know I may very well be wrong.
 
This whole discussion of processors versus ASICs was started by someone in the press (I don't remember who) who wanted to differentiate the Google and AWS AI chips from Nvidia GPUs. IMHO, it was an ignorant and dumb classification of chip functionality and design by someone who had no idea what he or she was talking about. What a surprise. I suggest we drop it.
I agree - the whole notion of fixed ASICs is kind of out the window for AI. I appreciate your thoughts on categorization, though I think there are also additional dimensions of memory architecture, built-in representations/arithmetic, and interconnect between different types of lower-level processors. My main point, though, is that terms like GPU and TPU are increasingly becoming meaningless as the chips and systems shape themselves into data-center-level building blocks that optimize inference (mostly) of huge and continuously changing models. Etched is maybe a “transformer processor”. Cerebras has shaped itself into a “general purpose AI processor” because it can do training and inference. Rubin CPX is a “transformer context/prefill processor”.

And like you, I’m not sure we’ll get to a point where “how to AI” stabilizes enough for highly fixed accelerator chips in data centers, though we’ll probably see companies develop edge-oriented, good-enough NPUs (or whatever we want to call them) compatible with application-specific models, for things like real-time audio/text translation (hello Google and Apple) and conversational front-ends that can do relatively smart things in local apps.
 
I agree - the whole notion of fixed ASICs is kind of out the window for AI.
And I'm probably being too literal and difficult for my own good. (Numerous friends reading that will probably chuckle at my tendency for understatement.)

When I think of accelerator ASICs, the classic example that comes to my mind is the original Google video transcoder ASIC. I doubt YouTube would be as profitable and successful without it. If Google were still doing transcoding on CPUs, their costs would be much higher. I understand Meta later did a transcoder ASIC too. ASICs can change the world for a business, but I often wish the semiconductor dummies in the financial press would just shut up.
 
When I think of accelerator ASICs, the classic example that comes to my mind is the original Google video transcoder ASIC. I doubt YouTube would be as profitable and successful without it. If Google were still doing transcoding on CPUs, their costs would be much higher.
Interesting that you mention dedicated HW for transcoding. Rubin CPX integrates four NVENC (encoders) and four NVDEC (decoders) to accelerate generative video and other video-intensive AI workloads. This is because the chip was specifically designed as the front-end (pre-fill) for massive-context inference tasks, including long-form video processing. I was focused on inference before, but now that I think about it, this should make Rubin CPX especially efficient at video and other long-context (coding) training, as well.
 
And like you, I’m not sure we’ll get to a point where “how to AI” stabilizes enough for highly fixed accelerator chips in data centers, though we’ll probably see companies develop edge-oriented, good-enough NPUs (or whatever we want to call them) compatible with application-specific models, for things like real-time audio/text translation (hello Google and Apple) and conversational front-ends that can do relatively smart things in local apps.

Do you consider Tesla's original Dojo custom chips (D1) to be ASICs or GPUs? Dojo "1", and the future Dojo (based on AI6 coming end of decade) will be custom chips made by Tesla/X and used by both Tesla and xAI.

Broadcom was/is working on ASICs for ByteDance's AI needs as of last year at least: https://www.reuters.com/technology/...elop-advanced-ai-chip-sources-say-2024-06-24/
 
Do you consider Tesla's original Dojo custom chips (D1) to be ASICs or GPUs? Dojo "1", and the future Dojo (based on AI6 coming end of decade) will be custom chips made by Tesla/X and used by both Tesla and xAI.

Broadcom was/is working on ASICs for ByteDance's AI needs as of last year at least: https://www.reuters.com/technology/...elop-advanced-ai-chip-sources-say-2024-06-24/
I think that classification system is broken for AI. Tesla Dojo 1 and 2 look kind of TPU-ish with some of their own custom data representations (CFP8) and were focused on data center training. But NVIDIA outpaced the smallish Tesla team in training on both the HW and SW side. AI6 is a different bird - a chip designed and optimized for edge-based inference, and a very special kind of inference at that - continuous vision-based decision-making, probably also applicable to robots.

As for ByteDance's ASICs, who knows, given all the embargoes on tech exports to China?

In Jan 2025, they were planning a mix of buys: domestic chips for domestic use and NVIDIA for external use.

Their chip design team just got assigned to Singapore, which tells me that they want to use TSMC, or maybe that they are trying to move outside China to meet the TikTok negotiation requirements. Who knows...

UPDATE: Based on the most recent news, ByteDance / TikTok may be forced to "Buy American". Who knows?

 
The Cerebras WSE-3 is not multi-die; it is a single, wafer-scale die. It is a 900,000+ core processor. The cores are considered Turing-complete, though I don't remember Cerebras ever saying that. It is and it isn't an ASIC, I suppose.
BTW - I shouldn't leave out Cerebras. They are building out data centers as well, though not at the scale of Microsoft or Oracle/OpenAI. But it shows that data center chip companies really have to design the chips and networking at the complete data-center level if they really want to understand real workloads and optimize for them.

Cerebras opens 10MW data center in Oklahoma City

Previous reports suggest the Scale-owned facility hosts more than 300 CS-3 systems



 