Jensen Huang – Will Nvidia’s moat persist? Podcast with Dwarkesh Patel

user nl

712,805 views Apr 15, 2026 Dwarkesh Podcast

I asked Jensen about TPU competition, Nvidia’s lock on the ever more bottlenecked supply chain needed to make advanced chips, whether we should be selling AI chips to China, why Nvidia doesn’t just become a hyperscaler, how it makes its investments, and much more. Enjoy!
SPONSORS
  • Crusoe's cloud runs on state-of-the-art Blackwell GPUs, with Vera Rubin deployment scheduled for later this year. But hardware is only part of the story—for inference, Crusoe's MemoryAlloy tech implements a cluster-wide KV cache, delivering up to 10x faster TTFT and 5x better throughput than vLLM. Learn more at https://crusoe.ai/dwarkesh
  • Cursor helped me build an AI co-researcher over the course of a weekend. Now I have an AI agent that I can collaborate with in Google Docs via inline comment threads! And while other agentic coding tools feel like a total black-box, Cursor let me stay on top of the full implementation. You can try my co-researcher out at https://github.com/dwarkeshsp/ai_cowo..., or get started on your own Cursor project today at https://cursor.com/dwarkesh
  • Jane Street spent ~20,000 GPU hours training backdoors into 3 different language models, then challenged my audience to find the triggers. They received some clever solutions—like comparing the base and fine-tuned versions and extrapolating any differences to reveal the hidden backdoor—but no one was able to solve all 3. So if open problems like this excite you, Jane Street is hiring. Learn more at https://janestreet.com/dwarkesh
To sponsor a future episode, visit https://dwarkesh.com/advertise.

TIMESTAMPS
00:00:00 – Is Nvidia’s biggest moat its grip on scarce supply chains?
00:16:25 – Will TPUs break Nvidia’s hold on AI compute?
00:41:06 – Why doesn’t Nvidia become a hyperscaler?
00:57:36 – Should we be selling AI chips to China?
01:35:06 – Why doesn’t Nvidia make multiple different chip architectures?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Nvidia CEO Jensen Huang clarified in an April 2026 interview with Silicon Valley podcast host Dwarkesh Patel that the company allocates GPUs on a first-come, first-served basis rather than a highest-bidder-wins approach.

Huang explained that Nvidia prioritizes GPU distribution by evaluating customers' demand forecasts and purchase orders (POs), then considers whether data center infrastructure is ready before allocating units according to order timing. Customers without completed infrastructure may receive lower priority to maximize overall production efficiency.
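As a rough sketch of that allocation logic as described (the PurchaseOrder fields, the readiness check, and the numbers below are hypothetical illustrations, not Nvidia's actual process):

from dataclasses import dataclass
from datetime import date

@dataclass
class PurchaseOrder:
    customer: str
    po_date: date       # when the purchase order was placed
    units: int          # GPUs requested
    infra_ready: bool   # is the customer's data center ready to take delivery?

def allocate(supply: int, orders: list) -> dict:
    """First-come, first-served by PO date, but orders whose data center
    infrastructure isn't ready yet are pushed behind those that can deploy now."""
    queue = sorted(orders, key=lambda o: (not o.infra_ready, o.po_date))
    allocation = {}
    for order in queue:
        if supply == 0:
            break
        granted = min(order.units, supply)
        allocation[order.customer] = granted
        supply -= granted
    return allocation

# Example: 100 units, three POs; the customer without a ready site goes last.
orders = [
    PurchaseOrder("cloud_a", date(2026, 1, 5), 60, True),
    PurchaseOrder("cloud_b", date(2026, 1, 2), 80, False),
    PurchaseOrder("lab_c",   date(2026, 1, 9), 30, True),
]
print(allocate(100, orders))   # {'cloud_a': 60, 'lab_c': 30, 'cloud_b': 10}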

Addressing rumors of price-based allocation, Huang firmly denied any such practice, stating Nvidia's quoted prices are final and do not increase due to rising demand.

He stated that the company aims to be a reliable foundational supplier for the industry, capable of fulfilling massive AI infrastructure orders with stable commitments.
 
The back-and-forth about selling chips to China has generated a lot of discussion. I think Dwarkesh, who I believe is 25 years old, did a great job of calmly challenging Jensen's spin.
 
My take on the current moats:
* Gaming / Professional Graphics - limited chip capacity - AMD would rather spend its limited allocation on AI and x86, and Intel's on-chip graphics are still no substitute for discrete hardware.
* HPC/other scientific, etc. - CUDA, associated libraries and ecosystem
* AI Training - market growing more slowly than inference, entrenched market share.
* AI Inference - rack/pod-level co-optimization of GPU/LPU/memory/storage/interconnect/rack HW/models/software stack and associated open source ecosystem, plus supply chain

Would love to hear other people's thoughts.
 

Professional - generally agreed, though I think this moat is slowly drying up for Nvidia. x86 iGPUs continue to grow in capability, and Apple's unified memory architecture is pulling a number of traditional professional-graphics users over to Mac.

For Gaming - I don't see this as a real moat anymore. The total number of discrete GPUs sold has been steadily decreasing for a long time, and there are far more people playing mobile games than PC games. Nvidia has only Nintendo as a console customer, while AMD has Sony, Microsoft, the SteamDeck, and various other handhelds. Both Intel and AMD also have pretty strong iGPUs now that, in some cases, compete with Nvidia's 50- and even 60-series GPUs.

HPC/Scientific - agreed fully.

AI Training and Inference - My gut tells me the moat here is only short-term. The Chinese are building their own entire stack for this, and hyperscale providers and even some smaller cloud providers are building their own full-stack solutions here. Amazon and Tesla/SpaceX, for example. Nvidia will probably remain the gold standard for a long time, but I don't think they will have a monopoly-style moat here in 2030.
 
even some smaller cloud providers are building their own full-stack solutions here. Amazon and Tesla/SpaceX, for example.
A lot will depend on the cost-per-token vs. interactivity profile of different suppliers, plus the effectiveness of agents. If NVIDIA gets to a place where they offer the best Pareto curves for all of the leading models, even with their margins, thanks to economies of scale (5-10 different specialized custom chips/systems per generation at a 1-year cadence, leveraging the most efficient rack & interconnect, plus the most efficient model/software stack), they become like TSMC. They may already be there - nobody is showing better Pareto curves in data-center-scale benchmarks.

Amazon's Pareto frontier with Trainium 3 doesn't come close (big latency issues, hence Cerebras), and their interconnect has been too focused on standard CPU connectivity. Google TPU is probably closer, especially with their split this next generation between 8t and 8i. We'll see if Tesla/SpaceX ever becomes serious, but AI6 isn't anywhere near a data-center solution. Who knows with China - the usual economics don't apply.
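For concreteness on what "best Pareto curve" means in this context, a minimal sketch with invented system labels and numbers: each deployment is a point in (cost per million tokens, tokens/sec per user) space, and it sits on the frontier only if no other system is at least as cheap and at least as interactive.

# Each entry: (system, $ per million tokens, tokens/sec per user).
# All numbers are invented for illustration.
systems = [
    ("A", 0.60, 90),
    ("B", 0.45, 60),
    ("C", 0.80, 120),
    ("D", 0.70, 70),   # dominated by A: costs more and is less interactive
]

def pareto_frontier(points):
    """Keep only systems that no other system beats on both axes
    (lower cost per token AND higher tokens/sec per user)."""
    frontier = []
    for name, cost, tps in points:
        dominated = any(
            c <= cost and t >= tps and (c < cost or t > tps)
            for n, c, t in points if n != name
        )
        if not dominated:
            frontier.append((name, cost, tps))
    return frontier

print(pareto_frontier(systems))   # A, B and C survive; D falls off the frontier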
 

The other curveball is that what's needed for peak AI inference performance keeps changing. FP vs. Integer, precision levels, and even techniques for 'thinking' and 'MoE' have somewhat different requirements for compute. Nvidia definitely has a leg up, but like any technology, there are only so many novel techniques before diminishing returns. The same fate as CPUs (whose exponential growth story was overtaken by GPUs) could happen to Nvidia as AI's needs crystallize.

Another point on Amazon, the Chinese, and SpaceX/Tesla is that they're chipping away at Nvidia's TAM by making their own products. That reduces Nvidia's economies of scale, and when combined with the above, the moat begins to dry up.
 
what's needed for peak AI inference performance keeps changing. FP vs. Integer, precision levels, and even techniques for 'thinking' and 'MoE' have somewhat different requirements for compute.
Absolutely agree - and we might be getting close to the limits (diminishing returns, as you suggest) of how much can be optimized via quantization and some of the other basics used today.
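A back-of-the-envelope sketch of why those returns diminish, using synthetic weights and naive uniform quantization rather than anything a production inference stack actually does:

import random

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def quantize(xs, bits):
    """Naive uniform symmetric quantization to roughly 2**bits levels."""
    scale = max(abs(x) for x in xs) / (2 ** (bits - 1) - 1)
    return [round(x / scale) * scale for x in xs]

for bits in (8, 6, 4, 3):
    q = quantize(weights, bits)
    err = sum(abs(a - b) for a, b in zip(weights, q)) / len(weights)
    print(f"{bits}-bit: mean abs error ~ {err:.4f}")

# Memory saved per dropped bit is constant, but quantization error roughly
# doubles with each bit removed - the diminishing-returns curve in question.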

The same fate as CPUs (whose exponential growth story was overtaken by GPUs) could happen to Nvidia as AI's needs crystallize.
It seems like NVIDIA is front and center, at scale, as those needs crystallize with just about every end customer. From what I can tell, they have adapted far faster than any of the other players when it comes to new generations of models. I was very impressed to see how fast and far they optimized results on DeepSeek-V4 on day 0 of the model's release. Huawei had a two-month head start, and yet NVIDIA is now producing cheaper tokens, faster, than Huawei. And with a 1-year chip cadence and deep early knowledge of virtually every model, it looks like they can innovate ahead of others who are limited by the cost of tapeouts and system infrastructure.

Another point on Amazon, the Chinese, and SpaceX/Tesla is that they're chipping away at Nvidia's TAM by making their own products.
IDK - a trillion dollars' worth of orders seems like more than enough to fund NVIDIA's scale. And right now, China is both logic- and memory-constrained, Amazon is buying plenty of GB200 NVL72s, and Tesla/SpaceX just shut down the in-house Dojo 3 in favor of NVIDIA for Colossus 1 and Colossus 2. I don't see an easy path back from that, with them being mostly focused on client-side inference.
 
NVIDIA is now producing cheaper tokens, faster, than Huawei.
Maybe I'm drinking too much of the bathwater, but this article postulates that both NVIDIA and TSMC are strategically underpricing the value of their next-generation product vs. the previous one, to maintain scale and avoid monopolist traps. In NVIDIA's case, TCO per marketed FP8 PFLOP ($ per hour / PFLOP) for Vera Rubin is priced 40% below market, despite supply limitations.
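To spell out that metric, here's a minimal sketch; the rental rates and PFLOP figures are placeholders, not actual Blackwell or Rubin numbers:

# The metric: rental $/hour divided by marketed FP8 PFLOPS = $ per PFLOP-hour.
# All figures below are placeholders, not real pricing or specs.

def tco_per_pflop_hour(rental_per_hour, fp8_pflops):
    return rental_per_hour / fp8_pflops

prev_gen = tco_per_pflop_hour(rental_per_hour=4.0, fp8_pflops=10)   # $0.40 / PFLOP-hr
next_gen = tco_per_pflop_hour(rental_per_hour=7.2, fp8_pflops=30)   # $0.24 / PFLOP-hr

discount = 1 - next_gen / prev_gen
print(f"next gen is {discount:.0%} cheaper per marketed FP8 PFLOP-hour")   # 40%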

 

Hmm. Nvidia probably has no choice, because the AI companies aren't making money and the biggest cost driver is the perf/watt of the datacenter hardware. More revenue for Nvidia now could mean the whole thing goes bust later.

TSMC - Margins have gone up a lot in the last few years, indicating a high premium for leading-edge nodes. They're probably getting pushback from customers on cost per transistor, not to mention they need to starve Intel of customers at all costs for their long-term good.

I would be surprised if TSMC N2 and A14 are substantially better in cost per transistor than N3, given their increasing profit margins and the diminishing returns on density.
 
Hmm. Nvidia probably has no choice, because the AI companies aren't making money and the biggest cost driver is the perf/watt of the datacenter hardware. More revenue for Nvidia now could mean the whole thing goes bust later.
We're at a weird place where AI labs are showing strong operating profits with growing gross margins (gross margins 40 -> 80% on inference revenue) but massive net losses from capex-heavy training and future-capacity infrastructure, akin to most high-growth tech, where operations generate cash while investments dominate the P&L. In the midst of the financially risky, supply-constrained buildout, NVIDIA and TSMC are offering future products in a price stack that substantially lowers forward-going opex for the labs and the neoclouds supplying token capacity for them. In my book, it's not clear that competitors offer better TCO solutions - they just offer alternate capacity today in a wafer- and memory-constrained market.
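A stylized version of that income-statement shape, with every figure invented for illustration: inference revenue can carry a healthy gross margin while training spend and ahead-of-demand infrastructure still push net income deeply negative.

# Illustrative only - a stylized AI-lab income statement, not any company's numbers.
inference_revenue  = 10.0   # $B of paid inference / API revenue
serving_cost       = 3.0    # compute to serve those tokens
training_cost      = 9.0    # frontier-model training runs
infra_depreciation = 4.0    # depreciation on clusters built ahead of demand
other_opex         = 2.0    # research, salaries, everything else

gross_profit = inference_revenue - serving_cost   # +7.0, a 70% gross margin
net_income = gross_profit - training_cost - infra_depreciation - other_opex   # -8.0

print(f"gross margin on inference: {gross_profit / inference_revenue:.0%}")   # 70%
print(f"net income: {net_income:+.1f} $B")                                    # -8.0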
 
Interesting - I didn't realize the inference side of the business was actually above break-even. The last time I looked deeply into it (probably 12 months ago, to be fair), the cost of inference to end users was less than the cost of electricity to run the GPUs alone.
 
The last time I looked deeply into it (probably 12 months ago, to be fair), the cost of inference to end users was less than the cost of electricity to run the GPUs alone.
I thought that too, until I saw the linked article and heard some of the recent murmurs about these companies approaching IPOs. I think three things have changed since last year:
1) Their revenue has really turned on and offerings have gotten much more valuable to enterprises.
2) Both data-center-scale training and inference have gotten far more efficient thanks to co-optimization from both the labs and the hardware suppliers, so COGS per user is down. Remember how concerned everyone was last year that DeepSeek's optimizations were going to kill the hardware market? Turns out those kinds of optimizations were good for the business.
3) Their hardware utilization is weighted much more heavily toward revenue-producing inference than toward training. They are starting to benefit from economies of scale.

There's still a lot of risk in estimating long-term market demand, but with revenues growing dramatically and margins expanding, it's hard not to plan aggressively, especially with everyone seemingly jumping in to share the risk.
 
