At GTC 2026, Nvidia delivered an event packed with groundbreaking announcements. Nvidia’s pace of innovation shows no signs of slowing: the company introduced three entirely new systems this year, the Groq LPX, Vera ETL256, and STX. Also announced were updates to Nvidia’s Kyber rack architecture, with CPO making its debut for scale-up networking in the unveiling of the Rubin Ultra NVL576 and Feynman NVL1152 multi-rack systems. Early hints at Feynman’s architecture were also a key topic, and a Jensen callout for InferenceX during the keynote was a highlight.
This is our GTC 2026 recap, in which we address many of the key questions Nvidia left unanswered. Specifically, we will go through the LPX rack and LP30 chip and explain how attention and feed-forward network disaggregation (AFD) works; detail the rack architectures behind NVL144, NVL576, and NVL1152, clarifying just how much optics will be inserted; and lay out the rationale behind the dense Vera ETL256. The next-generation Kyber rack also had some big updates and some hidden details.
Pretty amazing to see how quickly Nvidia was able to fold a new specialized AI processor from the outside into its co-optimized data center AI inference systems. How fast we have gone from raw, brute-force model handling via massive memory and parallelization, to prefill/decode disaggregation with shared KV stores (and the associated storage hierarchy), to a new form of disaggregation, AFD, along with a new processor type to reduce latency. It makes me wonder whether we will see a world where new specialized, optimized acceleration processors, connectivity, and storage can be dropped in under the orchestration control of a data-center OS that comprehends the optimal structure and processor ratios for each model. That would be great for the chip biz.
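To make the disaggregation idea concrete, here is a minimal, purely illustrative sketch of AFD-style scheduling: the attention sublayers (bandwidth-bound, living next to the KV cache) run on one worker pool, while the feed-forward sublayers (FLOPs-bound) run on another, with only the small per-token activations shipped between pools each layer. All class and function names here are hypothetical assumptions for illustration, not Nvidia or Groq APIs, and the math is a stand-in placeholder rather than real attention or MLP compute.

```python
from dataclasses import dataclass

@dataclass
class Activation:
    """Toy stand-in for a token's hidden state."""
    data: list

class AttentionWorker:
    """Runs only attention sublayers; in AFD this pool is sized for
    memory bandwidth and holds the KV cache locally."""
    def attend(self, x: Activation) -> Activation:
        # Placeholder arithmetic standing in for attention over the KV cache.
        return Activation([v * 2 for v in x.data])

class FFNWorker:
    """Runs only feed-forward sublayers; in AFD this pool is sized for
    raw FLOPs and needs no KV cache at all."""
    def ffn(self, x: Activation) -> Activation:
        # Placeholder arithmetic standing in for the MLP block.
        return Activation([v + 1 for v in x.data])

def decode_step(x: Activation,
                attn_pool: list,
                ffn_pool: list,
                num_layers: int) -> Activation:
    """One decode step for one token: each transformer layer alternates
    between the two pools. Only the activation crosses the interconnect;
    the (much larger) KV cache never moves."""
    for layer in range(num_layers):
        x = attn_pool[layer % len(attn_pool)].attend(x)
        x = ffn_pool[layer % len(ffn_pool)].ffn(x)
    return x
```

The appeal of this split, as with prefill/decode disaggregation, is that the two pools can be provisioned independently: an orchestrator could pick the attention-to-FFN worker ratio per model, which is exactly the kind of decision a model-aware data-center OS would make.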