CSP Providers Lose Interest in NVIDIA GB Series: Supply Chain Reveals the Collapse Truth—The More You Buy, the Longer You Wait, and Debugging Is Required

XYang2023

Well-known member
Translation by ChatGPT:
NVIDIA's next-generation GB300 series was unveiled at GTC 2025 and is set to begin trial production in the second quarter. However, the supply chain has revealed that the complexity of assembling GB200 racks far exceeds expectations. Despite improving yields, CSP providers are shifting toward the more mature HGX 8-card series. Industry insiders predict that the earliest GB300 samples will be available for customer testing in Q4 this year, and delays will put significant pressure on Taiwanese component and assembly suppliers, potentially affecting upstream chip manufacturing as well.


Recently, reports emerged that Microsoft is cutting data center computing power. According to supply chain sources, Microsoft was among the first to acquire GB200 NVL72 systems. However, due to poor yields, customers have been forced to participate in testing. The deployment complexity of these systems has been significantly underestimated, requiring around 5–7 working days for installation and frequently encountering instability and system crashes.


For supply chain vendors, the challenge is even greater. Since only NVIDIA engineers are familiar with the overall rack configuration, customers lack control over the installation process, often resulting in a "whack-a-mole" troubleshooting situation—where the more they buy, the longer they wait, and debugging is required together with NVIDIA.


Industry analysts noted that key customers started shifting orders late last year, pinning their hopes on the GB300 series. However, GB200 shipments are expected to reach only 15,000 racks for the entire year. The outlook for GB300 isn't optimistic either; despite plans for trial production in Q2 and mass production in Q3, current supply chain conditions suggest that customer test samples may not be available until the end of the year—meaning mass production likely won't happen in 2025.


The supply chain also highlighted that demand for NVIDIA's consumer RTX GPUs and H20 chips remains strong, mainly due to DeepSeek lowering the barrier for enterprises to adopt AI models. Once enterprises can easily deploy AI at the edge, cloud computing power will become more decentralized. This shift is seen as a value chain transition rather than a decline in AI demand.


Investment analysts believe that AI demand remains strong, with most CSPs shifting to the mature HGX series. Since the HGX B300 uses a single-die design, it can rely on CoWoS-S packaging, allowing for dynamic capacity adjustments.


However, analysts also caution that reports in January indicated a slowdown in advanced packaging. While this initially seemed like a shift from CoWoS-S to CoWoS-L, OSAT vendors have already seen a decline in orders, and further adjustments are expected soon.

 
Reposting what I said regarding rack-scale systems in another thread.

Basically, HGX systems of 8-16 cards are kind of a sweet spot, and they are much easier to make and deploy.

=======

AI training requires super-large-scale, ultra-fast connected chips; that is why rack-scale GPUs are currently so popular in the market.

However, that is not the case for AI inference, arguably the larger and more important market going forward. In inference, you only need a few chips connected together (e.g., to fit one large model into 4 or 8 chips). What matters most is very large and very fast memory access. That is why the NVIDIA H20 is currently selling so well in China ($10k per chip, significantly lower compute power than the H100, but overall a faster chip for inference).
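
To put rough numbers on that (a back-of-envelope sketch; the bandwidth and FLOPS figures are approximate public specs, used here as assumptions): at low batch sizes, every generated token requires streaming the full model weights from HBM, so the decode ceiling is roughly memory bandwidth divided by model size, and raw compute barely matters.

```python
# Back-of-envelope decode-throughput ceiling for memory-bandwidth-bound
# LLM inference (batch size 1: all weights stream from HBM per token).
# Bandwidth/FLOPS figures are approximate public specs, assumed here.

GPUS = {
    # name: (HBM bandwidth in GB/s, dense FP16 TFLOPS)
    "H100 SXM": (3350, 989),
    "H20":      (4000, 148),
}

MODEL_GB = 70  # e.g., a 70B-parameter model at ~1 byte/param

for name, (bw_gbs, tflops) in GPUS.items():
    tokens_per_s = bw_gbs / MODEL_GB  # bandwidth-bound ceiling
    print(f"{name}: ~{tokens_per_s:.0f} tok/s ceiling "
          f"(with {tflops} TFLOPS of compute)")

# Despite ~7x less compute, the H20's higher HBM bandwidth gives it a
# higher single-stream decode ceiling: low-batch inference is bound by
# memory traffic, not FLOPS.
```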

Intel should probably give up on the AI training market and focus on AI inference, producing chips that are 1) inexpensive, 2) connected at a smaller scale (to save cost), and 3) equipped with large, fast memory. If Intel can execute these three points well, I believe it would sell so well that most hyperscalers would abandon their ASIC chips.
 
A few days ago, Pat Gelsinger also expressed the view that NVDA's expensive systems are overkill for inference tasks.

"Today, if we think about the training workload, okay, but you have to give away something much more optimized for inferencing. You know a GPU is way too expensive; I argue it is 10,000 expensive to fully realize what we want to do with the deployment of inferencing for AI and then, of course, what's beyond that."

 
In inference, you only need a few chips connected together (e.g., to fit one large model into 4 or 8 chips).
True, if you only concern yourself with stand-alone, single-use inference “machines”. But not true if you want efficiencies like DeepSeek's.
 
A few days ago, Pat Gelsinger also expressed the view that NVDA's expensive systems are overkill for inference tasks.

"Today, if we think about the training workload, okay, but you have to give away something much more optimized for inferencing. You know a GPU is way too expensive; I argue it is 10,000 expensive to fully realize what we want to do with the deployment of inferencing for AI and then, of course, what's beyond that."

I think Intel should pursue data center GPUs in addition to accelerators and AI foundry. AI algorithms are constantly evolving; hence, GPUs are preferred.

It is a combination.
 
I think Intel should pursue data center GPUs in addition to accelerators and AI foundry. AI algorithms are constantly evolving; hence, GPUs are preferred.

It is a combination.
Yes, still work on GPUs, but without focusing on expensive rack-scale GPU systems that pack so many GPUs into one rack. Those systems have proved to have big challenges both in manufacturing (heat issues, etc.) and in deployment (e.g., data centers need to be retrofitted to install the new racks).

This may become a war like the historical one between expensive mainframes and networks of commodity PCs.

If AI is that important and promising, humanity deserves cheaper solutions.

True, if you only concern yourself with stand-alone, single-use inference “machines”. But not true if you want efficiencies like DeepSeek's.
DeepSeek models can comfortably fit in an HGX pod with 8 GPUs. You can easily scale out for higher traffic.
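
As a rough sanity check on that claim (back-of-envelope arithmetic: the ~671B total parameter count is DeepSeek's published figure, the per-GPU capacities are public specs, and FP8 weights are an assumption):

```python
# Rough fit check: does a DeepSeek-V3/R1-class model (~671B total
# parameters, per DeepSeek's papers) fit on one 8-GPU HGX pod?

PARAMS_B = 671          # total parameters, billions
BYTES_PER_PARAM = 1     # FP8 weights (assumption)
weights_gb = PARAMS_B * BYTES_PER_PARAM  # ~671 GB

for gpu, mem_gb in {"H100 80GB": 80, "H200 141GB": 141}.items():
    pod_gb = 8 * mem_gb
    print(f"8x {gpu}: {pod_gb} GB total; FP8 weights fit: "
          f"{weights_gb < pod_gb} (headroom {pod_gb - weights_gb} GB)")

# 8x H200 (1128 GB) holds the FP8 weights with ~457 GB left for KV
# cache; 8x H100 (640 GB) would need heavier quantization or more pods.
```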
 
DeepSeek models can comfortably fit in an HGX pod with 8 GPUs. You can easily scale out for higher traffic.
You’re missing the point because you’re so focused on the single-model, single-user-at-a-time (edge) use case. DeepSeek’s big inference efficiency gains come from disaggregation/DualPipe, the shared KV cache system, and real-time load balancing between models and sub-models in their data center.
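
A toy sketch of one of those ideas, shared-KV/prefix caching (a conceptual illustration only, not DeepSeek's actual implementation; real serving stacks cache attention KV tensors per token block, not strings):

```python
# Toy illustration of a shared KV (prefix) cache: requests that share a
# prompt prefix reuse the expensive prefill work instead of redoing it.
from functools import lru_cache

@lru_cache(maxsize=1024)
def kv_for_prefix(prefix: str) -> str:
    # Stand-in for an expensive prefill pass that builds KV tensors.
    print(f"  prefill: computing KV for {prefix!r}")
    return f"<KV:{prefix}>"

def serve(system_prompt: str, user_query: str) -> str:
    kv = kv_for_prefix(system_prompt)  # cache hit after first request
    return f"decode({kv} + {user_query!r})"

# Many requests share one system prompt, so prefill runs only once:
for q in ["What is CoWoS?", "Explain NVLink", "Define TCO"]:
    print(serve("You are a helpful assistant.", q))
```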
 
You’re missing the point because you’re so focused on the single-model, single-user-at-a-time (edge) use case. DeepSeek’s big inference efficiency gains come from disaggregation/DualPipe, the shared KV cache system, and real-time load balancing between models and sub-models in their data center.
You can throw all these terms at me, but share a pointer showing that DualPipe, shared KV, etc. can achieve much higher efficiency on NVL72 than on HGX 8.
 
You can throw all these terms at me, but share a pointer showing that DualPipe, shared KV, etc. can achieve much higher efficiency on NVL72 than on HGX 8.
The proof is that DeepSeek doesn’t run THEIR service as a bunch of models hosted on HGX 8s; they run distributed models on racks of systems with at least 50k GPUs, running software beyond the core models, which many seem inclined to view as the sole source of efficiency.
 
The proof is that DeepSeek doesn’t run THEIR service as a bunch of models hosted on HGX 8s; they run distributed models on racks of systems with at least 50k GPUs, running software beyond the core models, which many seem inclined to view as the sole source of efficiency.
Now it seems that you are missing the point. You can use HGX-8 pods to build racks -- racks of systems; that is what I meant by "scaling out".

The main advantage of NVL72 over HGX-8 is that the intra-rack communication speed between these 72 chips is super high, while HGX-8 only has 8 intra-pod chips connected that way.
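
For a rough sense of that gap (approximate public specs, used here as assumptions): inside an NVLink domain each GPU moves data at NVLink speed, while traffic between HGX-8 pods drops to the InfiniBand NICs.

```python
# Back-of-envelope: intra-NVLink-domain vs. cross-pod bandwidth.
# Figures are approximate public specs (H100 generation), assumed here.

NVLINK4_GBS = 900   # per-GPU NVLink bandwidth inside a domain
NDR_IB_GBS = 50     # one 400 Gb/s ConnectX-7 NIC ~= 50 GB/s

print(f"intra-domain (NVLink): {NVLINK4_GBS} GB/s per GPU")
print(f"cross-pod (1x NDR InfiniBand): {NDR_IB_GBS} GB/s per NIC")
print(f"ratio: ~{NVLINK4_GBS // NDR_IB_GBS}x")

# NVL72 widens the NVLink domain from 8 GPUs to 72, so collectives such
# as all-to-all stay on NVLink for much more of the traffic; scaled-out
# HGX-8 pods pay the ~18x penalty on every cross-pod hop.
```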

In DeepSeek's case, they probably don't even have HGX-8s to build their racks, let alone NVL72.
 
In DeepSeek's case, they probably don't even have HGX-8s to build their racks, let alone NVL72.
But they have data-center-level optimization software, a distributed file system, and optimized communication between all processors in their data center. The all-to-all is a key part of the efficiency.

Efficient Cross-Node Communication: DeepSeek has developed custom all-to-all communication kernels that fully utilize InfiniBand and NVLink bandwidths. This ensures minimal overhead when transferring data across GPUs in large-scale setups.
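
For a sense of what that collective looks like at the framework level, here is a minimal sketch using stock torch.distributed (PyTorch's built-in all-to-all, not DeepSeek's custom kernels):

```python
# Minimal all-to-all demo with PyTorch's built-in collective; NCCL
# routes it over NVLink within a pod and InfiniBand across nodes.
# Run: torchrun --nproc_per_node=8 all_to_all_demo.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets env vars
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    # In MoE-style expert parallelism, each rank scatters one shard of
    # its tokens to every other rank and gathers one shard from each.
    send = torch.full((world, 4), float(rank), device="cuda")
    recv = torch.empty_like(send)
    dist.all_to_all_single(recv, send)

    # Row i of recv arrived from rank i.
    print(f"rank {rank} received from ranks: {recv[:, 0].tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```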
 
But they have data-center-level optimization software, a distributed file system, and optimized communication between all processors in their data center. The all-to-all is a key part of the efficiency.

Efficient Cross-Node Communication: DeepSeek has developed custom all-to-all communication kernels that fully utilize InfiniBand and NVLink bandwidths. This ensures minimal overhead when transferring data across GPUs in large-scale setups.
Well, that proves my main point -- we don't need rack-scale systems for inference. We need smart engineering and better software, and pods with a few chips connected via ultra-high-speed links are more than sufficient.

Let Nvidia deal with these issues of overheating, convincing hyperscalers to help debug and retrofit their data centers, etc. Intel and AMD can do what they do best -- build and supply massive amounts of reasonably priced chips (GPUs or otherwise) ...
 
Intel and AMD can do what they do best -- build and supply massive amounts of reasonably priced chips (GPUs or otherwise) ...

AMD and Intel are moving to build fully integrated rack- and data-center-level hardware and software because that's where reduced enterprise and hyperscaler TCO comes from.

AMD Closes $4.9 Billion ZT Systems Deal, Targeting Its Piece of the ‘AI Factory’
Chip giant CEO Lisa Su says AI is still in its ‘very early stages’ as the company revs up competition with Nvidia


As AI drives up the complexity of hardware integration for customers, AMD needs to provide more than just chips and software, said Chief Executive Lisa Su. “You really have to put the entire system together,” she said.

Over the past several years, both Nvidia and AMD have taken greater interest in data-center servers, the infrastructure that goes into the massive server farms that power cloud-computing platforms and AI applications. Servers are the overall systems that house and connect chips and accelerators like graphics processing units.

Nvidia has made a concerted effort to broaden its focus from silicon by taking on a new role as data-center designer—or an “AI factory,” a strategy most associated with Chief Executive Jensen Huang. That means offering a sort of one-stop shop for all the key elements in data centers, including software, design services and networking technology.

With ZT, AMD is aiming to get a piece of the AI factory, too. Where AMD stands out among competitors like Nvidia is by offering a so-called open or open-source ecosystem, rather than a proprietary system with all-inclusive pieces, Su said.
 
@XYang2023,

The title translation you posted is misleading if you compare it with the original article written in Chinese.

The original article says the new Nvidia Blackwell server systems are experiencing frequent "crashes". The word "collapse" used in your post can be misleading: some people might think you are saying that the supply chain is collapsing or that market demand is collapsing.
 