Four Trillion Transistors Chip

I think the software stack is the Achilles heel here; unless it can run CUDA it isn't going to take away significant Nvidia market share in the next 3-5 years.
 
Any thoughts on the WSE-3 chip from Cerebras that has four trillion transistors with 900,000 optimized cores? Is this a game changer or a curiosity?
It's a game changer for AI. Nothing touches it, performance-wise, especially for inference. There are numerous challenges for a wafer-scale processor, especially in applications. They need to keep the cores small, so that a single small defect doesn't write off a large amount of expensive die area; fortunately, AI-specific cores are naturally small. The packaging needs to be completely custom, and the cooling system is a work of art. No standard high-volume rack chassis for Cerebras. Clustering the modules is a networking challenge too, but Cerebras uses its own internal networking architecture, with 100Gb Ethernet for external communications and a custom gateway translating to their internal message-passing architecture.
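To make the defect argument concrete, here's a back-of-envelope sketch; the defect density is a placeholder I assumed myself, not a Cerebras or TSMC figure, and only the 900,000-core count and the ~46,225 mm² of silicon come from the public WSE-3 specs:

```python
# Rough defect-tolerance estimate for a wafer-scale part. The defect density
# is an assumed placeholder; core count and silicon area are from public
# WSE-3 marketing material.

defect_density_per_cm2 = 0.1       # ASSUMED random defect density (defects/cm^2)
wafer_area_cm2 = 462.25            # WSE-3 silicon area, ~46,225 mm^2
num_cores = 900_000                # WSE-3 core count

expected_defects = defect_density_per_cm2 * wafer_area_cm2

# Pessimistic assumption: every defect lands in a different core and kills it.
fraction_of_cores_lost = expected_defects / num_cores

print(f"expected random defects per wafer   : {expected_defects:.0f}")
print(f"worst-case fraction of cores killed : {fraction_of_cores_lost:.4%}")
# -> ~46 defects, i.e. roughly 0.005% of cores, easy to map out with spares.
```

The same handful of defects on a wafer of big monolithic dies scraps whole dies, which is why the tiny-core-plus-redundancy approach is what makes wafer scale viable at all.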

The advantages of chip-resident SRAM (44 GB) and 21 PB/s of on-chip memory bandwidth are game changers in and of themselves.
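To put those numbers in perspective, here's a rough roofline-style calculation; the model sizes, FP16 precision, and the ~3.35 TB/s HBM figure are my own illustrative assumptions, and the only Cerebras number used is the 21 PB/s above:

```python
# Upper-bound decode rate when token generation is memory-bandwidth bound:
# every generated token must stream all weights once (batch 1, no reuse).
# Model sizes and the HBM figure are illustrative assumptions.

def max_tokens_per_sec(params_billion, bytes_per_param, mem_bw_bytes_per_sec):
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return mem_bw_bytes_per_sec / bytes_per_token

HBM_BW = 3.35e12       # ~3.35 TB/s, a current HBM3 accelerator (assumed)
WSE3_SRAM_BW = 21e15   # 21 PB/s on-chip SRAM bandwidth (Cerebras figure)

for params in (8, 70):
    hbm = max_tokens_per_sec(params, 2, HBM_BW)        # FP16 weights
    sram = max_tokens_per_sec(params, 2, WSE3_SRAM_BW)
    print(f"{params}B params: ~{hbm:,.0f} tok/s HBM-bound vs "
          f"~{sram:,.0f} tok/s SRAM-bound (per user stream)")
```

These are bandwidth ceilings only; a 70B model in FP16 doesn't even fit in 44 GB of SRAM, so in practice weights get sharded across multiple wafers, but the ratio is what drives their single-stream inference speed.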

Cerebras can currently cluster 2,048 CS-3 nodes into one system, which means you're looking at 1.8B+ cores (2,048 × 900,000 ≈ 1.84 billion), but I don't think they've built any large clusters yet.

The CS-3 can be used for training, but they seem to be focusing on inference. They're still a private company, but their investor list looks like a who's who of industry superstars. These guys are the Cray Research of AI inference systems.

I think they're amazing.
 
It's a game changer for AI. Nothing touches it, performance-wise, especially for inference.
Does outright performance really matter here though?

AI is already a highly risky investment for any major firm, and if you're spending billions on something already highly risky -- would you further add to the risk by trying Cerebras's products instead of just buying from Nvidia?

I agree the tech is pretty fantastic, but I think they're just waiting for a buyout from an AMD, Nvidia, or similar at this point?
 
Does outright performance really matter here though?
Of course, it depends on the application, but performance matters in the usual metrics: latency (time to first token), and throughput (tokens per second). With thousands or tens of thousands of concurrent users, inference performance can be critical to customer application success.
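For concreteness, this is roughly how those two metrics get measured against any streaming endpoint; the fake_generate() stand-in below is invented for illustration and isn't anyone's real API:

```python
import time

def measure_stream(token_stream):
    """Report time-to-first-token (latency) and tokens/second (throughput)
    for any iterator that yields tokens as they are generated."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        if first is None:
            first = time.perf_counter()
        count += 1
    end = time.perf_counter()
    return {
        "ttft_s": None if first is None else first - start,
        "tokens_per_s": count / (end - start) if count else 0.0,
        "tokens": count,
    }

# Stand-in for a real streaming inference call (names are made up).
def fake_generate(n_tokens=200, delay_s=0.005):
    time.sleep(0.05)                 # pretend prefill / time to first token
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"

print(measure_stream(fake_generate()))
```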
AI is already a highly risky investment for any major firm, and if you're spending billions on something already highly risky -- would you further add to the risk by trying Cerebras's products instead of just buying from Nvidia?
I think Cerebras products are largely still in the investigation, research, and proof of concept phases for user applications. I agree, Nvidia clusters are the low-risk approach. I think, but don't have data, that cloud company AI chips and systems are mostly targeted at internal applications.
I agree the tech is pretty fantastic, but I think they're just waiting for a buyout from an AMD, Nvidia, or similar at this point?
I doubt it. Cerebras was a unicorn a few years ago and has had a plan to go public. As with so many unicorns, I think they're just waiting for exactly the right time to saddle themselves with the complexity of being a public company in exchange for the huge payoff of going public.
 
I think they're amazing.
I agree - they have found a way to build wafer-scale chips, a feat that killed many companies before them, and to harness the WSE-3 / CS-3 on one of the most relevant problems of our time, LLM inference. Their on-chip SRAM and its associated bandwidth give them pretty much unrivaled performance on LLM benchmarks, plus the short on-chip interconnect gives them a power advantage per unit of compute. Where I think they fall down a bit is on their capital cost per million tokens in a many-user environment.
 
found a way to build wafer-scale chips, a feat that killed many companies before them

Amdahl was indeed the absolute star and hype champion of that era, and we know how that ended. You could indeed have a benchmark-beating product, the most talked-about product, the most commercially profitable one, with the most exclusive features, and still go out the window if you fail in the mass market.
 
Where I think they fall down a bit is on their capital cost per million tokens in a many-user environment.
I don't know how to answer this. Cerebras offers their own cloud services for their systems, and the pricing is publicly advertised. I'm not ambitious enough to do the rigorous analysis, but their pricing is claimed to be competitive with other cloud services. If you want to actually buy a Cerebras system, the business terms are confidential, and pricing isn't published.
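For what it's worth, the non-rigorous version of that analysis is only a few lines; every input below is a placeholder I invented (Cerebras system pricing isn't public), so the output means nothing by itself, only the structure of the comparison does:

```python
# Sketch of a cost-per-million-tokens comparison. All inputs are invented
# placeholders; replace them with real quotes and measured throughput.

def cost_per_million_tokens(system_price_usd, amortization_years,
                            power_kw, electricity_usd_per_kwh,
                            sustained_tokens_per_sec, utilization=0.7):
    """Straight-line capex amortization plus electricity, divided by the
    number of tokens actually served at the assumed utilization."""
    seconds = amortization_years * 365 * 24 * 3600
    tokens_served = sustained_tokens_per_sec * utilization * seconds
    energy_usd = power_kw * (seconds / 3600) * electricity_usd_per_kwh
    return (system_price_usd + energy_usd) / (tokens_served / 1e6)

# Entirely made-up example inputs:
print(cost_per_million_tokens(system_price_usd=2_500_000,
                              amortization_years=4,
                              power_kw=25,
                              electricity_usd_per_kwh=0.08,
                              sustained_tokens_per_sec=10_000))
```

Run the same function with an Nvidia cluster's price, power, and aggregate throughput and you have the capital-cost-per-million-tokens comparison being debated above.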
 
I agree - they have found a way to build wafer-scale chips, a feat that killed many companies before them, and to harness the WSE-3 / CS-3 on one of the most relevant problems of our time, LLM inference. Their on-chip SRAM and its associated bandwidth give them pretty much unrivaled performance on LLM benchmarks, plus the short on-chip interconnect gives them a power advantage per unit of compute. Where I think they fall down a bit is on their capital cost per million tokens in a many-user environment.
In theory, their tech should give them the best perf/watt, which should let them drive cost per token in a large deployment lower than Nvidia's offerings can.
 
Iirc their largest stakeholder (70+%?) is in the Middle East.

I like their approach, I think it's a clever way to work with what we have today. From what I glean of their documents, they take the whole wafer without dicing, and route redistribution (?) layers over the reticle boundaries. Please correct me if I'm wrong. If true, this technique would make for fascinating reading all by itself.

A question: is all the memory in the wafer itself, or do they add more atop the wafer, somewhat like AMD's 3D V-Cache products?
 
Iirc their largest stakeholder (70+%?) is in the Middle East.

I like their approach, I think it's a clever way to work with what we have today. From what I glean of their documents, they take the whole wafer without dicing, and route redistribution (?) layers over the reticle boundaries. Please correct me if I'm wrong. If true, this technique would make for fascinating reading all by itself.

A question: is all the memory in the wafer itself, or do they add more atop the wafer, somewhat like AMD's 3D V-Cache products?
This article describes their off-chip distributed DRAM memory at a high level: they have external memory units called MemoryX, which connect to the CS systems over their proprietary SwarmX interconnect fabric.
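My mental model of how that fits together, as a rough sketch; the class and function names below are invented for illustration and are not the Cerebras SDK:

```python
# Conceptual sketch of weight streaming: weights live in external MemoryX
# DRAM and are streamed onto the wafer one layer at a time over the SwarmX
# fabric, while activations stay resident in on-wafer SRAM. All names are
# invented; this only illustrates the execution model, not the real SDK.

class MemoryX:
    """Stand-in for the external DRAM store holding every layer's weights."""
    def __init__(self, layer_weights):
        self.layer_weights = layer_weights

    def stream_layer(self, i):
        return self.layer_weights[i]          # off-wafer -> wafer transfer


def forward_pass(memory_x, num_layers, activations, apply_layer):
    # Only weights move; activations never leave on-wafer SRAM.
    for i in range(num_layers):
        weights = memory_x.stream_layer(i)
        activations = apply_layer(weights, activations)   # compute on wafer
    return activations


# Toy usage: "weights" are scale factors and a layer just multiplies by them.
mx = MemoryX([2.0, 0.5, 3.0])
print(forward_pass(mx, num_layers=3, activations=1.0,
                   apply_layer=lambda w, a: w * a))   # -> 3.0
```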

 
Iirc their largest stakeholder (70+%?) is in the Middle East.

I like their approach, I think it's a clever way to work with what we have today. From what I glean of their documents, they take the whole wafer without dicing, and route redistribution (?) layers over the reticle boundaries. Please correct me if I'm wrong. If true, this technique would make for fascinating reading all by itself.

A question: is all the memory in the wafer itself, or do they add more atop the wafer, somewhat like AMD's 3D V-Cache products?

Yes, they have to stop the stepper to change masks for each die on a wafer, and restart the materials system each time.
 