Cisco launched its Silicon One G300 AI networking chip in a move that aims to compete with Nvidia and Broadcom.

Can you point to one of these analyses?

I think you've already decided the answer you will believe, so it looks like asking this question here is not going to be productive. Do you believe GPUs, even with tensor cores and specialized interconnects like NVLink, are an optimal answer for AI training?
At first order these problems seem to be solved; however, solving them increases the cost of the system significantly, since special materials, techniques, and tools have had to be developed to address them.
The iso-space performance/watt numbers of CS-3-based systems are better than those of B200-based systems. However, the iso-space performance/watt/$ is much worse than for B200-based systems, and it is evident from the discussion above that the higher cost of solving the problems associated with wafer-scale chips contributes to this.
This is comparing WSE-3 with B200, which is of course no longer state-of-the-art:

This means models up to roughly 20 billion parameters (with 16-bit weights) can fit entirely on-chip, enabling them to run without ever touching off-chip memory. This is a huge advantage: each core gets single-cycle access to weights and activations.
When models exceed 44 GB, Cerebras uses a technique called Weight Streaming — the model parameters reside in external MemoryX cabinets (which can hold many TB), and the wafer streams in the weights it needs for each layer.
The catch is that when streaming from off-chip, performance depends on that external memory bandwidth and the sparsity/pattern of weight access.
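
To make that concrete, here is a rough back-of-the-envelope sketch in Python of the two regimes: weights resident in the 44 GB of on-chip SRAM versus weights streamed from external memory. The layer size and the external bandwidth figure are placeholder assumptions for illustration, not Cerebras specifications.

# Back-of-the-envelope sketch: on-chip fit vs. weight streaming.
# The 44 GB SRAM figure comes from the discussion above; the layer size and
# external bandwidth below are illustrative assumptions, not Cerebras specs.

ON_CHIP_SRAM_BYTES = 44e9     # WSE-3 on-chip SRAM (per the discussion)
BYTES_PER_WEIGHT = 2          # 16-bit weights

def max_on_chip_params():
    """Largest parameter count whose 16-bit weights fit entirely in on-chip SRAM."""
    return ON_CHIP_SRAM_BYTES / BYTES_PER_WEIGHT

def layer_stream_time_ms(layer_params, external_bw_bytes_per_s):
    """Time to stream one layer's weights in from external memory."""
    return layer_params * BYTES_PER_WEIGHT / external_bw_bytes_per_s * 1e3

print(f"Fits on-chip: up to ~{max_on_chip_params() / 1e9:.0f}B parameters at 16-bit")
# Hypothetical 5B-parameter layer streamed over an assumed 1 TB/s link:
print(f"Per-layer streaming time: {layer_stream_time_ms(5e9, 1e12):.0f} ms")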

There's no doubt that the Cerebras solution has fantastic performance as long as the model fits in WSE memory, but it takes a hit when it doesn't -- and the problem I see here is the almost exponentially increasing size of AI models in cases where low-latency memory access is needed. With other solutions it's a lot easier to get a lot more HBM close to the NPUs, and this is considerably faster than the Cerebras external memory for models which fit in HBM. Once models are too big even for HBM, the playing field levels out again; in fact Cerebras may have an advantage in having fewer levels of memory hierarchy.
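
A tiny sketch of the three regimes I mean, with the 44 GB on-chip figure from above and a purely assumed HBM pool size standing in for a GPU-style node:

# The 44 GB on-chip figure is from the discussion above; the aggregate HBM
# pool size for a GPU-style node is an assumed placeholder, not a spec.

ON_CHIP_GB = 44
HBM_POOL_GB = 1500   # assumed aggregate HBM across one GPU node/NVLink domain

def memory_regime(model_gb):
    """Classify which memory tier a model's weights land in."""
    if model_gb <= ON_CHIP_GB:
        return "on-chip SRAM -- Cerebras runs entirely from wafer memory"
    if model_gb <= HBM_POOL_GB:
        return "HBM range -- GPU-style systems hold weights in HBM, Cerebras must stream"
    return "beyond HBM -- everyone streams from a further tier, the field levels out"

for model_gb in (30, 400, 5000):   # hypothetical model footprints in GB
    print(f"{model_gb:>5} GB -> {memory_regime(model_gb)}")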

But the other issue with Cerebras is cost -- not raw performance but performance per dollar. Here the specialized custom hardware (and much lower volumes) puts it at a considerable disadvantage.
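
To illustrate the point with made-up numbers (not real benchmarks, power figures, or prices), a system can lead on performance/watt and still lose on performance per dollar:

# All numbers here are invented purely for illustration.

systems = {
    # name: (relative performance, power in kW, system cost in $M) -- hypothetical
    "wafer-scale system": (10.0, 23.0, 3.0),
    "GPU-based system":   (8.0, 25.0, 1.5),
}

for name, (perf, kw, cost_m) in systems.items():
    print(f"{name:>18}: {perf / kw:.2f} perf/kW, {perf / cost_m:.2f} perf/$M")
# The wafer-scale system wins on perf/W here but loses on perf/$,
# which is exactly the performance-per-dollar issue described above.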

If neither of these is correct, why has Cerebras not taken over the entire AI world and wiped out the competition?

Your last question -- nope, absolutely not -- they do a good job, but for sure something more exactly tailored to the task could do it better. But then it would also have less flexibility, the classic custom ASIC problem... ;-)
 