
Cisco launched its Silicon One G300 AI networking chip in a move that aims to compete with Nvidia and Broadcom.

AFAIK Cerebras has not done this for the scaled-up AI case that will drive the entire industry over the next few years (as opposed to the particular benchmarks they chose), and neither has any independent test, including anything on SemiAnalysis -- am I wrong?

I think you are correct - they have found a lucrative sub-market for super-fast response times and token production. For instance, Opus 4.6 from Anthropic on Cerebras is fast but ~5x more expensive than the regular version on Cursor. Not sure if and when Cerebras will benchmark over a broader operating range outside of their sweet spot.

I find this guy's blogs on the hardware/software challenges of serving coding agents interesting. He explains why different compute paradigms / architectures are needed for the different phases of coding-agent inference.


He's one of the guys who originally developed KV caching while doing a postdoc at the University of Chicago. He now has a startup focused on making inference far more cost-efficient.
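For anyone not familiar with it, here's a minimal toy sketch of what a KV cache buys you during autoregressive decoding (my own illustrative numpy example, not anything from his work):

```python
import numpy as np

# Toy single-head attention decode step with a KV cache (illustrative only).
# Without the cache, K and V for the entire prefix would be recomputed at
# every generation step; with it, each step only projects the newest token.

d_model = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x_new):
    """x_new: (d_model,) embedding of the newest token."""
    q = x_new @ W_q
    k_cache.append(x_new @ W_k)   # append instead of recomputing the prefix
    v_cache.append(x_new @ W_v)
    K, V = np.stack(k_cache), np.stack(v_cache)
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V            # attention output for the newest token

for _ in range(5):
    out = decode_step(rng.standard_normal(d_model))
```

The cache is also what blows up memory for long-context coding agents, which is why serving them tends to stress memory capacity and bandwidth rather than raw FLOPs.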
 
We've been talking about inference here -- how about training? Or is this so much smaller as a fraction of the total AI market that it doesn't really matter?
 

Most analyses I've seen show the data center inference and training TAMs at about 50/50 right now, but tipping toward inference on roughly a 35% vs. 20% CAGR differential. I think most new entrants also view training as the harder problem, with more legacy infrastructure, so they seem willing to cede that to NVIDIA and Google in favor of the faster-growing market.
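Just to make that differential concrete, a quick back-of-the-envelope (using the 50/50 starting split and the 35%/20% CAGRs above -- the inputs are the rough estimates, not hard data):

```python
# Back-of-the-envelope: how a 50/50 inference/training split tips under a
# 35% vs. 20% CAGR differential (inputs are the rough estimates quoted above).

inference, training = 50.0, 50.0   # arbitrary units, equal starting TAM

for year in range(1, 6):
    inference *= 1.35
    training *= 1.20
    share = inference / (inference + training)
    print(f"Year {year}: inference share ~ {share:.0%}")

# Prints roughly 53%, 56%, 59%, 62%, 64% -- a steady tilt toward inference.
```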
 
Is training likely to favour a conventional architecture over Cerebras because of the much bigger local memory size?

Or is this swamped by the sheer size so it's the off-chip/off-board mass storage that limits speed, and this is similar for both?
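To put a number on "much bigger local memory size": the usual rule of thumb for mixed-precision training with Adam is ~16 bytes of state per parameter, versus 1-2 bytes for serving a quantized model (rough rule-of-thumb figures, not vendor numbers):

```python
# Rough per-parameter memory budget (rule of thumb; ignores activations and
# KV cache). This is why training state dwarfs inference state.

def training_bytes_per_param():
    weights_fp16 = 2    # working copy of the weights
    grads_fp16   = 2    # gradients
    master_fp32  = 4    # FP32 master weights
    adam_m_fp32  = 4    # Adam first moment
    adam_v_fp32  = 4    # Adam second moment
    return weights_fp16 + grads_fp16 + master_fp32 + adam_m_fp32 + adam_v_fp32

def inference_bytes_per_param(bits=8):
    return bits / 8     # quantized weights only

params = 70e9  # e.g. a 70B-parameter model
print(f"Training state:  ~{params * training_bytes_per_param() / 1e12:.1f} TB")
print(f"Inference state: ~{params * inference_bytes_per_param() / 1e9:.0f} GB")
```

So the question is really whether that terabyte-plus of optimizer state has to sit close to the compute, or whether it can live off-wafer and be streamed.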
 
For a completely off-the-wall approach to inference that makes Cerebras look ridiculously slow, try this... ;-)


[Image attachment: Taalas.png]
 
Or is this swamped by the sheer size so it's the off-chip/off-board mass storage that limits speed, and this is similar for both?

My naive take is that Cerebras is well suited to the parallel compute and the stream-in of new weights / stream-out of gradients and losses, though there are probably some complications associated with weight broadcasting during pre-training. I think the biggest challenge is that there are relatively few high-usage "sockets" in the frontier-model training space, and most of them have already been fixtured for H200. And if most of the frontier model guys want a second source, they already have one dictated to them - TPUs at Google / Anthropic, Trainium at Amazon, Huawei Ascend 910B/910C (or other in-house chips) in China. I think they have found some training "sockets" at national labs, and likely in frontier models connected with G42. Maybe some from their partnerships with OpenAI and Perplexity.
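Roughly the streaming pattern I mean, as a heavily simplified toy sketch (my own example, not Cerebras's actual weight-streaming stack): weights and optimizer state live off-chip, each layer's weights stream in for forward/backward compute, and the weight gradients stream back out so the update happens where the state lives.

```python
import numpy as np

# Heavily simplified "weight streaming" training step (toy example, not
# Cerebras's actual stack): weights live off-chip, stream in per layer for
# forward/backward compute, and weight gradients stream back out so the
# update is applied where the optimizer state sits.

rng = np.random.default_rng(0)
dim, n_layers, lr = 32, 4, 1e-2

offchip_weights = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(n_layers)]

def training_step(x, target):
    # Forward pass: stream each layer's weights in; keep activations "on-chip".
    acts = [x]
    for W in offchip_weights:                      # stream-in
        acts.append(np.tanh(acts[-1] @ W))

    grad = 2 * (acts[-1] - target)                 # d(MSE)/d(output)

    # Backward pass: stream weights in again, stream weight gradients out.
    for i in reversed(range(n_layers)):
        W = offchip_weights[i]                     # stream-in
        pre_grad = grad * (1 - acts[i + 1] ** 2)   # back through tanh
        w_grad = acts[i].T @ pre_grad              # stream-out to off-chip
        grad = pre_grad @ W.T
        offchip_weights[i] = W - lr * w_grad       # update happens off-chip

training_step(rng.standard_normal((8, dim)), rng.standard_normal((8, dim)))
```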
 