Cisco launched its Silicon One G300 AI networking chip in a move that aims to compete with Nvidia and Broadcom.

AFAIK Cerebras has not done this for the scaled-up AI case that will drive the entire industry in the next few years (as opposed to the particular benchmarks they chose), and neither has any independent test, including anything on SemiAnalysis -- am I wrong?

I think you are correct - they have found a lucrative sub-market for super-fast response times and token production. For instance, Opus-4.6 from Anthropic on Cerebras is fast but ~5x more expensive than the regular version on Cursor. Not sure if or when Cerebras will benchmark over a broader operating range outside of their sweet spot.

I find this guy's blog posts interesting on the hardware/software challenges of serving coding agents. He explains why different compute paradigms and architectures are needed for different phases of inference.
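
If it helps to make that concrete, the usual version of the argument is the prefill/decode split: prefill streams the whole prompt through the weights at once, while decode re-reads every weight per generated token. A back-of-envelope sketch in Python, with all model numbers my own assumptions rather than anything from his posts:

# Back-of-envelope: why prefill and decode stress hardware differently.
# The model size and precision here are assumed for illustration.
PARAMS = 7e9             # ~7B-class model (assumed)
BYTES_PER_PARAM = 2      # fp16/bf16 weights

def arithmetic_intensity(tokens_per_weight_pass: float) -> float:
    # Roughly 2 FLOPs per parameter per token, divided by the bytes of
    # weights streamed from memory in one pass over the model.
    flops = 2 * PARAMS * tokens_per_weight_pass
    return flops / (PARAMS * BYTES_PER_PARAM)

# Prefill amortizes one weight pass across all prompt tokens;
# decode re-reads the weights for every single output token.
print(f"prefill, 2048-token prompt: {arithmetic_intensity(2048):,.0f} FLOPs/byte")
print(f"decode, 1 token at a time:  {arithmetic_intensity(1):,.0f} FLOPs/byte")

Modern accelerators need on the order of a few hundred FLOPs per byte to stay compute-bound, so prefill saturates compute while decode is limited by memory bandwidth - hence the case for different hardware per phase.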


He's one of the guys who originally developed KV caching while doing a postdoc at the University of Chicago. He now has a startup focused on making inference far more cost-efficient.
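
For anyone who hasn't seen the mechanism, here is a minimal single-head KV-caching sketch in NumPy - a toy illustration of the idea, not his code, with the head dimension and weights made up:

import numpy as np

rng = np.random.default_rng(0)
d = 64                                 # head dimension (assumed)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / d**0.5 for _ in range(3))

k_cache, v_cache = [], []              # one cached K/V row per past token

def decode_step(x):
    """Attend the newest token embedding x over all cached tokens."""
    q = x @ Wq
    k_cache.append(x @ Wk)             # O(1) new projection work per step,
    v_cache.append(x @ Wv)             # instead of recomputing K/V for the
    K = np.stack(k_cache)              # whole history every token
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)        # (seq_len,)
    w = np.exp(scores - scores.max())  # softmax over the history
    w /= w.sum()
    return w @ V                       # attention output for this position

for _ in range(5):                     # emit five toy "tokens"
    out = decode_step(rng.standard_normal(d))
print("cache length:", len(k_cache), "output shape:", out.shape)

The cost-efficiency angle follows directly: the cache trades memory capacity for compute, and at long contexts that memory becomes a dominant serving cost.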
 