AFAIK Cerebras has not demonstrated this for the scaled-up AI case that will drive the entire industry in the next few years (as opposed to the particular benchmarks they chose), and neither has any independent tester, including anything on SemiAnalysis -- am I wrong?
I think you are correct - they have found a lucrative sub-market for super-fast response times and token production. For instance, Opus-4.6 from Anthropic on Cerebras is fast but ~5x more expensive than the regular version on Cursor. Not sure if or when Cerebras will benchmark over a broader operating range outside of their sweet spot.
I find this guy's blog posts on the hardware/software challenges of serving coding agents interesting. He explains why different compute paradigms and architectures are needed for the different phases of coding-agent inference.
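For context, the "different phases" argument usually comes down to the prefill vs. decode split: prefill processes the whole prompt in one large batched matmul (compute-bound), while decode generates one token at a time and is dominated by re-reading the weights and KV cache from memory (bandwidth-bound). A rough back-of-envelope sketch, with purely illustrative numbers that are my assumptions rather than anything from the linked blog:

```python
# Rough arithmetic-intensity comparison of prefill vs. decode for one
# d_model x d_model transformer matmul. All numbers are illustrative
# assumptions, not measurements of any real model or chip.

d_model = 4096          # hidden size (assumed)
prompt_len = 2048       # tokens processed in one batch during prefill
bytes_per_param = 2     # fp16 weights

def arithmetic_intensity(batch_tokens):
    """FLOPs per byte of weight traffic for the matmul.
    Higher intensity -> compute-bound; lower -> memory-bandwidth-bound."""
    flops = 2 * batch_tokens * d_model * d_model     # multiply-accumulates
    weight_bytes = d_model * d_model * bytes_per_param
    return flops / weight_bytes

# Prefill amortizes one weight read across the whole prompt;
# decode re-reads the weights for every single generated token.
prefill = arithmetic_intensity(prompt_len)
decode = arithmetic_intensity(1)

# The intensity gap scales linearly with the batched token count,
# which is why the two phases favor different hardware.
assert prefill / decode == prompt_len
```

With these toy numbers the gap is 2048x, which is the basic reason a chip tuned for fat batched matmuls and a chip tuned for memory bandwidth can both "win" depending on which phase you benchmark.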
CES and Groq "Acqui-hire" Reflection: Nvidia's Plan to Build Real Time Agents? | Hanchen Li
Opus-4.6 and GPT-5.3-Codex both use Cerebras for fast inference options. Seems like companies are discarding Nvidia for real-time agents. But is this the future trend? In my newest blog, I argue that Nvidia's latest moves still demonstrate great potential for fast but economical agent...
He's one of the people who originally developed KV caching, while doing a postdoc at the University of Chicago. He now has a startup focused on making inference far more cost-efficient.
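For readers unfamiliar with the term: KV caching stores each prior token's key/value projections so that every decode step only projects the one new token instead of recomputing attention inputs for the whole prefix. A minimal single-head NumPy sketch (all names, shapes, and the random weights are illustrative assumptions, not any real library's API):

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Caches key/value projections of already-seen tokens, so each
    decode step projects only ONE new token instead of the full prefix."""
    def __init__(self, d_model, rng):
        scale = 1.0 / np.sqrt(d_model)
        self.Wq = rng.standard_normal((d_model, d_model)) * scale
        self.Wk = rng.standard_normal((d_model, d_model)) * scale
        self.Wv = rng.standard_normal((d_model, d_model)) * scale
        self.keys, self.values = [], []

    def step(self, x):
        # Project only the newest token; cached K/V rows are reused as-is.
        self.keys.append(x @ self.Wk)
        self.values.append(x @ self.Wv)
        q = x @ self.Wq
        return attention(q, np.stack(self.keys), np.stack(self.values))

rng = np.random.default_rng(0)
cache = KVCache(d_model=8, rng=rng)
tokens = rng.standard_normal((4, 8))
outs = [cache.step(t) for t in tokens]  # 4 decode steps, one projection each
```

The payoff is that decode cost per step stays flat in projection work while the cache grows, which is exactly the memory-capacity/bandwidth pressure the serving-hardware debate above is about.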
