The question, to me, is whether any start-up chip designer has succeeded in the AI market without partnering with a major player (a cloud computing company or a top-tier chip maker). I can't think of one. A couple of my friends point to Cerebras; they are a remarkable technical success, but revenue-wise they are still well under $500M, and losing money.
I would put Cerebras in the path-to-success category, but they may still need a major partner tie-in, like Amazon, to go the distance. Other than that, I think you are right, with two caveats:
* There's likely still opportunity on the client side. It's not clear that Apple, Intel, and AMD have gotten their XPUs right for real-world client-side applications, and there are more specialized applications, like autonomous driving, where chips optimized for something completely different (à la AI5, AI6) are required.
* China - who knows what will emerge as the home-grown data-center inference winner there. It probably still ties in with Huawei or a hyperscaler/CSP like Alibaba. I do find it interesting that Alibaba has been making major contributions, via Mooncake, to Dynamo, NVIDIA's open-source "OS" for racks/pods.
This article shows how SGLang RBG + Mooncake enable production-grade, cloud-native LLM inference with PD (prefill/decode) disaggregation.
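For readers unfamiliar with PD disaggregation, here is a toy sketch of the pattern, in plain Python. This is not the real SGLang or Mooncake API; all class and function names here are hypothetical, and the "KV cache" is just a dict. The point it illustrates: the compute-bound prefill stage and the bandwidth-bound decode stage run on separate workers that can be scaled independently, with only the KV cache handed between them.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Stand-in for attention key/value tensors, keyed by request id."""
    entries: dict = field(default_factory=dict)

class PrefillWorker:
    """Compute-bound stage: one pass over the whole prompt."""
    def run(self, request_id: str, prompt: list, store: KVCache) -> None:
        # Real systems write per-layer K/V tensors; we just store tokens.
        store.entries[request_id] = list(prompt)

class DecodeWorker:
    """Bandwidth-bound stage: one token per step, reusing the cache."""
    def step(self, request_id: str, store: KVCache) -> str:
        context = store.entries[request_id]
        token = f"tok{len(context)}"  # placeholder "model" output
        context.append(token)         # each decode step extends the cache
        return token

# Disaggregation: the two workers can live on different machines,
# sized and scaled separately; only the KV cache crosses the wire.
store = KVCache()
PrefillWorker().run("req-1", ["The", "question", "is"], store)
decoder = DecodeWorker()
out = [decoder.step("req-1", store) for _ in range(3)]
print(out)  # ['tok3', 'tok4', 'tok5']
```

In production, the hard part this sketch hides is exactly what Mooncake addresses: moving the (large) KV cache between prefill and decode pools efficiently, via a tiered, RDMA-backed cache store.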
But inference is where Groq has been aiming all along, correct? Except for Cerebras, are any start-ups competing in training anymore? (I can't think of any.)
I can't either, though again, China is the unknown. There isn't much room left, given that TPU, Trainium 3, and AMD are already filling up the rest of the training space.
Building an inference solution on Groq, or any start-up, is a risky proposition: they could fold from lack of funding, and you'll have wasted precious time to market. Now that Groq has the backing of Nvidia, though, I think they could become the frontrunner - not because they're so awesome, but because of Nvidia. Then there's SambaNova and Intel; I think there's hope for "Samba-tel" too.
My guess is that there will be fallout and specialization in data-center rack/pod-level AI inference hardware over the next couple of years. There's just so much that has to come together for the co-optimized general solution: processor systems (two or more different types), connectivity, storage tiers, an orchestration OS, all the different model-serving environments, plus the physical racks with extreme power and cooling requirements. And all the hardware components need new chips that are not general purpose and have to be co-optimized together - so we're back to a vertical focus, like IBM mainframes, at least for a while.