Just to clear some stuff up here...
Groq's LPU v1 is a compiler-first design and a deterministic processor. The idea is that you know where your data and instructions are on any given clock cycle, because it's all traceable at compile time. When they started, they went after the batch-1 market because their SLAs were amazingly good: every batch-1 inference took the same amount of time, every time. In the days before transformers, when the market looked like image recognition models / CNN derivatives, that looked amazing.
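To make the "deterministic processor" idea concrete, here's a toy sketch (nothing to do with Groq's actual ISA or compiler; the ops and latencies are made up) of what compile-time scheduling buys you: if every op has a fixed, known latency and there's no dynamic arbitration, the compiler can assign every op a clock cycle up front, and total runtime is known before anything executes.

```python
# Toy illustration of static, cycle-exact scheduling. The op names and
# latencies below are hypothetical - the point is that with fixed latencies
# and no caches/contention, timing is fully determined at compile time.
LATENCY = {"load": 2, "matmul": 4, "store": 1}

def compile_schedule(program):
    """Assign each op a fixed start cycle in issue order."""
    schedule, cycle = [], 0
    for op in program:
        schedule.append((cycle, op))
        cycle += LATENCY[op]  # no runtime variance, so cycles just add up
    return schedule, cycle    # total runtime is known before execution

program = ["load", "matmul", "matmul", "store"]
schedule, total = compile_schedule(program)
print(schedule)  # [(0, 'load'), (2, 'matmul'), (6, 'matmul'), (10, 'store')]
print(total)     # 11 - identical on every run, hence the tight SLAs
```

That identical-every-run property is what made their batch-1 latency guarantees possible.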
They then pivoted hard to LLMs. With no DRAM or HBM, they rely on 230 MB of SRAM per chip. That means you need 10 racks, or 570-odd chips, to run a good 70B model at FP16 (correct me if I'm wrong, I don't think the chip does INT8). Transformers/LLMs don't really need batch 1, but they went ahead anyway. They've done some amazing marketing, and speaking to Jon, they're standing up three customer clouds this month, the biggest being 1.7 million tokens/sec (combined across all users). They went hard on 300 tokens/sec last year on LPU v1 with Llama2-7B, and now they're up to 1300 tokens/sec, showing that the software still has a way to go.
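The chip count falls out of back-of-envelope arithmetic (my numbers, not Groq's): 70B parameters at 2 bytes each is 140 GB of weights, and at 230 MB of SRAM per chip that's roughly 600 chips just to hold the weights, in the same ballpark as the 570-odd figure (the exact deployed count depends on rack configuration and how much SRAM goes to activations vs weights).

```python
# Back-of-envelope: chips needed just to hold a 70B FP16 model in SRAM.
params = 70e9          # 70B parameters
bytes_per_param = 2    # FP16
sram_per_chip = 230e6  # 230 MB of SRAM per LPU v1 chip

weights_bytes = params * bytes_per_param      # 140 GB of weights
chips_needed = weights_bytes / sram_per_chip  # ~609 chips, weights only

print(f"{weights_bytes / 1e9:.0f} GB of weights")
print(f"~{chips_needed:.0f} chips (weights only)")
```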
Don't forget, comparatively speaking, GF14 is cheap here, which is why a lot of AI startups have been using it for 1st/2nd gen chips. I suspect if Samsung wants to leverage Groq as an AI/HPC flagship customer, there might be a good deal on SF4X to help with co-marketing. They only just sent off the final RTL (GDSII?) to the fab in the past few weeks, Jon said on stage. That means it's being built in Korea - they'll get silicon back in a few months, then customers next year. It may pivot to Taylor over time, but Taylor's not ready (afaik).
I said most of this on Twitter already, but here are some more numbers.
Groq expects to ship/stand up:
- 2024: 0.1M chips
- 2025: 1.2M chips
That's all LPU v1 numbers - nothing about v2.
Given Jon's history as part of the TPU crowd, and the sense that the Groq chip 'is a good chip but it's a shame transformers came', I suspect v2 will be more aligned with industry requirements. It will be interesting to see if they've kept the deterministic nature of the processor, though.
I'll be posting my write-up from the Samsung event on Monday; you'll find it on my Substack.