Nvidia's Vera Rubin appears focused on improving higher-end / "higher-value" AI inference workloads. Where Blackwell improved "35X over Hopper" for the "Free and Medium tiers", Vera Rubin improves only 2-3X at the lower tiers (still good!) but brings a "35X improvement" at the high end.
A later slide showed the effect of adding Groq-3 chips (heavy on SRAM, optimized more for latency than bandwidth) to Vera Rubin arrays; that pushed the speed further "to the right", enabling even higher-end tiers (more guaranteed tokens/second for customers).