When it comes to Arm, we think mostly of phones and the “things” in the IoT. We know Arm cores are in a lot of other places too, such as communications infrastructure, but that’s a diffuse image – “yeah, they’re everywhere.” We like easy-to-understand targets: phones and IoT devices we get. More recently Arm started to talk about servers – those mega-compute monsters, stacked aisle by aisle in datacenters. Here, in our view, they were taking on an established giant – Intel. And we thought: OK, this should be interesting.
First moves were promising: Qualcomm was going to build servers; Cavium/Marvell, Ampere, Fujitsu and others jumped in. Then Qualcomm jumped back out. It turned out performance didn’t get anywhere near Xeon or (AMD) Epyc levels. We smiled smugly – told you so. No way an Arm-based system can compete with real servers.
We didn’t feel quite so smug when AWS (Amazon) announced that A1 instances were available for cloud compute. These are built on Arm-based Graviton processors developed in-house at AWS. Still, we argued, these are only half the speed of Xeon instances. Half the power too, but who cares about that? So we’re still right: these instances are just for penny-pinching cloud users who can’t afford the premium service.
The challenge for many of us in the semiconductor/systems world is that we see compute in terms of the tasks we know best – giant simulations, finite element analyses, that sort of thing, where needs are all about raw compute performance and I/O bandwidth. But the needs of the real world far outweigh our specialized applications. Most users care about video streaming, gaming, search and, of course, AI inferencing (still bigger in the cloud than at the edge, per McKinsey).
When it comes to those kinds of applications, it turns out that raw performance isn’t the right metric, even for premium users. The correct metric is some function of performance and cost, and it isn’t necessarily uniform across different parts of the application. If you’re serving video at YouTube or Netflix volumes, then even as Google or Netflix you still want to do so profitably. Arm instances can be more cost-effective than Intel/AMD for such functions.
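The shift in metric is easy to sketch. The numbers below are purely hypothetical (not measured benchmarks), chosen only to show how an instance with half the raw throughput can still win on performance per dollar:

```python
# Illustrative only: hypothetical throughput and pricing figures,
# not real instance benchmarks. The point is the metric, not the values.

def perf_per_dollar(throughput, hourly_cost):
    """Work done per dollar spent -- the metric that matters for
    high-volume serving, rather than raw throughput alone."""
    return throughput / hourly_cost

# A hypothetical "big core" instance: twice the raw throughput,
# but four times the hourly price of the smaller Arm-based one.
big_core = perf_per_dollar(throughput=100.0, hourly_cost=2.0)  # 50.0 units/$
arm_core = perf_per_dollar(throughput=50.0, hourly_cost=0.5)   # 100.0 units/$

# The slower instance wins on cost-efficiency despite half the raw speed.
```

Under these made-up numbers the “half the speed” instance delivers twice the work per dollar, which is exactly the trade a high-volume video or web-serving fleet cares about.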
So Arm found a foothold in servers; how do they build on it? They have a roadmap to faster cores: a progression from the Cosmos platform (on which the Graviton processor was based) to the Neoverse platforms, starting with N1, each generation targeting a ~30% improvement in performance. AWS just released a forward look at their next-generation A2 instances, based on N1, with (according to Arm) a 40% improvement in price-performance.
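As a rough back-of-the-envelope, assuming the ~30% per-generation figure compounds multiplicatively (which the roadmap itself doesn’t promise), the gains add up quickly:

```python
def compounded_gain(per_gen_gain, generations):
    """Cumulative speedup if each generation improves by the same factor.
    A simplifying assumption for illustration, not a roadmap commitment."""
    return (1.0 + per_gen_gain) ** generations

# ~30% per generation is roughly 1.69x after two generations
# and about 2.2x after three.
two_gens = compounded_gain(0.30, 2)
three_gens = compounded_gain(0.30, 3)
```

That compounding is why a per-generation figure that sounds incremental can close a large gap over a few roadmap steps.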
They’re also pushing toward an even more challenging objective: supercomputing. Sandia National Laboratories is already underway with Astra, an experimental machine built on the Arm-based Marvell (Cavium) ThunderX2 platform. An interesting start, but where are all the other exascale-class Arm-based machines? By their nature there aren’t going to be many of these, but still, you might have expected some chatter along these lines.
Now Arm is priming the pump further in this direction through a partnership with NVIDIA, whose GPUs already power the current fastest supercomputer in the world, Summit at Oak Ridge National Laboratory. NVIDIA announced earlier in the year that it is teaming with Arm and Arm’s ecosystem partners to accelerate development of GPU-accelerated Arm-based servers and to broaden adoption of the CUDA-X libraries and development tools. Cray and HPE are also involved in defining a reference platform.
You can see a picture emerging here: building more footholds in the cloud and in supercomputing, establishing that Arm has a role to play in both domains. I’m pretty sure they’re never going to be the central faster-than-anything-on-the-planet part of computation, but clouds and supercomputers have much more diverse needs than mainstream servers alone can satisfy. You want the right engine at the right price point for each task: top-of-the-line CPUs to compute really fast where that’s what you need, Arm-managed GPUs for massive machine-learning training or inferencing, Arm-powered engines for smart storage and software-defined networking, and application-optimized engines for web serving and other functions where performance per dollar is a better metric than raw performance.
You can get more info from Arm’s announcement at the SC ’19 conference.