
Arm Releases First Ever AI Chip, With Meta As Initial Customer

soAsian

Well-known member
For more than 35 years, Arm has licensed its chip architecture and collected royalties on every processor made by customers like Apple, Nvidia, Google, Amazon, Intel and Microsoft. Now Arm is becoming their competitor, making physical silicon of its own for the first time, with Meta as the initial customer. CNBC got an exclusive first look at Arm's new AGI CPU that's "ruthlessly optimized" for running AI inference in data centers, and toured the $71 million lab in Austin, Texas, that Arm built for its new CPU.


Golden age!?
 
I don't get it. These chips are just a bunch of Neoverse V3 cores. Nothing really special. I'm having trouble seeing what advantages it has over the AWS Graviton 5. And then there's Ampere, also owned by SoftBank, whose custom Arm cores are supposedly superior to Neoverse V3 cores. Arm and SoftBank confuse me. Or am I missing something?
 
I am watching the livecast of the Arm AGI CPU launch. I think they want to join the competition in AI server architecture systems, so they need a CPU to provide total solutions.
 
I don't get it. These chips are just a bunch of Neoverse V3 cores. Nothing really special. I'm having trouble seeing what advantages it has over the AWS Graviton 5. And then there's Ampere, also owned by SoftBank, whose custom Arm cores are supposedly superior to Neoverse V3 cores. Arm and SoftBank confuse me. Or am I missing something?
It's hard to get really good info on this ARM "AGI CPU", but it *looks* like the differences are:

- Graviton is more focused on general-purpose work, while ARM AGI is more biased toward inference via improved inter-chip and off-chip bandwidth
- Graviton appears to be used exclusively by Amazon for AWS, whereas ARM AGI is the "public market offering" that anyone can use

It doesn't seem very special but it is at least a "general offering" that any server/cloud provider could buy/use.

.. and yes Ampere definitely has strong overlap with both Graviton and ARM AGI ..
 
It's hard to get really good info on this ARM "AGI CPU", but it *looks* like the differences are:

- Graviton is more focused on general-purpose work, while ARM AGI is more biased toward inference via improved inter-chip and off-chip bandwidth
- Graviton appears to be used exclusively by Amazon for AWS, whereas ARM AGI is the "public market offering" that anyone can use

It doesn't seem very special but it is at least a "general offering" that any server/cloud provider could buy/use.

.. and yes Ampere definitely has strong overlap with both Graviton and ARM AGI ..
I think the marketing gimmick of tagging it as the "AGI CPU" is just silly. I can call myself young and good-looking, but as the Linda Ronstadt song goes, all it would take is "Just One Look" to know I'm full of you-know-what. Even Intel Xeon 6s have hardware accelerators which make them better for AI than these "AGI" CPUs. I have trouble typing the AGI letters without chuckling. Arm, seriously, fire your marketing team.
 
I think the marketing gimmick of tagging it as the "AGI CPU" is just silly. I can call myself young and good-looking, but as the Linda Ronstadt song goes, all it would take is "Just One Look" to know I'm full of you-know-what. Even Intel Xeon 6s have hardware accelerators which make them better for AI than these "AGI" CPUs. I have trouble typing the AGI letters without chuckling. Arm, seriously, fire your marketing team.
100% agree on marketing getting stupid. Though "AI" itself is already an incorrect term for LLMs..
 
A couple of questions for the forum

1. What do you think the competitive advantage is of this ARM "AGI CPU" versus what Nvidia is building internally, which is also based on the Arm architecture? Obviously NV's orchestration CPU is in production, but do you see Arm getting into NV's ecosystem with silicon in addition to the IP play?

2. As for the x86 comparison, we know Intel has been NV's partner lately, and I think from their latest earnings call, Intel is building and supplying a purpose-built CPU to NV as well. So we can assume Intel understands NV's AI infrastructure needs very well, or at least as well as Arm does. Assuming the current Intel offering is sub-par compared to the AGI CPU, what would it take for Intel (or AMD) to build one that's competitive with or better than the AGI CPU?
 
A couple of questions for the forum

1. What do you think the competitive advantage is of this ARM "AGI CPU" versus what Nvidia is building internally, which is also based on the Arm architecture? Obviously NV's orchestration CPU is in production, but do you see Arm getting into NV's ecosystem with silicon in addition to the IP play?
AGI versus Vera... to start, AGI uses a larger number of single-threaded cores. Vera uses a "spatial threading" strategy with two threads per core, but the threads have dedicated execution blocks, which is different than Intel's Simultaneous Multi-Threading (SMT). SMT designs use over-provisioned execution blocks which mostly allow the two threads to proceed without stalling. Mostly. All three strategies use shared caches to reduce cache coherency complexity. Without comparison traces of the same code on all three CPUs it is impossible to directly compare their performance.

It looks like getting into Nvidia's ecosystem is a matter of integrating NVLink port blocks onto the CPU dies and giving them cache-level access. Without that integration, scale-up link transfers will be essentially I/O operations, which are much higher-overhead functions than direct cache access. Being in the coherent cache domain is said to be part of Vera's development. So it looks to me like AGI is more likely to be in the "not Nvidia" market.
2. As for the x86 comparison, we know Intel has been NV's partner lately, and I think from their latest earnings call, Intel is building and supplying a purpose-built CPU to NV as well. So we can assume Intel understands NV's AI infrastructure needs very well, or at least as well as Arm does. Assuming the current Intel offering is sub-par compared to the AGI CPU, what would it take for Intel (or AMD) to build one that's competitive with or better than the AGI CPU?
I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
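To make the threading comparison above concrete, here is a toy throughput model. Every number in it (core counts, issue widths, stall fractions, SMT contention penalty) is hypothetical and chosen only to illustrate the shape of the trade-off between single-threaded cores, spatial threading, and SMT; none of it is measured data from AGI, Vera, or Xeon.

```python
# Toy throughput model for three threading strategies. All parameters
# are made-up illustrative values, not real CPU specifications.

def effective_throughput(cores, threads_per_core, issue_width,
                         stall_fraction, smt_contention=0.0):
    """Rough instructions-per-cycle estimate for the whole chip.

    stall_fraction: fraction of cycles a thread spends stalled on memory.
    smt_contention: extra slowdown when two SMT threads compete for shared
    execution blocks (zero for single-threaded and spatial designs, which
    have dedicated execution resources per thread).
    """
    per_thread = issue_width * (1.0 - stall_fraction) * (1.0 - smt_contention)
    return cores * threads_per_core * per_thread

# Hypothetical configurations, loosely normalized to similar die area:
single = effective_throughput(cores=128, threads_per_core=1, issue_width=4,
                              stall_fraction=0.3)
spatial = effective_throughput(cores=96, threads_per_core=2, issue_width=2,
                               stall_fraction=0.3)   # dedicated exec blocks
smt = effective_throughput(cores=96, threads_per_core=2, issue_width=3,
                           stall_fraction=0.3, smt_contention=0.15)

print(f"single-threaded: {single:.0f} IPC")   # 358
print(f"spatial:         {spatial:.0f} IPC")  # 269
print(f"SMT:             {smt:.0f} IPC")      # 343
```

The point of the sketch is the structure, not the outputs: which strategy wins depends entirely on the stall fraction and the contention penalty, which is exactly why comparison traces of the same code on all three CPUs would be needed before drawing conclusions.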
 
AGI versus Vera... to start, AGI uses a larger number of single-threaded cores. Vera uses a "spatial threading" strategy with two threads per core, but the threads have dedicated execution blocks, which is different than Intel's Simultaneous Multi-Threading (SMT). SMT designs use over-provisioned execution blocks which mostly allow the two threads to proceed without stalling. Mostly. All three strategies use shared caches to reduce cache coherency complexity. Without comparison traces of the same code on all three CPUs it is impossible to directly compare their performance.

It looks like getting into Nvidia's ecosystem is a matter of integrating NVLink port blocks onto the CPU dies and giving them cache-level access. Without that integration, scale-up link transfers will be essentially I/O operations, which are much higher-overhead functions than direct cache access. Being in the coherent cache domain is said to be part of Vera's development. So it looks to me like AGI is more likely to be in the "not Nvidia" market.

I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
Thanks for the insight about NVLink integration. More likely than not, Intel is already working with NV on this.
 
Hi. Since spatial multithreading is mentioned, could you talk a little about the trade-offs between spatial multithreading and SMT? It seems like you need a lot more resources for spatial multithreading…
AGI versus Vera... to start, AGI uses a larger number of single-threaded cores. Vera uses a "spatial threading" strategy with two threads per core, but the threads have dedicated execution blocks, which is different than Intel's Simultaneous Multi-Threading (SMT). SMT designs use over-provisioned execution blocks which mostly allow the two threads to proceed without stalling. Mostly. All three strategies use shared caches to reduce cache coherency complexity. Without comparison traces of the same code on all three CPUs it is impossible to directly compare their performance.

It looks like getting into Nvidia's ecosystem is a matter of integrating NVLink port blocks onto the CPU dies and giving them cache-level access. Without that integration, scale-up link transfers will be essentially I/O operations, which are much higher-overhead functions than direct cache access. Being in the coherent cache domain is said to be part of Vera's development. So it looks to me like AGI is more likely to be in the "not Nvidia" market.

I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
 
Minor add - "AGI CPUs" need a lot of bandwidth with reasonable latency. Intel typically has a really good memory controller design that could give a (small?) competitive advantage to their x86 solution.
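The bandwidth point can be illustrated with a back-of-the-envelope calculation: in single-stream LLM decode, every generated token must stream the full weight set from memory, so usable memory bandwidth, not compute, typically sets the ceiling. The model size and bandwidth figures below are hypothetical round numbers, not specs of any CPU discussed in this thread.

```python
# Back-of-the-envelope: memory-bandwidth-bound decode rate for LLM
# inference, assuming each token reads every weight exactly once and
# compute is free. All numbers are hypothetical round figures.

def max_tokens_per_sec(weight_bytes, mem_bw_bytes_per_sec):
    """Upper bound on single-stream decode rate set by memory bandwidth."""
    return mem_bw_bytes_per_sec / weight_bytes

weights = 70e9 * 2    # 70B parameters at 2 bytes each (fp16/bf16)
bandwidth = 500e9     # 500 GB/s of usable memory bandwidth

print(f"{max_tokens_per_sec(weights, bandwidth):.2f} tokens/s upper bound")
```

Batching amortizes the weight traffic across streams, but the single-stream case shows why a strong memory controller matters so much for this class of workload.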
 
Hi. Since spatial multithreading is mentioned, could you talk a little about the trade-offs between spatial multithreading and SMT? It seems like you need a lot more resources for spatial multithreading…
I think spatial multi-threading might be a die-area savings compared to a larger number of single-threaded full cores, but without the asserted stalling issues of SMT. Nvidia's premise seems to be that AI workloads are more intolerant of SMT issues than non-AI datacenter workloads. Just thinking about it for a minute or two, Nvidia seems to have a reasonable point.
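The die-area intuition can be sketched with toy numbers: a two-thread spatial core shares the front-end and L1 between threads but duplicates register state and splits the execution blocks, so it should cost less than two full single-threaded cores. All area figures below are hypothetical relative units invented for illustration, not measurements of any real core.

```python
# Toy die-area comparison: one 2-thread spatial core vs. two full
# single-threaded cores. Area figures are hypothetical relative units.

FRONTEND = 1.0   # fetch/decode/branch predict, shareable between threads
EXEC     = 1.5   # execution blocks for one thread's worth of issue width
REGS     = 0.5   # architectural + rename register state for one thread
L1_CACHE = 1.0   # L1 cache, shared within a core

def core_area(threads, shared_frontend):
    """Area of one core with the given thread count."""
    fe = FRONTEND if shared_frontend else FRONTEND * threads
    return fe + EXEC * threads + REGS * threads + L1_CACHE

two_single  = 2 * core_area(1, shared_frontend=True)  # two full cores
one_spatial = core_area(2, shared_frontend=True)      # one 2-thread core

print(f"two single-threaded cores: {two_single:.1f} units")
print(f"one spatial 2-thread core: {one_spatial:.1f} units")
print(f"area saved: {100 * (1 - one_spatial / two_single):.0f}%")
```

With these made-up ratios the spatial core saves about a quarter of the area for the same thread count, while keeping dedicated execution blocks per thread, which is the part SMT gives up.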
 
I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
I have the answer for this. Apparently the reason Nvidia invested in Intel was exactly this: they wanted an x86 CPU that can do NVLink, and there are only two choices. Since AMD is a competitor, they went with Intel. Intel will take their CPU dies, just slap on an NVLink die, and sell it to Nvidia, which Nvidia will put into their racks.
 
I have the answer for this. Apparently the reason Nvidia invested in Intel was exactly this: they wanted an x86 CPU that can do NVLink, and there are only two choices. Since AMD is a competitor, they went with Intel. Intel will take their CPU dies, just slap on an NVLink die, and sell it to Nvidia, which Nvidia will put into their racks.
Yeah, it's the "just slap on" part that makes me wonder. :) Adding a low-latency, high-bandwidth network interface to a CPU and allowing it to realize its performance potential is tricky stuff.
 
Yeah, it's the "just slap on" part that makes me wonder. :) Adding a low-latency, high-bandwidth network interface to a CPU and allowing it to realize its performance potential is tricky stuff.
It's just a different IO die for NVLink vs. the standard Diamond Rapids one, that's about it. But I agree with you, it's quite tricky.
 