
Arm Releases First Ever AI Chip, With Meta As Initial Customer

soAsian

Well-known member
For more than 35 years, Arm has licensed its chip architecture and collected royalties on every processor made by customers like Apple, Nvidia, Google, Amazon, Intel and Microsoft. Now Arm is becoming their competitor, making physical silicon of its own for the first time, with Meta as the initial customer. CNBC got an exclusive first look at Arm's new AGI CPU that's "ruthlessly optimized" for running AI inference in data centers, and toured the $71 million lab in Austin, Texas, that Arm built for its new CPU.


Golden age!?
 
I don't get it. These chips are just a bunch of Neoverse V3 cores. Nothing really special. I'm having trouble seeing what advantages it has over the AWS Graviton 5. And then there's Ampere, also owned by SoftBank, whose custom Arm cores are supposedly superior to Neoverse V3 cores. Arm and SoftBank confuse me. Or am I missing something?
 
I am watching the livecast of the Arm AGI CPU launch. I think they want to join the competition in AI server architecture systems, so they need a CPU to provide total solutions.
 
I don't get it. These chips are just a bunch of Neoverse V3 cores. Nothing really special. I'm having trouble seeing what advantages it has over the AWS Graviton 5. And then there's Ampere, also owned by SoftBank, whose custom Arm cores are supposedly superior to Neoverse V3 cores. Arm and SoftBank confuse me. Or am I missing something?
It's hard to get really good info on this ARM "AGI CPU", but it *looks* like the differences are:

- Graviton is more focused on general-purpose work, while ARM AGI is more biased toward inference via improved inter-chip and off-chip bandwidth
- Graviton appears to be used exclusively by Amazon for AWS, whereas ARM AGI is the "public market offering" that anyone can use

It doesn't seem very special but it is at least a "general offering" that any server/cloud provider could buy/use.

.. and yes Ampere definitely has strong overlap with both Graviton and ARM AGI ..
 
It's hard to get really good info on this ARM "AGI CPU", but it *looks* like the differences are:

- Graviton is more focused on general-purpose work, while ARM AGI is more biased toward inference via improved inter-chip and off-chip bandwidth
- Graviton appears to be used exclusively by Amazon for AWS, whereas ARM AGI is the "public market offering" that anyone can use

It doesn't seem very special but it is at least a "general offering" that any server/cloud provider could buy/use.

.. and yes Ampere definitely has strong overlap with both Graviton and ARM AGI ..
I think the marketing gimmick of tagging it as the "AGI CPU" is just silly. I can call myself young and good-looking, but as the Linda Ronstadt song goes, all it would take is "Just One Look" to know I'm full of you-know-what. Even Intel Xeon 6s have hardware accelerators which make them better for AI than these "AGI" CPUs. I have trouble typing the AGI letters without chuckling. Arm, seriously, fire your marketing team.
 
I think the marketing gimmick of tagging it as the "AGI CPU" is just silly. I can call myself young and good-looking, but as the Linda Ronstadt song goes, all it would take is "Just One Look" to know I'm full of you-know-what. Even Intel Xeon 6s have hardware accelerators which make them better for AI than these "AGI" CPUs. I have trouble typing the AGI letters without chuckling. Arm, seriously, fire your marketing team.
100% agree on marketing getting stupid. Though "AI" itself is already an incorrect term for LLMs..
 
A couple of questions for the forum

1. What do you think the competitive advantage is of this ARM "AGI CPU" versus what Nvidia is building internally, which is also based on the Arm architecture? Obviously NV's orchestration CPU is in production, but do you see Arm getting into NV's ecosystem with silicon in addition to the IP play?

2. As for the x86 comparison, we know Intel has been NV's partner lately, and I think from their latest earnings call, Intel is building and supplying a purpose-built CPU to NV as well. So we can assume Intel understands NV's AI infrastructure needs very well, or at least as well as Arm does. Assuming the current Intel offering is sub-par compared to the AGI CPU, what would it take for Intel (or AMD) to build one that's competitive with or better than the AGI CPU?
 
A couple of questions for the forum

1. What do you think the competitive advantage is of this ARM "AGI CPU" versus what Nvidia is building internally, which is also based on the Arm architecture? Obviously NV's orchestration CPU is in production, but do you see Arm getting into NV's ecosystem with silicon in addition to the IP play?
AGI versus Vera... to start, AGI uses a larger number of single-threaded cores. Vera uses a "spatial threading" strategy with two threads per core, but the threads have dedicated execution blocks, which is different than Intel's Simultaneous Multi-Threading (SMT). SMT designs use over-provisioned execution blocks which mostly allow the two threads to proceed without stalling. Mostly. All three strategies use shared caches to reduce cache coherency complexity. Without comparison traces of the same code on all three CPUs it is impossible to directly compare their performance.

It looks like getting into Nvidia's ecosystem is a matter of integrating NVLink port blocks onto the CPU dies and giving them cache-level access. Without that integration, scale-up link transfers will be essentially I/O operations, which are much higher-overhead functions than direct cache access. Being in the coherent cache domain is said to be part of Vera's development. So it looks to me like AGI is more likely to be in the "not Nvidia" market.
2. As for the x86 comparison, we know Intel has been NV's partner lately, and I think from their latest earnings call, Intel is building and supplying a purpose-built CPU to NV as well. So we can assume Intel understands NV's AI infrastructure needs very well, or at least as well as Arm does. Assuming the current Intel offering is sub-par compared to the AGI CPU, what would it take for Intel (or AMD) to build one that's competitive with or better than the AGI CPU?
I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
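To make the threading comparison above concrete, here is a toy throughput model. Every number in it (core counts, issue widths, stall fractions, SMT contention penalty) is hypothetical and chosen only to illustrate the shape of the trade-off between single-threaded cores, spatial threading, and SMT; none of it is measured data from AGI, Vera, or Xeon.

```python
# Toy throughput model for three threading strategies. All parameters
# are made-up illustrative values, not real CPU specifications.

def effective_throughput(cores, threads_per_core, issue_width,
                         stall_fraction, smt_contention=0.0):
    """Rough instructions-per-cycle estimate for the whole chip.

    stall_fraction: fraction of cycles a thread spends stalled on memory.
    smt_contention: extra slowdown when two SMT threads compete for shared
    execution blocks (zero for single-threaded and spatial designs, which
    have dedicated execution resources per thread).
    """
    per_thread = issue_width * (1.0 - stall_fraction) * (1.0 - smt_contention)
    return cores * threads_per_core * per_thread

# Hypothetical configurations, loosely normalized to similar die area:
single = effective_throughput(cores=128, threads_per_core=1, issue_width=4,
                              stall_fraction=0.3)
spatial = effective_throughput(cores=96, threads_per_core=2, issue_width=2,
                               stall_fraction=0.3)   # dedicated exec blocks
smt = effective_throughput(cores=96, threads_per_core=2, issue_width=3,
                           stall_fraction=0.3, smt_contention=0.15)

print(f"single-threaded: {single:.0f} IPC")   # 358
print(f"spatial:         {spatial:.0f} IPC")  # 269
print(f"SMT:             {smt:.0f} IPC")      # 343
```

The point of the sketch is the structure, not the outputs: which strategy wins depends entirely on the stall fraction and the contention penalty, which is exactly why comparison traces of the same code on all three CPUs would be needed before drawing conclusions.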
 
AGI versus Vera... to start, AGI uses a larger number of single-threaded cores. Vera uses a "spatial threading" strategy with two threads per core, but the threads have dedicated execution blocks, which is different than Intel's Simultaneous Multi-Threading (SMT). SMT designs use over-provisioned execution blocks which mostly allow the two threads to proceed without stalling. Mostly. All three strategies use shared caches to reduce cache coherency complexity. Without comparison traces of the same code on all three CPUs it is impossible to directly compare their performance.

It looks like getting into Nvidia's ecosystem is a matter of integrating NVLink port blocks onto the CPU dies and giving them cache-level access. Without that integration, scale-up link transfers will be essentially I/O operations, which are much higher-overhead functions than direct cache access. Being in the coherent cache domain is said to be part of Vera's development. So it looks to me like AGI is more likely to be in the "not Nvidia" market.

I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
Thanks for the insight about NVLink integration. More likely than not, Intel is already working with NV on this.
 
Hi. Since spatial multithreading is mentioned, could you talk a little about the trade-offs between spatial multithreading and SMT? It seems like you need a lot more resources for spatial multithreading…
AGI versus Vera... to start, AGI uses a larger number of single-threaded cores. Vera uses a "spatial threading" strategy with two threads per core, but the threads have dedicated execution blocks, which is different than Intel's Simultaneous Multi-Threading (SMT). SMT designs use over-provisioned execution blocks which mostly allow the two threads to proceed without stalling. Mostly. All three strategies use shared caches to reduce cache coherency complexity. Without comparison traces of the same code on all three CPUs it is impossible to directly compare their performance.

It looks like getting into Nvidia's ecosystem is a matter of integrating NVLink port blocks onto the CPU dies and giving them cache-level access. Without that integration, scale-up link transfers will be essentially I/O operations, which are much higher-overhead functions than direct cache access. Being in the coherent cache domain is said to be part of Vera's development. So it looks to me like AGI is more likely to be in the "not Nvidia" market.

I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
 
Minor add - "AGI CPUs" need a lot of bandwidth with reasonable latency. Intel typically has a really good memory controller design that could give a (small?) competitive advantage to their x86 solution.
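The bandwidth point can be illustrated with a back-of-the-envelope calculation: in single-stream LLM decode, every generated token must stream the full weight set from memory, so usable memory bandwidth, not compute, typically sets the ceiling. The model size and bandwidth figures below are hypothetical round numbers, not specs of any CPU discussed in this thread.

```python
# Back-of-the-envelope: memory-bandwidth-bound decode rate for LLM
# inference, assuming each token reads every weight exactly once and
# compute is free. All numbers are hypothetical round figures.

def max_tokens_per_sec(weight_bytes, mem_bw_bytes_per_sec):
    """Upper bound on single-stream decode rate set by memory bandwidth."""
    return mem_bw_bytes_per_sec / weight_bytes

weights = 70e9 * 2    # 70B parameters at 2 bytes each (fp16/bf16)
bandwidth = 500e9     # 500 GB/s of usable memory bandwidth

print(f"{max_tokens_per_sec(weights, bandwidth):.2f} tokens/s upper bound")
```

Batching amortizes the weight traffic across streams, but the single-stream case shows why a strong memory controller matters so much for this class of workload.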
 
Hi. Since spatial multithreading is mentioned, could you talk a little about the trade-offs between spatial multithreading and SMT? It seems like you need a lot more resources for spatial multithreading…
I think spatial multi-threading might be a die-area savings compared to a larger number of single-threaded full cores, but without the asserted stalling issues of SMT. Nvidia's premise seems to be that AI workloads are more intolerant of SMT issues than non-AI datacenter workloads. Just thinking about it for a minute or two, Nvidia seems to have a reasonable point.
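The die-area intuition can be sketched with toy numbers: a two-thread spatial core shares the front-end and L1 between threads but duplicates register state and splits the execution blocks, so it should cost less than two full single-threaded cores. All area figures below are hypothetical relative units invented for illustration, not measurements of any real core.

```python
# Toy die-area comparison: one 2-thread spatial core vs. two full
# single-threaded cores. Area figures are hypothetical relative units.

FRONTEND = 1.0   # fetch/decode/branch predict, shareable between threads
EXEC     = 1.5   # execution blocks for one thread's worth of issue width
REGS     = 0.5   # architectural + rename register state for one thread
L1_CACHE = 1.0   # L1 cache, shared within a core

def core_area(threads, shared_frontend):
    """Area of one core with the given thread count."""
    fe = FRONTEND if shared_frontend else FRONTEND * threads
    return fe + EXEC * threads + REGS * threads + L1_CACHE

two_single  = 2 * core_area(1, shared_frontend=True)  # two full cores
one_spatial = core_area(2, shared_frontend=True)      # one 2-thread core

print(f"two single-threaded cores: {two_single:.1f} units")
print(f"one spatial 2-thread core: {one_spatial:.1f} units")
print(f"area saved: {100 * (1 - one_spatial / two_single):.0f}%")
```

With these made-up ratios the spatial core saves about a quarter of the area for the same thread count, while keeping dedicated execution blocks per thread, which is the part SMT gives up.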
 
I would not assume Intel's Xeon approach is sub-par to Arm AGI, until we get data from direct comparison testing. That's probably going to take a while to be published. If Xeons integrate NVLink and AGIs don't, that right there is going to give Intel a substantial advantage. If anything is an advantage for AGI, it might be lower power consumption.
I have the answer for this. Apparently the reason Nvidia invested in Intel was exactly this: they wanted an x86 CPU that can do NVLink, and there are only two choices. Since AMD is a competitor, they went with Intel. Intel will take their CPU dies, just slap on an NVLink die, and sell it to Nvidia, which Nvidia will put into their racks.
 
I have the answer for this. Apparently the reason Nvidia invested in Intel was exactly this: they wanted an x86 CPU that can do NVLink, and there are only two choices. Since AMD is a competitor, they went with Intel. Intel will take their CPU dies, just slap on an NVLink die, and sell it to Nvidia, which Nvidia will put into their racks.
Yeah, it's the "just slap on" part that makes me wonder. :) Adding a low-latency, high-bandwidth network interface to a CPU and allowing it to realize its performance potential is tricky stuff.
 
Yeah, it's the "just slap on" part that makes me wonder. :) Adding a low-latency, high-bandwidth network interface to a CPU and allowing it to realize its performance potential is tricky stuff.
It's just a different IO die for NVLink vs. the standard Diamond Rapids one, that's about it. But I agree with you, it's quite tricky.
 