
TSMC N2 specs improve, while Intel 18A gets worse

TSMC revealed additional details about the N2 node at IEDM: 24% to 35% power reduction, 14% to 15% performance improvement at the same voltage, and 1.15X higher transistor density than N3E.
Previously, in 2023, the power reduction was only 25% to 30% and the speed gain was 10% to 15%, while in 2022 the density improvement was only 10%.
Meanwhile, the specs for Intel 18A have gotten worse: previously 18A was a 26% improvement over Intel 3, and now it's only 15%.
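For anyone who wants to play with these numbers, here's a minimal sketch in Python; the figures are just the claims quoted above, and the 100 mm² block at the end is a purely hypothetical example, not any real product:

```python
# Claimed N2 improvements over N3E at the same voltage, as the claims evolved
# (figures quoted from the post above; None = not disclosed at that point).
n2_vs_n3e = {
    "2022":          {"power_reduction": None,         "perf_gain": None,         "density_gain": 0.10},
    "2023":          {"power_reduction": (0.25, 0.30), "perf_gain": (0.10, 0.15), "density_gain": None},
    "IEDM (latest)": {"power_reduction": (0.24, 0.35), "perf_gain": (0.14, 0.15), "density_gain": 0.15},
}

# Intel 18A vs Intel 3, per the post above: the claimed gain shrank.
intel_18a_vs_intel3_perf = {"earlier claim": 0.26, "current claim": 0.15}

# What 1.15x density means in area terms: the same transistor count in ~1/1.15 the area.
# Hypothetical example: a 100 mm^2 N3E block.
n3e_block_mm2 = 100
n2_block_mm2 = n3e_block_mm2 / 1.15
print(f"A hypothetical 100 mm^2 N3E block would shrink to ~{n2_block_mm2:.0f} mm^2 on N2")
```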

This is standard TSMC: set expectations, then beat them. Something to learn from. The semiconductor industry is not a circus; you had better deliver on what you advertise.
 
Ahh. I see.

There are 5 large "tiles": 2 smaller ones and 3 larger ones. Underneath each of the 3 larger "tiles" are 4 Darkmont "sub-tiles" of 55 mm², with 24 cores apiece.

So, 3 super tiles, 4 sub tiles each (12 subtiles total). Each subtile has 24 cores (12x24=288). Got it.

So Intel only has to yield the silly-big Intel 3 super-tiles and 12 x 55 mm² Darkmont sub-tiles for Clearwater Forest.
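As a quick sanity check on that arithmetic (the 3 / 4 / 24 / 55 mm² figures are taken from the discussion above; the rest is just multiplication):

```python
# Clearwater Forest core math, per the discussion above.
super_tiles = 3            # the larger Intel 3 "super tiles"
sub_tiles_per_super = 4    # Darkmont "sub tiles" under each super tile
cores_per_sub_tile = 24
sub_tile_area_mm2 = 55

total_sub_tiles = super_tiles * sub_tiles_per_super        # 3 * 4 = 12
total_cores = total_sub_tiles * cores_per_sub_tile         # 12 * 24 = 288
total_darkmont_area = total_sub_tiles * sub_tile_area_mm2  # 12 * 55 = 660 mm^2

print(f"{total_sub_tiles} sub-tiles, {total_cores} cores, ~{total_darkmont_area} mm^2 of Darkmont silicon")
```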

Thanks for the clarification. The image had me a little confused.
No problem, happy to help.
 
No problem, happy to help.
Hope Intel gets a handle on their ring bus latency problems though. Unless code fits into the local L2, it is a big hit to go through that ring to the L3. With so many cores sharing an L3, seems like this is a big area of concern.

Not sure why AMD is able to use Chiplets and keep their latency down while Intel and its "Tiles" seem to be struggling. When I look at the 2 approaches, Intel's approach strikes me as a better engineering solution, but I guess my gut feeling is off on this one .... at least for now ;).
 
Hope Intel gets a handle on their ring bus latency problems though. Unless code fits into the local L2, it is a big hit to go through that ring to the L3. With so many cores sharing an L3, seems like this is a big area of concern.
This is a client issue, not a server one, because the server parts use a mesh architecture rather than a ring bus.
 
Hope Intel gets a handle on their ring bus latency problems though. Unless code fits into the local L2, it is a big hit to go through that ring to the L3. With so many cores sharing an L3, seems like this is a big area of concern.

Not sure why AMD is able to use Chiplets and keep their latency down while Intel and its "Tiles" seem to be struggling. When I look at the 2 approaches, Intel's approach strikes me as a better engineering solution, but I guess my gut feeling is off on this one .... at least for now ;).

Just to add a little context to Intel's latency problem:

[Attached image: Arrow Lake latency data]
Source: https://chipsandcheese.com/p/examining-intels-arrow-lake-at-the
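To put the concern about L2 misses going over the ring into rough numbers, here's a back-of-envelope average-memory-access-time (AMAT) sketch; every hit rate and cycle count below is a made-up illustrative value, not a measurement of any Intel part:

```python
# Back-of-envelope AMAT: average latency when local-L2 misses fall through to a shared L3.
# All numbers are hypothetical, chosen only to show the shape of the effect.
l2_hit_rate = 0.90     # fraction of accesses served by the core-local L2 (illustrative)
l2_latency = 15        # cycles (illustrative)

def amat(l3_latency_cycles: float) -> float:
    """Average access latency given an L3 latency; ignores DRAM misses for simplicity."""
    return l2_hit_rate * l2_latency + (1 - l2_hit_rate) * l3_latency_cycles

print(f"Fast L3 path (40 cycles): {amat(40):.1f} cycles average")
print(f"Slow L3 path (80 cycles): {amat(80):.1f} cycles average")
# Even at a 90% L2 hit rate, doubling the L3 trip adds ~4 cycles to every access on average.
```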
 
AMD does it as well IIRC.
You're correct, but AMD uses an in-package version of Infinity Fabric, which apparently has lower latency than Intel's Ring Bus. Intel's CPU architecture was clearly conceived for a single die implementation, and it doesn't seem like they re-architected the CPUs for the transition to a chiplet design. I'm just guessing, but I think this added latency Xebec is noting is due to what happens when you don't fundamentally redesign for chiplets.
 
You're correct, but AMD uses an in-package version of Infinity Fabric, which apparently has lower latency than Intel's Ring Bus. Intel's CPU architecture was clearly conceived for a single die implementation, and it doesn't seem like they re-architected the CPUs for the transition to a chiplet design. I'm just guessing, but I think this added latency Xebec is noting is due to what happens when you don't fundamentally redesign for chiplets.
Agree.

Intel was late to the game with their tile architecture, but you would think they would have anticipated the latency issue and designed accordingly.

I am guessing they will address this in spades with Panther Lake and future designs since it is causing so many issues in Arrow Lake.

Still, considering the latency, the performance is quite impressive.
 
Agree.

Intel was late to the game with their tile architecture, but you would think they would have anticipated the latency issue and designed accordingly.

I am guessing they will address this in spades with Panther Lake and future designs since it is causing so many issues in Arrow Lake.

Still, considering the latency, the performance is quite impressive.
The Intel CPU designers I have known were/are very smart people. I suspect they knew about the latency issue, but I also suspect schedule constraints and the overwhelming economic temptations of chiplets (sorry Intel, tiled dies) probably led them to just hold their nose and proceed down the only viable development path. For the vast majority of customers running the most popular applications (Chrome, Edge, Office, email, etc.) they might not ever notice a performance degradation.
 
The Intel CPU designers I have known were/are very smart people. I suspect they knew about the latency issue, but I also suspect schedule constraints and the overwhelming economic temptations of chiplets (sorry Intel, tiled dies) probably led them to just hold their nose and proceed down the only viable development path. For the vast majority of customers running the most popular applications (Chrome, Edge, Office, email, etc.) they might not ever notice a performance degradation.
Looking at the industry indicators (exponentially rising cost of new nodes, diminishing returns on transistor density, higher cost processes per wafer, etc), one could argue that Intel missed the boat something awful in the "tile" department. "Tiles" were an obvious solution to the dilemma. I just think the architecture wouldn't support it, and Intel didn't want to invest the time (and the 1 to 2 launches) it would take to transition to tiles until no other option was left to them. IIRC, AMD suffered a bit of a setback when they first transitioned to "chiplets" with respect to latency, but they have had many years to refine the product around the process (since around 2015?).

I agree that Intel CPU designers are very (extremely) smart people. I used to have a project manager that frequently said "perfection is the death of on-time delivery". Something had to give, and latency was it apparently on Arrow Lake.

Had Arrow Lake been able to be produced on 18A as Intel's original roadmap had it, things may have looked different. I imagine that doing a design for 18A then shifting to N3B (with a completely different set of design tools) likely took a big bite out of Intel's timing.

I wouldn't have wanted to be the one that broke that news to upper management :).
 
Looking at the industry indicators (exponentially rising cost of new nodes, diminishing returns on transistor density, higher cost processes per wafer, etc), one could argue that Intel missed the boat something awful in the "tile" department. "Tiles" were an obvious solution to the dilemma. I just think the architecture wouldn't support it, and Intel didn't want to invest the time (and the 1 to 2 launches) it would take to transition to tiles until no other option was left to them. IIRC, AMD suffered a bit of a setback when they first transitioned to "chiplets" with respect to latency, but they have had many years to refine the product around the process (since around 2015?).
I agree. I remember a technical presentation made by someone (I think) in Intel Labs about the benefits of the tiled die approach in something like 2009-2010, but my memory is hazy about the dates. The point is that Intel fiddled with tiled die technology a long time before they found it inevitable. It was obvious there were lots of advantages, but the inter-die interconnect problem was obvious too. Of course, it is important to keep in mind that nothing beats being on-die for overall performance and low latency. If manufacturing says they can name that tune in one die, you can bet the CPU designers are going to scream "Sold!" Few things in life are as obvious as the design benefits of one big die.
I agree that Intel CPU designers are very (extremely) smart people. I used to have a project manager that frequently said "perfection is the death of on-time delivery". Something had to give, and latency was it apparently on Arrow Lake.
Agreed.
Had Arrow Lake been able to be produced on 18A as Intel's original roadmap had it, things may have looked different. I imagine that doing a design for 18A then shifting to N3B (with a completely different set of design tools) likely took a big bite out of Intel's timing.

I wouldn't have wanted to be the one that broke that news to upper management :).
Twice in my career I had to break schedule news like that to senior management, and both times it didn't go well.
 
I agree. I remember a technical presentation made by someone (I think) in Intel Labs about the benefits of the tiled die approach in something like 2009-2010, but my memory is hazy about the dates. The point is that Intel fiddled with tiled die technology a long time before they found it inevitable. It was obvious there were lots of advantages, but the inter-die interconnect problem was obvious too. Of course, it is important to keep in mind that nothing beats being on-die for overall performance and low latency. If manufacturing says they can name that tune in one die, you can bet the CPU designers are going to scream "Sold!" Few things in life are as obvious as the design benefits of one big die.
FWIW - Intel Westmere uarch released Jan 2010 seems to have had this approach:

The Clarkdale processor package contains two dies: the 32 nm processor die with the I/O connections, and the 45 nm graphics and integrated memory controller die

[Attached image: Clarkdale two-die package]
"Physical separation of the processor die and memory controller die resulted in increased memory latency."

Intel definitely had a lot of time to learn from this, and also from AMD's approach since Zen 2 in 2019.

Is it more likely that Swan's pivot to TSMC N3 in 2020/2021 caused some design changes mid-stream with ARL, or is there something about the Foveros approach that adds more latency than what TSMC/AMD use? Something else?
 
Looking at the industry indicators (exponentially rising cost of new nodes, diminishing returns on transistor density, higher cost processes per wafer, etc), one could argue that Intel missed the boat something awful in the "tile" department. "Tiles" were an obvious solution to the dilemma. I just think the architecture wouldn't support it, and Intel didn't want to invest the time (and the 1 to 2 launches) it would take to transition to tiles until no other option was left to them. IIRC, AMD suffered a bit of a setback when they first transitioned to "chiplets" with respect to latency, but they have had many years to refine the product around the process (since around 2015?).

I agree that Intel CPU designers are very (extremely) smart people. I used to have a project manager that frequently said "perfection is the death of on-time delivery". Something had to give, and latency was it apparently on Arrow Lake.

Had Arrow Lake been able to be produced on 18A as Intel's original roadmap had it, things may have looked different. I imagine that doing a design for 18A then shifting to N3B (with a completely different set of design tools) likely took a big bite out of Intel's timing.
This wouldn't have been possible, because ARL was scheduled to use N3 (N3B) during the Swan era, and at that time there was no confidence in Intel's fabrication. If Intel hadn't invested in its fabs, they wouldn't be in as good a condition as they are now; after the 10nm debacle, Swan made things worse.
I wouldn't have wanted to be the one that broke that news to upper management :).
Upper management is such a joke at Intel though 🤣
 
Of course, it is important to keep in mind that nothing beats being on-die for overall performance and low latency. If manufacturing says they can name that tune in one die, you can bet the CPU designers are going to scream "Sold!" Few things in life are as obvious as the design benefits of one big die.
LOL. "Sold" indeed!

I still consider it a leadership miss. If you look at the yield advantages of tiles, the financial cost of maintaining a monolithic design really starts looking terrible. Someone should have said "Make it So" and shifted priorities to getting the next design and all future designs ready for tiles.
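To put that yield argument in rough numbers, here's a minimal sketch using a simple Poisson defect model; the die areas and defect density are hypothetical round numbers, not figures for any real Intel or TSMC product:

```python
import math

# Simple Poisson yield model: Y = exp(-D * A), with D in defects/mm^2 and A in mm^2.
# All numbers below are hypothetical.
defect_density = 0.002    # defects per mm^2 (i.e. 0.2 per cm^2, illustrative)

monolithic_area_mm2 = 600 # one big die
tile_area_mm2 = 150       # the same silicon split into four 150 mm^2 tiles

def poisson_yield(area_mm2: float) -> float:
    return math.exp(-defect_density * area_mm2)

print(f"600 mm^2 monolithic die yield: {poisson_yield(monolithic_area_mm2):.0%}")
print(f"150 mm^2 tile yield:           {poisson_yield(tile_area_mm2):.0%}")
# Because bad tiles can be screened out before assembly, the tiled product's usable-silicon
# fraction tracks the per-tile yield rather than the much lower monolithic yield.
```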

I have a strong feeling that this activity is on full-throttle at Intel as we speak. Over the decades I have followed this industry, I have seen Intel step in a pile on more than one occasion (and sometimes more than one pile at a time!). The design team has pulled it out in style every time (eventually). I think people are going to be surprised when Panther Lake is released. I strongly expect the latency issue to be fixed in spades, and the added density of 18A with BSPDN will likely give the design team a better transistor budget to boot.

Oh, and MERRY CHRISTMAS to everyone!
 
LOL. "Sold" indeed!

I still consider it a leadership miss. If you look at the yield advantages of tiles, the financial cost of maintaining a monolithic design really starts looking terrible. Someone should have said "Make it So" and shifted priorities to getting the next design and all future designs ready for tiles.
I completely agree. The tiled die strategy was so obvious the first time I saw it, I thought it should be adopted ASAP. Of course, the CPU design team was obviously going to hate it. The transition to tiles from a single design injects a lot of risk and schedule hits, so I'm pretty sure if I was a CPU design manager I wouldn't like using functionally distributed tiles the first time either! Going from single instance to multi-instance anything is complicated.
I have a strong feeling that this activity is on full-throttle at Intel as we speak. Over the decades I have followed this industry, I have seen Intel step in a pile on more than one occasion (and sometimes more than one pile at a time!). The design team has pulled it out in style every time (eventually). I think people are going to be surprised when Panther Lake is released. I strongly expect the latency issue to be fixed in spades, and the added density of 18A with BSPDN will likely give the design team a better transistor budget to boot.
I hope you're correct!
 
That's not how the roadmap for Arrow Lake progressed.
My bad. The original plan was to do 20A and TSMC N3 (seems like a PITA plan to me). When 20A was abandoned, I guess the backup plan (as I can't fathom Intel outsourcing fabrication of a flagship processor being a strategic move) became the only plan.

Of course, it is possible that all of this was a pipe dream (20A Arrow Lake) and that Intel understood from the beginning they would not be able to make timing in their own fab.
 