TSMC N2 specs improve, while Intel 18A gets worse

I read the post a few times actually, and I agree and understand that it's physically impossible for an Intel 7 backport of ARL to have better perf/watt, but I'm not sure it would lose in all benchmarks/scenarios to an N3B version. I interpreted your post as saying that Fmax would be more limited in power-constrained scenarios, but not necessarily in "not power constrained" scenarios such as a small number of cores running at once (typical desktop application usage).
There is no such thing as a scenario that isn't power constrained. For example, that 14900KS does hit a power and thermal density limit. Use some liquid N2 and you can go faster, but even then, at the end of the day you hit a thermal wall. Power and performance are two sides of the same coin. They may not always transfer 1:1, and different Xtor characteristics buy you more of one or the other. A good example is leakage. For a low-power embedded device, lower leakage saves a ton on a very limited power budget. On the other hand, leakage reductions for a huge-die, huge-power-consumption, large-Xtor-count GPU can allow for much higher frequencies, on account of the leakage power savings being multiplied by the number of Xtors. Meanwhile, your i9 doesn't have that many transistors in the grand scheme of things and the power/thermal budget is anything but small, so leakage reductions don't really buy you very much extra performance.
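The leakage argument above can be sketched in a few lines. All the numbers here are invented purely for illustration, not real device data:

```python
# Toy model: total power = dynamic switching power + summed leakage,
# so a per-transistor leakage cut is multiplied by transistor count.
# All figures below are made-up illustrative values.

def total_power_w(xtor_count, leak_per_xtor_w, dynamic_w):
    """Total power as dynamic power plus per-transistor leakage times count."""
    return dynamic_w + xtor_count * leak_per_xtor_w

# A big GPU: lots of transistors, so a 20% leakage cut frees real budget.
gpu_before = total_power_w(80e9, 1.5e-9, 280.0)
gpu_after  = total_power_w(80e9, 1.2e-9, 280.0)
print(gpu_before - gpu_after)   # ~24 W reclaimed, convertible into frequency

# A desktop i9-class chip: fewer transistors, huge budget; same cut buys little.
cpu_before = total_power_w(20e9, 1.5e-9, 220.0)
cpu_after  = total_power_w(20e9, 1.2e-9, 220.0)
print(cpu_before - cpu_after)   # ~6 W, a rounding error against a 250 W+ budget
```

The point is only the proportionality: the same per-device improvement matters far more when multiplied across a GPU-sized transistor count or squeezed into a tiny embedded budget.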
Intel 7 Ultra seems suited to higher clocks than N3B, as it allows both higher maximum temperatures (115C allowed in BIOS vs. 105C) and higher stable voltages (1.55V on the 14900KS for 2 cores vs. 1.45V on the 285K).
Intel says they moved where the thermal probes are on the chips, so the numbers are not like for like and cannot be compared. As for the voltage, that buys you diminishing frequency returns at higher V. The decoupling capacitors on 10nm SF and Intel 7 do run laps around N3's. TSMC N3 also doesn't have those renowned giant Cu metals, which is another disadvantage. If I were a betting man, I would say this and the lower process maturity/stability are likely why the Intel 7 chips can run stable at a higher V. Better d-cap allows for a lower voltage guard band, and the giant Cu metals offer a better PDN. On the actual device side of things, though, N3 transistors simply run laps around Intel 7 transistors. They've got higher NMOS mobility, better V-f curves, lower leakage, lower switching power, WAY higher PMOS mobility, etc. That much is scientific fact. I would assume all the Xtor-level goodness outdoes Intel's leadership in upper BEOL technology.

I don't know enough to say, but I assume that since they are different designs, the Vmax values in BIOS are not like-for-like statements about the high-voltage capabilities of the two processes.
I think the trade-off would be that while an Intel 7 version would have higher overall power consumption, it would actually clock faster for lightly threaded applications. (Idle power would also be higher, but that's less of an issue on the desktop, with both AMD and Intel desktops idling at higher power today than in the Sandy Bridge era.)
I suspect that you could disable all but a single core at 1.55V for intel 7 and 1.45V for N3, and the N3 one would still be faster AND use less power.
P.S. ARL P-cores may not have much more in the way of logic transistor count than RPL; consider that they gain only 9% IPC while actually dropping SMT altogether. ARL E-cores, though, are certainly much heavier than RPL E-cores.
They are a LOT bigger. Comparing core w/o L2 equivalent vs core w/o L2: RPC is ~1.6X the size of LNC (5.4/3.4 mm2). The 3-fin cell on N3B is similar to the 210x51 cell of N5 (2.29X the density of the 408x60 Intel 7 library that makes up the entire core area of an RPC). A hypothetical N3 RPC should then be around 2.4mm2. This means LNC w/o L2 is ~1.4X the area of RPC w/o L2 iso-process (assuming Intel continues to only use tall cells at the widest poly pitch in the core area). L3 cache is the same, but L2 is up by 25%, plus the insertion of the L0 cache, and there is also the small but not non-existent die-to-die PHY. TL;DR: the Arrow Lake compute die is a whole lot chunkier than the compute element of RPL. The claimed IPC improvement may not be large, but Intel did drastically widen the core and increase cache capacity and bandwidth (far from free).
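The arithmetic in that paragraph can be checked directly, using the figures as stated in the post:

```python
# Reproducing the area arithmetic above (all inputs are the post's own figures).
rpc_area = 5.4    # RPC core w/o L2, mm^2, on Intel 7
lnc_area = 3.4    # LNC core w/o L2, mm^2, on N3B

# Cell footprints as poly pitch x cell height in nm.
n3b_cell = 210 * 51    # N5-like 3-fin cell quoted for N3B
i7_cell  = 408 * 60    # Intel 7 tall-cell library used across the core
density_ratio = i7_cell / n3b_cell
print(round(density_ratio, 2))        # 2.29x density advantage

rpc_on_n3 = rpc_area / density_ratio  # hypothetical RPC shrunk to N3
print(round(rpc_on_n3, 1))            # ~2.4 mm^2

print(round(lnc_area / rpc_on_n3, 1)) # ~1.4x iso-process area for LNC vs RPC
```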
 
My bad. The original plan was to do 20A and TSMC N3 (seems like a PITA plan to me). When 20A was abandoned, I guess the backup plan (as I can't fathom Intel outsourcing fabrication of a flagship processor being a strategic move) became the only plan.

Of course, it is possible that all of this was a pipe dream (20A Arrow Lake) and that Intel understood from the beginning they would not be able to make timing in their own fab.
Or it wouldn't have been economically viable to ramp 20A. Ramping a fab is not cheap for one-off dies, so they bit the bullet and decided to outsource to TSMC. For Intel 4 they improved upon their design, it is still a volume part, and they had many other dies.
 
They are a LOT bigger. Comparing core w/o L2 equivalent vs core w/o L2: RPC is ~1.6X the size of LNC (5.4/3.4 mm2). The 3-fin cell on N3B is similar to the 210x51 cell of N5 (2.29X the density of the 408x60 Intel 7 library that makes up the entire core area of an RPC). A hypothetical N3 RPC should then be around 2.4mm2. This means LNC w/o L2 is ~1.4X the area of RPC w/o L2 iso-process (assuming Intel continues to only use tall cells at the widest poly pitch in the core area). L3 cache is the same, but L2 is up by 25%, plus the insertion of the L0 cache, and there is also the small but not non-existent die-to-die PHY. TL;DR: the Arrow Lake compute die is a whole lot chunkier than the compute element of RPL. The claimed IPC improvement may not be large, but Intel did drastically widen the core and increase cache capacity and bandwidth (far from free).
You beat me to it.

Seems like Lion Cove got a lot more transistors than RPL, but got very little performance in return.

I attribute this to a combination of the poor latency in the design due to the tile implementation, and Intel's design team not being able to dictate process parameters optimal for the design.

Of course, I could be totally off base here.
 
It’s going to be very interesting to see Lion Cove “come home” from N3 to 18A in Cougar Cove.

The system improvements in Panther Lake vs Arrow Lake, like tile layout, ring, etc., will likely obscure things a bit, but for the first time we'll be able to get the best look at a like-for-like, same-company, same-core-lineage comparison built on both TSMC and Intel Foundry.
 
Architecturally speaking, Cougar Cove/Darkmont are just refreshes with 5-8% IPC improvements, but the actual layout will be night and day apart with FinFET vs GAA + PowerVia.
Panther Lake is more akin to LNL than ARL.
 
Seems like 5-8% IPC could be easily gained just by shaving off some of the latency, without any core changes. I too have seen these IPC estimates though, so evidently it isn't going to be a radically improved processor.

What I am interested in seeing is how dense they can make it on 18A and the clock scaling of this process for the Lion Cove architecture. Depending on the clock speed, this could make or break the performance of the processor. Depending on the density and yield, it could make or break Intel's financials.
 
Financial challenges are more worrisome for Intel than technical ones, due to having the bean counters running the place (if it doesn't make margin, kill the product).
 
There is no such thing as a scenario that isn't power constrained. For example; that 14900KS does hit a power and thermal density limit. Use some Liquid N2 and you can go faster. But even then, at the end of the day you hit a thermal wall. Power and performance are two sides of the same coin. They may not always transfer 1:1 and different Xtor characteristics buy you more of one or another. A good example is leakage. For a low power embedded device, getting lower leakage saves a ton on a very limited power budget. On the other side leakage reductions for a huge die, huge power consumption, large Xtor count GPU can allow for much higher frequencies on account of the leakage power savings being multiplied by the number of Xtors. Meanwhile, your i9 doesn't have that many transistors in the grand scheme of things and the power/thermal budget is anything but small, so leakage reductions don't really buy you very much extra performance.
Thanks for the reply --

Are you saying when 14900KS is running a single core workload at 6.2 GHz while the rest of the cores are asleep, the single core is still power and thermally constrained in a desktop chassis? Is this at the transistor level?

As measured, a single Raptor Lake 14th gen core at 6.2 GHz consumes less than 44W under load (the 44W includes the other cores idling and the full cache in active mode): https://www.techpowerup.com/review/intel-core-i9-14900ks/22.html

(Most PC workloads are "lightly threaded" and definitely don't peg all 24 cores on the i9-14900KS, and usually not more than 6-8 of them. Amdahl's Law)

They are ALOT bigger. Core w/o L2 equivalent vs Core w/o L2; RPC is ~1.6X the size of LNC (5.4/3.4). The 3 fin cell on N3B is similar to the 210x51 of N5 (2.29X the density of the 408x60 i7 library that makes up the entire core area of a RPC). A hypothetical N3 RPL should then be around 2.4mm2. This means LNC w/o L2 is ~1.4X the area iso process to RPC w/o L2 (assuming intel continues to only use tall cells at the widest poly pitch in the core area). L3 cache is the same, but L2 is up by 25% as well as the insertion of the L0 cache, and there is also the small but not non-existent die-die PHY. TLDR Arrow lake compute die is a whole lot chunkier than the compute element of RPL. Claimed IPC improvement may not be large, but Intel did drastically widen the core and increase cache amount and bandwidth (far from free).

Ugh - this is pretty disappointing on the Arrow Lake "performance design" side then. 1.4X for this amount of performance increase is pretty poor, though the size increase makes sense given the increased width of the core. I guess the savings from removing SMT aren't substantial enough. Is some of this 1.4X the communication components required for the chiplet architecture, or does that add even more?
I suspect that you could disable all but a single core at 1.55V for intel 7 and 1.45V for N3, and the N3 one would still be faster AND use less power.

Understand. I appreciate the detailed technical response on why the Vmax's are different.
 
My bad. The original plan was to do 20A and TSMC N3 (seems like a PITA plan to me). When 20A was abandoned, I guess the backup plan (as I can't fathom Intel outsourcing fabrication of a flag-ship processor being a strategic move) became the only plan.

Of course, it is possible that all of this was a pipe dream (20A Arrow Lake) and that Intel understood from the beginning they would not be able to make timing in their own fab.
As I have mentioned before, Intel was doing some parallel designs (maybe still is). You cannot go back and forth between nodes quickly. It was designed for N3 and 20A in two parallel projects. They chose the N3 node for financial reasons while trying to ramp the process; 20A was cancelled. The only frustrating point was that people/investors were not accepting that the 20A project was killed off and kept claiming ARL was on 20A, despite my comments and hints from Pat.

If ARL had ramped on 20A, they would have run 10K wafers per month on a chip that wasn't mission critical while debugging GAA and BSPD. However, it would have been 5-8B in capex spending and opex that wasn't REALLY needed until a year later. Given Intel's financial situation, that wasn't going to work.

So now Intel has to manage product ramps on 18A in OR and AZ based on scenarios.

What we need to look for in Late 2025 is one of two quotes:

"we are ramping 18A and the product costs are improving our margins as expected"

OR

"We saw larger than expected 18A start-up costs and product costs. We will see lower margins as we ramp this leading-edge process. 14A will get us to target margins."

Again, Intel's problems are not technical (assuming they are being honest on product health). They are financial. Ramping 18A to 25K wafers per month won't do it.
 
Thanks for the reply --

Are you saying when 14900KS is running a single core workload at 6.2 GHz while the rest of the cores are asleep
Yes
the single core is still power and thermally constrained in a desktop chassis? Is this at the transistor level?

As measured, a single Raptor Lake 14th gen core at 6.2 GHz consumes less than 44W under load (the 44W includes the other cores idling and the full cache in active mode): https://www.techpowerup.com/review/intel-core-i9-14900ks/22.html
Yes. Even if it is "only" 44W for one core, there are thermal concerns. Even though 44W isn't a ton of heat, concentrated in the area of one core it is an absurd amount. I am sure you are aware of the concern of thermal runaway and how this is why thermal throttling exists. Even if the rest of the chip is cool, if you have one part that is super hot you still need to down-clock to prevent runaway at the hot spot. There are simply physical limits on how fast heat can be removed from the bulk Si substrate (both diffusing laterally to the cooler parts of the die and up towards the cooling system). Even with fancy tricks like direct-die cryogenic cooling, the thermal conductivity of Si is a constant. Even though people often say "power unconstrained", eventually you hit the BIOS maximum V, thermal throttling, or you just fry the chip. Hence why I said there is no such thing as an unlimited power budget.
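The rough power-density arithmetic behind "44W in one core is an absurd amount" looks like this. The core area is taken from the sizing discussion earlier in the thread (~5.4 mm2 for an RPC core w/o L2); the die-level numbers are ballpark assumptions for contrast, not measured values:

```python
# Power density of one hot core vs the same class of chip fully loaded.
# Core area from the earlier post; die power/area are rough assumptions.

core_power_w  = 44.0   # measured single-core load from the TechPowerUp review
core_area_mm2 = 5.4    # ~RPC core w/o L2 (from the sizing post above)
print(core_power_w / core_area_mm2)   # ~8 W/mm^2 concentrated in one spot

die_power_w  = 253.0   # PL2-class all-core power, assumed
die_area_mm2 = 257.0   # approximate RPL-S die size, assumed
print(die_power_w / die_area_mm2)     # ~1 W/mm^2 averaged across the whole die
```

So even though 44W is modest at the package level, the local heat flux in the active core is roughly an order of magnitude above the die-wide average, which is why that one core can still be thermally limited.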
Ugh - this is pretty disappointing on the Arrow Lake "performance design" side then. 1.4X for the amount of performance increase is pretty poor; and this size increase makes sense given the increased width of the core. I guess the savings from removing SMT isn't substantial enough then. Is some of this 1.4X the communications components required for chiplet arch. or does that add even more?
Visually, it is reasonably small (see the images below; as an aside, I had to crop them so I could actually paste them in). Also, the D2D PHY would have no impact on core size (at least in a 2.5D system such as recent Intel client SOCs and Nvidia's datacenter GPUs, or a 2D MCM like most of AMD's Zen desktop and datacenter CPUs). Conceptually, the D2D PHYs are similar to a USB or DDR PHY on the die edge. Now, with 3D chips things are more wonky, since signals can be coming up from anywhere. If memory serves, for the Zen 3/4 V-cache stuff it is a small region in the cache area that has TSVs which take up area (both because the TSVs have a non-zero diameter and because of the exclusion zone around the TSVs).
[Attached image: annotated Arrow Lake SoC die shot]

Arrow Lake SOC image credit: Intel via Tom's Hardware, with annotations provided by myself

[Attached image: annotated Lunar Lake SoC die shot]

LNL SOC annotated image: excerpt from TechPowerUp, claimed to be created by GeenWens and Kurnalsalts

Understand. I appreciate the detailed technical response on why the Vmax's are different.
It is just a theory on my part. I don't know enough about the RPL and ARL core and SOC designs to say definitively what drove the different Vmaxes.
 
Yes

Yes. Even if it is "only" 44W for one core there are thermal concerns. Even though 44W isn't a ton of heat in the area of one core it is an absurd amount.
Thanks - I guess if we assume TSMC N3 is more efficient at 6 GHz+ than Intel 7, then the extra 'size' of the core on Intel 7 will be more than offset by the heat increase. This makes sense to me.

There are/were two reasons I'm hesitant here:

1. A lot of sources indicate that it's harder to stay at high clocks on newer/smaller nodes because of wire resistances, leakage challenges, and thermal density issues that "getting too small" causes. Though when I look at the 'sample core' Cortex A78 cores for TSMC N7, N5, N3 - the scaling appears to improve at all speeds/voltages. (See figure below).

2. When Intel went from 32nm to 22nm (first Finfet node), there was data showing voltage and power scaling appeared worse above 4.8 GHz on 22nm than 32nm*, on the same basic uarch (Sandy Bridge/Ivy Bridge).

Based on these two factors and the maturity of Intel 7 Ultra, I wasn't sure that more efficient transistors at high clock speed on TSMC N3 could be assumed.

OK - your argument makes sense overall. It looks like for Arrow Lake to increase clock speed from where it's at, it would probably need to be ported 'forward' to something like 18A/20A. Thanks nghanayem!

[Attached image: Cortex-A78 frequency/voltage scaling across TSMC N7/N5/N3]


*With the caveat that the sample size wasn't large, 32nm was mature (like Intel 7), and there could be other design factors causing this.
 
Does anyone know if we will be getting measured metrics for Intel 18A, since it is supposedly going to be used, like TSMC's nodes, to manufacture 3rd-party designs?

I know in the past, lots of articles just guessed since Intel did not typically disclose this information while TSMC does it like a promotion on a billboard ;).

If so, when do you expect we will get our first look at how the process shaped up? Same thing for N2.

Also, since both 18A and N2 will be using low NA machines (I think the same lithography machines correct?), why is it that 18A is only achieving SRAM density in line with N3E while N2 achieves higher density? Does it have anything to do with BSPDN?

Also, it is my understanding that the primary advantage of High NA (which I believe Intel plans to pilot alongside 18A, but not for production) is that it allows fewer passes, since it can cut the same channel without multi-patterning. If that is the case, it seems like Intel would have a chance to leap-frog TSMC sometime in the future by getting to High NA first ... just as TSMC did with EUV over Intel, who held onto DUV way past the point where it was clear that new die shrinks would require EUV.

I know this is what Intel said in Q2 2024, but I am guessing this has already slipped by a year:

[Attached image: Intel process roadmap slide, Q2 2024]


TSMC on the other hand is saying no High NA until 2030!

Perhaps history will repeat itself ..... if Intel can stay out of financial trouble until around 2027.
 
Does anyone know if we will be getting measured metrics from Intel 18A since it is supposedly going to be used like TSMC to manufacture 3rd party designs with?
Why do you need a 3rd-party design teardown when Intel 18A first-party products will be launching in Q3/Q4 '25?
I know in the past, lots of articles just guessed since Intel did not typically disclose this information while TSMC does it like a promotion on a billboard ;).

If so, when do you expect we will get our first look at how the process shaped up? Same thing for N2.

Also, since both 18A and N2 will be using low NA machines (I think the same lithography machines correct?), why is it that 18A is only achieving SRAM density in line with N3E while N2 achieves higher density? Does it have anything to do with BSPDN?

Also, it is my understanding that the primary advantage of High NA (which I believe Intel plans to pilot along side of 18A, but not for production), is that it allows fewer passes since it can cut the same channel without multi-patterning. If that is the case, it seems like Intel would have a chance to leap-frog TSMC sometime in the future by getting to High NA first ...... just as TSMC did with EUV over Intel who held onto DUV way past the point where it was clear that new die shrinks would require EUV.

I know this is what Intel said in Q2 2024, but I am guessing this is already slipped by a year:

[Attached image: Intel process roadmap slide, Q2 2024]


TSMC on the other hand is saying no High NA until 2030!

Perhaps history will repeat itself ..... if Intel can stay out of financial trouble until around 2027.
This is something only time can tell
 
Also, it is my understanding that the primary advantage of High NA (which I believe Intel plans to pilot along side of 18A, but not for production), is that it allows fewer passes since it can cut the same channel without multi-patterning. If that is the case, it seems like Intel would have a chance to leap-frog TSMC sometime in the future by getting to High NA first ...... just as TSMC did with EUV over Intel who held onto DUV way past the point where it was clear that new die shrinks would require EUV.
High-NA has T2T and pupil-fill limitations at ~20 nm pitch, where it was supposed to take over from low-NA, going by SPIE papers 120520G (ASML) and 1321505 (imec). Depending on the requirements, more passes may still be needed.
 
Also, since both 18A and N2 will be using low NA machines (I think the same lithography machines correct?), why is it that 18A is only achieving SRAM density in line with N3E while N2 achieves higher density? Does it have anything to do with BSPDN?
We'll have to look at the 18A vs N2 pitches (gate, metal) when they are shown.
 
1. A lot of sources indicate that it's harder to stay at high clocks on newer/smaller nodes because of wire resistances, leakage challenges, and thermal density issues that "getting too small" causes.
This is a common misunderstanding. Putting aside the transistor variation/process maturity part of the equation (limiting frequency, SRAM Vmin, etc.), there are two major issues. One, as you mentioned, is the growing problem of rising interconnect RC slowing down chips. The other is that in a post-Dennard world, all parts of the transistor are not scaled linearly (and even if you did scale them linearly, factors like leakage would prevent you from getting the projected linear power reduction and frequency bump).

In the modern day, much of the scaling is done by shrinking the space between transistors rather than the size of the transistors themselves. Assuming you change nothing other than, say, reducing the space between devices, total performance and power characteristics would degrade. The reason for this is parasitic capacitance. An easy example to visualize is the gates of your NMOS and PMOS. Both are giant pillars of metal. Thus, they act as parallel plates and create parasitic capacitance that slows operation and increases power consumption. Now apply this to your individual fins/nanosheets, contacts, etc., and your various "DTCO" tricks end up degrading how the devices perform (again, assuming all else is equal). Parasitic cap linearly degrades your power and performance (power = (1/2)*f*C*V^2). Of course, having every new process node be a regression would be unacceptable, so you see tons of innovation in materials and chemistry to not only claw the performance back but even exceed the performance of the old node. People do things like increasing channel strain, SiGe PMOS, shortening contacts, depopulating excess metal, adding low-K spacers, increasing fin drive current so you can do fin depopulation, etc.
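A minimal sketch of that dynamic-power relation, using an invented switched-capacitance value, shows how parasitic capacitance degrades power linearly at fixed frequency and voltage:

```python
# Dynamic switching power as quoted above: P = (1/2) * f * C * V^2.
# The capacitance values are invented for illustration only.

def dynamic_power_w(f_hz, c_farads, v_volts):
    """Dynamic power of a switched capacitance at frequency f and voltage V."""
    return 0.5 * f_hz * c_farads * v_volts ** 2

base   = dynamic_power_w(5e9, 4.0e-9, 1.2)  # hypothetical switched capacitance
denser = dynamic_power_w(5e9, 4.4e-9, 1.2)  # +10% parasitic C from tighter spacing
print(denser / base)    # ~1.1: 10% more C means 10% more power at the same f and V
```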

Now, with that context: yes, the leakage does indeed get worse as you scale. I don't remember the numbers, but I would not be shocked in the least if N3 has lower leakage than Intel 7, even in spite of the N3 devices being smaller. Leakage reductions are not as large a benefit to Intel's products as they are to TSMC's most important customers (causing divergent priorities). There is also Intel's Gox, which comes with a penalty to channel control (see below). Finally, it isn't exactly a secret that an N3 transistor isn't really that much smaller than an Intel 7 one. For example, the poly pitch has a 0.89 scale factor (which is worse than your historical full-node shrink of 0.7x). When most of your density gain is coming from cell height reductions, the leakage problem slows down. My understanding is that leakage getting worse on new nodes is more of a problem at lower power. For something like an Intel desktop CPU, the power scales quadratically with voltage, linearly with frequency, and additively with leakage. Since that Intel CPU would be using ultra-low threshold voltages, the leakage was always going to be abysmal, be it on 2"nm" or 65"nm" (exaggerated for dramatic effect, but you get the idea). The RC of interconnects is, as you say, an unavoidable problem, and Intel 7 has some advantages here. Just like with parasitic cap, the interconnect RC will impact all speeds equally (assuming you don't have to worry about dielectric breakdown/arcing of the ILD, and I am unsure how reasonable this assumption is at ultra-high V). The thermal density thing is, as you say, a big issue. One that would presumably be much more of a limiter at higher V (on account of the power dissipation scaling fastest with V).
Though when I look at the 'sample core' Cortex A78 cores for TSMC N7, N5, N3 - the scaling appears to improve at all speeds/voltages. (See figure below).
That is expected behavior. The returns are often not as large at high voltages, but I can't really think of many (if any) instances where you saw large performance improvements at the low and mid range and a regression at the high range.
2. When Intel went from 32nm to 22nm (first Finfet node), there was data showing voltage and power scaling appeared worse above 4.8 GHz on 22nm than 32nm*, on the same basic uarch (Sandy Bridge/Ivy Bridge).
Major transistor architecture changes break performance. The performance then needs to be re-engineered from scratch back into the process. I don't really have many good examples of things that broke performance during the SiO2 -> HfO2 or planar -> finFET transitions. One I can think of is channel definition on a finFET. A finFET has lower active area in the same footprint than planar (on account of the empty space between fins). The way you do the S/D and well implants also changed significantly. So if you wanted your finFET to actually be better than your planar FET, you needed to put in the engineering work to get a tall enough fin, with good enough profiles, properly done implants, and a well-shaped S/D.

Pat G once said something whose analogy I really liked: something to the effect of how Intel 3 is the end of the turbocharging era and how 18A is like the first EVs. It really is true that when you change device architectures, it upends the paradigm. Things that worked no longer work, and you need to figure out how to reimplement them. As an example, placing your gate first was standard practice, as it allowed you to self-align the S/D to your gate and eliminate some nasty alignment-related issues. With the adoption of HKMG, this was no longer possible. In theory, you could have done the old gate-last process to form your metal gate. The solution people actually use, though, is the well-known replacement-gate scheme, which gets the best of gate-first (self-aligned S/D) and gate-last (HfO2 deposited after all the high-T operations that would have damaged it are complete). Considering 2nd-gen 22nm parts, and everything since then, have surpassed the efficiency of Sandy Bridge, I would just blame that on process maturity, or on the 22nm process failing to hit the final performance targets Intel wanted in time for launch.
Based on these two factors and the maturity of Intel 7 Ultra, I wasn't sure that more efficient transistors at high clock speed on TSMC N3 could be assumed.


OK - your argument makes sense overall. It looks like for Arrow Lake to increase clock speed from where it's at, it would probably need to be ported 'forward' to something like 18A/20A. Thanks nghanayem!

View attachment 2621

*With the caveat that the sample size wasn't large, 32nm was mature (like Intel 7), and there could be other design factors causing this.
Another major factor that I can't believe I forgot to mention for why the Vmaxes are different: comparing across similar nodes, Intel always has thicker Gox than the thin-gate devices of an equivalent TSMC process. Historically, when you look back at IEDMs from the 2000s/2010s (or even Intel 4/3 versus Intel 3-E), Intel's initial version of a process will have no thick-gate devices. No thick gate means you can't natively do high voltage, which really restricts what you can do on the analog side of things. Interestingly, you apparently don't "need" high-voltage devices to do fancy analog stuff; I saw an interesting presentation at ISSCC last year on the topic that, if I am going to be honest, half flew over my head. But either way you cut it, doing high voltage without thick gate will be needed going forward. TSMC mentioned at their 2024 symposia that N2 won't have thick gate oxide support and that they made a set of tools to allow easier porting of pre-N2 analog IPs to N2. I have no clue how you would even begin to go about that while still having space for the gate metal between the nanosheets. With that tangent done: Intel would get around the initial process not having a thick Gox by making the default Gox unusually thick. This gave Intel chip designers high enough voltages to make the PLLs, FIVRs, clock trees, and interfaces they wanted for client CPUs. Then, like clockwork, at the next IEDM you would see Intel report on the "SOC" version of last year's process with thick-gate devices (and during Intel's mobile phone adventures, you would see things like thin-gate devices added to the SOC process for lowest standby power). The "SOC process" would then be used for various kinds of chipsets (PCH, mobo chipsets, WiFi, Bluetooth, sound, Thunderbolt, security chips, etc.) and some of the Atom phone/tablet SOCs.

But we were talking about frequency and voltage scaling on Intel nodes vs TSMC nodes, not Analog Devices (pun very much intended). Even though it isn't the primary motivation, that thicker Gox allows for higher voltages. Because all the transistors have this thicker Gox, rather than only the analog devices that actually need thick gates, your basic logic will (all else being equal) be able to operate at a higher Vmax. The thicker Gox does however degrade your control of the channel, hurting switching and leakage. It also makes it harder to scale poly pitch.
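The trade-off above can be sketched numerically with a toy parallel-plate oxide model. This is only a sketch under made-up assumptions: the breakdown field, derating margin, and oxide thicknesses below are all hypothetical, not real Intel or TSMC figures.

```python
# Toy gate-oxide trade-off model. All numbers are made up for illustration.
EPS0 = 8.854e-12            # F/m, vacuum permittivity
EPS_SIO2 = 3.9 * EPS0       # F/m, SiO2 permittivity
E_BD = 1.0e9                # V/m, assumed oxide breakdown field (hypothetical)

def vmax(t_ox_m, margin=0.5):
    """Tolerable gate voltage: a derated fraction of breakdown, V ~ E_bd * t_ox."""
    return margin * E_BD * t_ox_m

def c_ox_per_area(t_ox_m):
    """Gate capacitance per unit area, C_ox = eps_ox / t_ox (F/m^2)."""
    return EPS_SIO2 / t_ox_m

thin, thick = 1.5e-9, 3.0e-9   # hypothetical thin vs thick gate oxides

# Doubling t_ox doubles the voltage the gate can tolerate (higher Vmax)...
assert abs(vmax(thick) / vmax(thin) - 2.0) < 1e-12
# ...but halves gate capacitance per area, i.e. weaker channel control,
# which is the switching/leakage penalty described above.
assert abs(c_ox_per_area(thick) / c_ox_per_area(thin) - 0.5) < 1e-12
```

The point of the sketch is just that Vmax and channel control pull in opposite directions through the same knob, t_ox.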


Does anyone know if we will be getting measured metrics for Intel 18A, since it is supposedly going to be used, like TSMC's processes, to manufacture 3rd party designs?

I know in the past, lots of articles just guessed since Intel did not typically disclose this information while TSMC does it like a promotion on a billboard ;).
What are you talking about? TSMC is terrible about disclosing anything interesting beyond the marketing fluff (a vague V-F curve for some ARM core in a like-for-like comparison, a non-specific full chip density uplift, min metal pitch, poly pitch, and a uHD SRAM bitcell or SRAM macro scaling figure depending on which looks better). Intel consistently gives detailed electricals, and always gives the dimensions for basic 4T NAND logic gates, all SRAM bitcell sizes, and metal layer pitches/metallization schemes.
If so, when do you expect we will get our first look at how the process shaped up?
Intel seems to like going into detail about their processes at VLSI during the early summer.
Same thing for N2.
Most likely never. We will get some low-detail paper that will drop the poly pitch and the min metal pitches with other charts that we have more or less seen already. Best case, we also get a couple of papers on some high speed SerDes on N2, but that is as good as we are likely to get from TSMC beyond what they have released to the public already (at least if every major TSMC node since like 10FF is anything to go off of).
Also, since both 18A and N2 will be using low-NA machines (I think the same lithography machines, correct?), why is it that 18A is only achieving SRAM density in line with N3E while N2 achieves higher density? Does it have anything to do with BSPDN?
TSMC has two main advantages for scaling SRAM. One is that their thinner Gox allows TSMC to more easily shrink their poly pitch. Two is that historically, TSMC processes have better leakage and lower Vmin than an equivalent Intel process. This allows smaller SRAM bitcells to maintain their data better. Also, N2 is around a year after 18A. You would sure hope N2 is better, or TSMC would be behind for years even if you want to assume 14A never comes out. TSMC also seems to have better periphery/macros than Intel. Looking at recent history, Intel 7 trailed N7 SRAM density by like 15% despite similar logic density, and Intel 4/3 also had a 15% HD bitcell density disadvantage to N5/N4 despite similar maximum logic densities.

There is also the elephant in the room. The old Intel "7nm" was labeled as an N5 competitor and "5nm" as an N3 competitor. Per Intel, Intel 4/3 are the old "7nm" with various performance enhancements to narrow the gap with N3, as well as some foundry ecosystem enablement. This is reflected in reality, with Intel 3 having N5 HD logic density, being behind on SRAM, far ahead on HP logic density, and superior to N4 in HPC power-performance. Logic would dictate that back during the Bob Swan days, 20A/18A were originally Intel "5nm" and were originally intended to be an N3 competitor rather than an N2 competitor.
Also, it is my understanding that the primary advantage of High NA (which I believe Intel plans to pilot along side of 18A, but not for production), is that it allows fewer passes since it can cut the same channel without multi-patterning. If that is the case, it seems like Intel would have a chance to leap-frog TSMC sometime in the future by getting to High NA first ...... just as TSMC did with EUV over Intel who held onto DUV way past the point where it was clear that new die shrinks would require EUV.
You aren't understanding it properly. 7nm doesn't NEED EUV. TSMC N7 has no EUV and is just fine. EUV is an overly simplistic scapegoat for the i10nm problems. I have written about the topic at length before, but I don't feel like repeating myself, so just dig around if you care. TLDR: the problems were poor process definition (even with the information Intel had at the time) and poor risk management. Per Mark Phillips and prior SPIE papers, 18A doesn't really have very many multipatterned layers. There are some tip-to-tip issues, but nothing resolution related. So even if Intel brought in high-NA for 18A, it wouldn't really do very much. Intel also showed off directional etch results, which were pretty meh. But as the process matures, maybe it makes sense to change the one or two multi-pass layers MP mentioned to single pass with CD elongation.
I know this is what Intel said in Q2 2024, but I am guessing this has already slipped by a year:
What piece of information in the past 6mo makes you think 14A has slipped 12mo?
TSMC on the other hand is saying no High NA until 2030!
1. TSMC has never said that.
2. TSMC got their first high-NA tool earlier this year.
3. When Intel thinks it is ready they will insert it, and when TSMC thinks it is ready they will insert it.
4. The earliest TSMC can reasonably insert high-NA would be at A14 (as N2/A16 are a pair). I doubt it would ever make sense to rip out existing low-NA tools for layers that might be simpler with high-NA. Since high-NA wasn't available for HVM 2-3 years ago, it missed the insertion window for N2/A16.
5. Based on TSMC's statement of development time for new nodes increasing, and still sticking to two process development teams, A14 will be launching products in that 2028/29 timeframe.
6. In this day and age TSMC ramps processes to a higher maximum capacity than Intel at peak ramp. TSMC needs to secure more tools to adopt high-NA than Intel does.

For the above reasons I am not overly concerned for TSMC. Maybe ASML is ready in time for Intel to ramp the version of 14A with high-NA and Intel gets "the win". If that does occur, yes, that would be a nice thing for 14A wafer cost compared to N2/A16, but we aren't talking about some coup that will cause N2 utilization to collapse.
 
This is a common misunderstanding. Putting aside the transistor variation/process maturity part of the equation (limiting frequency, SRAM Vmin, etc.), there are two major issues. One, as you mentioned, is the growing problem of rising interconnect RC slowing down chips. The other is that in a post-Dennard world, all parts of the transistor are not scaled linearly (and even if they were, factors like leakage would prevent you from getting the projected linear power reduction and frequency bump).

In the modern day, much of the scaling is done by shrinking the space between transistors rather than the size of the transistors themselves. Assuming you change nothing other than, say, reducing the space between devices, total performance and power characteristics would degrade. The reason for this is parasitic capacitance. An easy example to visualize is the gates of your NMOS and PMOS. Both are giant pillars of metal. Thus, they act as parallel plates and create parasitic capacitance that slows operation and increases power consumption. Now apply this to your individual fins/nanosheets, contacts, etc., and your various "DTCO" tricks end up degrading how the devices perform (of course assuming all else is equal). Parasitic cap linearly degrades your power and performance (power = (1/2)*f*C*V^2).

Of course, having every new process node be a regression would be unacceptable, so you see tons of innovation in materials and chemistry to not only claw the performance back, but even exceed the performance of the old node. People do things like increase channel strain, SiGe PMOS, shorten contacts, depopulate excess metal, add low-K spacers, increase fin drive current so you can do fin depopulation, etc.
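The parasitic-cap point can be made concrete with a minimal sketch, assuming the adjacent gates really do behave like parallel plates. All dimensions and the dielectric constant below are made-up illustrative values, not real process numbers.

```python
# Minimal parallel-plate sketch of parasitic gate-to-gate capacitance.
# All dimensions are made-up illustrative values, not real process numbers.
EPS0 = 8.854e-12  # F/m, vacuum permittivity

def plate_cap(eps_r, area_m2, gap_m):
    """Parallel-plate capacitance: C = eps0 * eps_r * A / d."""
    return EPS0 * eps_r * area_m2 / gap_m

def dynamic_power(f_hz, c_f, v_volts):
    """Switching power from the text: P = (1/2) * f * C * V^2."""
    return 0.5 * f_hz * c_f * v_volts ** 2

area = 50e-9 * 200e-9                   # facing gate sidewall area (hypothetical)
c_before = plate_cap(3.9, area, 30e-9)  # 30 nm gate-to-gate space
c_after = plate_cap(3.9, area, 15e-9)   # space halved by a "DTCO" shrink

# Halving the spacing doubles the parasitic capacitance...
assert abs(c_after / c_before - 2.0) < 1e-12
# ...so at fixed f and V the parasitic switching power doubles too.
p_ratio = dynamic_power(3e9, c_after, 1.0) / dynamic_power(3e9, c_before, 1.0)
assert abs(p_ratio - 2.0) < 1e-12
```

Which is exactly why shrinking the space between devices, with nothing else changed, is a net regression until the materials/chemistry work claws it back.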

Now with that context: yes, the leakage does indeed get worse as you scale. I don't remember the numbers, but I would not be shocked in the least if N3 has lower leakage than Intel 7, even in spite of the N3 devices being smaller. Leakage reductions are not as large of a benefit to Intel's products as they are to TSMC's most important customers (causing divergent priorities). There is also Intel's Gox, which comes with the penalty of channel control (see below). Finally, it isn't exactly a secret that an N3 transistor isn't really that much smaller than an Intel 7 one. For example, the poly pitch has a 0.89 scale factor (which is worse than your historical full node shrink of 0.7x). When most of your density gain is coming from cell height reductions, the leakage problem slows down.

My understanding is that leakage getting worse on new nodes is more of a problem at lower power. For something like an Intel desktop CPU, the power is scaling quadratically with voltage, linearly with frequency, and additively with leakage. Since that Intel CPU would be using ultra low threshold voltages, the leakage was always going to be abysmal, be it on 2"nm" or 65"nm" (exaggerated for dramatic effect, but you get the idea). RC of interconnects is, as you say, an unavoidable problem, and Intel 7 has some advantages here. Just like with parasitic cap, the interconnect RC will impact all speeds equally (assuming you don't have to worry about dielectric breakdown/arcing of the ILD, which I am unsure how reasonable an assumption that is at ultra-high V). The thermal density thing is, as you say, a big issue. One that would presumably be much more of a limiter at higher V (on account of the power dissipation scaling fastest with V).
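As a sketch of that power argument (quadratic in V, linear in f, leakage additive), here is a toy model. The `c_eff_f` and `i_leak_a` values are entirely made up, and real leakage current varies strongly with voltage, threshold voltage, and temperature, so treat this only as an illustration of why leakage matters more at the low-power operating point.

```python
# Toy CPU power model from the text: dynamic power scales as C*V^2*f,
# while leakage adds on top. All numbers are made up for illustration,
# and real leakage varies strongly with V, Vt, and temperature.
def power_split(c_eff_f, v, f_hz, i_leak_a):
    dynamic = c_eff_f * v ** 2 * f_hz  # quadratic in V, linear in f
    leakage = i_leak_a * v             # additive leakage term
    return dynamic, leakage

# Hypothetical low-power vs desktop operating points, same silicon.
mob_dyn, mob_leak = power_split(20e-9, 0.7, 1.5e9, 10.0)
dt_dyn, dt_leak = power_split(20e-9, 1.3, 5.5e9, 10.0)

mob_frac = mob_leak / (mob_dyn + mob_leak)
dt_frac = dt_leak / (dt_dyn + dt_leak)

# Leakage is a much bigger slice of the budget at the low-power point,
# which is why leakage reductions buy embedded/mobile parts more than
# they buy a high-V, high-f desktop part.
assert mob_frac > dt_frac
```

With these made-up numbers, leakage is roughly a third of the budget at the low-power point but only a few percent at the desktop point, matching the intuition in the GPU/i9 example earlier in the thread.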

That is expected behavior. The returns at high voltages are often not as large, but I can't really think of many (if any) instances where you saw large performance improvements at the low and mid range and a regression at the high range.

Major transistor architecture changes break performance. The performance then needs to be reengineered from scratch back into the process. I don't really have many good examples of things that broke performance during the SiO2 -> HfO2 or planar -> finFET transitions. One I can think of is channel definition on a finFET. A finFET has lower active area in the same footprint as planar (on account of the empty space between the fins). The way you do the S/D and well implants also changed significantly. So if you want your finFET to actually be better than your planar FET, you needed to put in the engineering work to get a tall enough fin, with good enough profiles, properly done implants, and a well shaped S/D.

Pat G once used an analogy I really liked. He said something to the effect of how Intel 3 is the end of the turbocharging era and how 18A is like the first EVs. It really is true though: when you change device architectures, it upends the paradigm. Things that worked no longer work, and you need to figure out how to reimplement them. As an example, placing your gate first was a standard practice, as it allowed you to self-align the S/D to your gate and eliminate some nasty alignment related issues. With the adoption of HKMG this was no longer possible. In theory, you could have done the old gate-last process to form your metal gate. The solution people actually use, though, is the well known replacement gate scheme, to get the best of gate first (self-aligned S/D) and gate last (HfO2 being able to be deposited after all the high-T operations that would have damaged it are complete).

Considering that 2nd gen 22nm parts, and everything since then, have surpassed the efficiency of Sandy Bridge, I would just blame that on process maturity, or on the 22nm performance vintage of Sandy Bridge failing to hit the final performance targets Intel wanted in time for Sandy Bridge's launch.

Awesome post.
 
Definitely, great post.

Intel can do HNA EUV chiplets and claim first to production. TSMC will have to do Apple SoCs which are much more complicated and very high volume. Let's hope the next Intel CEO is more grounded and sets HNA EUV expectations properly, especially when it comes to the foundry business.

According to recent news TSMC N2 is in small scale production with an estimated 5,000 wafers per month. Since the consensus is that Apple will not use TSMC N2 in the iPhone this fall, where are these wafers going? Any guesses?
 
What expectation was incorrectly set? Intel said they want to use it with 14A, with plans to do a proof of concept/derisk on 18A, and that they would only fully commit to it when the maturity is sufficient for Intel to start using it for HVM.
Process development and customer qualification vehicles (or, in the case of early N2 customers, A0 steppings for their lead products). 5K WSPM isn't even particularly large from the perspective of a pilot line, and certainly nowhere near enough for HVM for anything but Apple's lowest volume products.
 
I remember Intel being hailed as an EUV pioneer at the SPIE conferences in the 2010s. Intel received the first EUV system in 2013 and got EUV into production in 2023. Meanwhile, TSMC and Samsung started EUV production in 2019. I guess the term pioneer does not mean one was successful, just that one was the first to touch it?

HNA EUV is Déjà vu all over again for me. I believe it was in an investor call when Intel said they would have production HNA EUV wafers in 2027. The first ASML HNA EUV system arrived at Intel at the beginning of 2024 and production is in 2027? I guess it depends on what you mean by production. Does that include making a profit on HNA EUV wafers?

I'm not saying the EUV delays were all Intel's fault; ASML had a big role in it as well. What I am saying is that an IDM plays by different rules than a foundry. As a result, Intel views the semiconductor manufacturing world differently, especially under Pat Gelsinger. This works well when you are the undisputed technology leader. It does not work so well when you are not.

I think we can all agree that trust is an important part of the semiconductor industry, and both Intel and Samsung have breached our trust. Setting expectations is the cornerstone of trust, and that is something Intel needs to prioritize, in my opinion.

Andy Grove: "Success breeds complacency. Complacency breeds failure. Only the paranoid survive." Paranoid companies know how to set and beat expectations. TSMC is a great example of this. They are not perfect but they are the most trusted semiconductor manufacturing company the world has ever seen.

Bottom line: TSMC has a healthy dose of paranoia embedded in their company culture. Intel and Samsung do not.
 