Intel’s 18A rumors meet a thermal brick wall says SemiWiki

BSPD is an HPC technology for all the reasons Ian pointed out above, and also because it costs more and requires new design techniques. The mobile guys do not want BSPD: they don't want the cost, it doesn't align with their needs, and they don't know how to design for it. TSMC has 2nm without BSPD, then A16 with BSPD (which I hear will be pretty much an NVIDIA node), then A14 without BSPD, and then an A14 follow-on with BSPD. The mobile guys will use 2nm and A14, and the HPC guys will use A16 and the A14 follow-on process. I have heard Intel may offer a 14A version without BSPD; if so, that would make them a mobile option once it comes out.
When you first mentioned this a while ago, it was very eye-opening and explained a lot of what I was hearing on 18A. Thanks for the inputs.
 
Agreed 100%, it's exactly what I've been saying. The problem for Intel is that, quite apart from the fact that they've publicly nailed their colors to the BSPD mast, if they did an FSPD variant of 14A it would be late to market compared to TSMC, have much poorer IP support, and be more expensive due to generally higher wafer costs and lower yield at Intel -- especially since TSMC will be *at least* a year further down the yield curve at any point in time.

Given that TTM, KGD cost, and IP support are probably the three most critical things for mobile, and Intel would be at a disadvantage in all of these, it's difficult to see how they could be successful. Add the existing uncertainty about how successful Intel will be in the foundry market even with their BSPD advantage (and whether they'll carry on investing enough to make this happen big-time), and you'd think it would be *very* difficult for Intel management to make the business case for doing 14A FSPD -- plus having to find the extra resources to develop/qualify a quite different process.

If Intel were going to do FSPD they should have already done the process development/qualification in parallel with BSPD 14A, but they didn't, so they've missed the boat -- at least for this node.
 
Not very scientific, but I found a couple of reviews using a common laptop chassis to try to compare 18A and N3B thermal performance (Panther Lake and Arrow Lake-H, respectively). Both tests used Cinebench in a loop to determine what power level the CPU could stay at for a given fan profile. Note that the Cinebench performance numbers can't be directly compared since they used different versions, but the drop from "best run" to "10 minutes of heat soak" may be useful.

Power at Temp (measured by the CPU software) with fans set to performance:
- Arrow Lake-H can sustain about 35W ("high 80s C")
- Panther Lake can sustain 30W (77C) or 44W (92C) with the keyboard attached

Power/Temp in silent/whisper mode:
- Arrow Lake-H "down to about 20W at low 70s C"
- Panther Lake 20W at 67C

Performance - best run vs "10 minutes heat soaked" - Cinebench multithreaded, fans in performance mode.
- Arrow Lake-H went from 17348 to 15627 for CB 2023, a drop of about 10%
- Panther Lake went from 1142 to 1103 for CB 2024, a drop of about 3.4% (both drops are re-derived in the quick check below).
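
A minimal sketch that only re-derives those two percentages from the scores quoted above (no other data assumed):

```python
# Heat-soak performance drop, computed from the Cinebench scores quoted above.
runs = {
    "Arrow Lake-H (CB 2023)": (17348, 15627),  # best run, 10-minute heat-soaked run
    "Panther Lake (CB 2024)": (1142, 1103),
}
for chip, (best, soaked) in runs.items():
    drop_pct = (best - soaked) / best * 100
    print(f"{chip}: {best} -> {soaked}, drop {drop_pct:.1f}%")
# Arrow Lake-H (CB 2023): 17348 -> 15627, drop 9.9%
# Panther Lake (CB 2024): 1142 -> 1103, drop 3.4%
```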

Source 1: Panther Lake Asus Zenbook Duo: https://www.techpowerup.com/review/asus-zenbook-duo-ux8407/10.html
Source 2: Arrow Lake-H Asus Zenbook Duo: https://www.ultrabookreview.com/70717-asus-zenbook-duo-review-2025/

Take-aways: Using a similar (same?) laptop chassis, Panther Lake loses less performance when going from 'first run' to 'running for 10 minutes' in a heat-soak benchmark. Power at temperature also appears to be about equal for both chips, indicating the thermal resistance of an 18A chip might not be significantly worse than that of an N3B variant.

Full caveats that these are different architectures, and Intel's Panther Lake is a HUGE improvement in efficiency vs. prior Intel and current AMD offerings.

P.S. I think Arrow Lake-H is a bit of a better foil, as Lunar Lake just has too few cores for an 'even' comparison in thermals and performance. Also, without seeing teardowns, ASUS could have changed its cooling solution between the Arrow Lake-H and Panther Lake laptops, so this is definitely not a 'great' comparison.
 
The problem here is that there are too many variables (different chips from different generations) -- and this includes the hidden one of where the on-chip temperature sensor is placed. These measurements come from a sensor which is invariably embedded in the substrate, not from the hottest bits of the circuit, which are the high-power-density transistors and the metal attached to them. I know because we use sensors like this and have compared what they read against the critical temperatures (using thermal simulations), and they don't track well -- the difference is bigger in N2 (nanosheet) than in N3 (FinFET), and a lot bigger still with BSPD.

The overall thermal resistance from chip to heatsink/outside world is not the biggest problem with BSPD; temperature differences across the die are.

Here are some example hotspot calculations for different processes at a power density of 100W/mm2 -- which sounds ludicrously high if you read it as a whole-die figure, but it's equivalent to 10mW dissipated in a 10um x 10um circuit (or 0.1mW in 1um x 1um), which is not unusual for small high-speed circuits like clock drivers (actually we've seen even higher numbers inside very high-speed circuits like SERDES). This particular case shows that for "typical" circuit sizes at this power density, BSPD runs about 20C hotter than FSPD.
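
For anyone who wants to check that arithmetic, here's a minimal sketch; it uses only the 100W/mm2 figure and the two circuit sizes quoted above:

```python
# What 100 W/mm^2 means for small circuit areas.
# 1 mm^2 = 1e6 um^2, so 100 W/mm^2 = 1e-4 W/um^2 (i.e. 100 uW/um^2).
power_density_w_per_mm2 = 100.0
power_density_w_per_um2 = power_density_w_per_mm2 / 1e6

for side_um in (10.0, 1.0):            # 10um x 10um and 1um x 1um circuits
    area_um2 = side_um * side_um
    power_mw = power_density_w_per_um2 * area_um2 * 1e3
    print(f"{side_um:g} um x {side_um:g} um -> {power_mw:g} mW")
# 10 um x 10 um -> 10 mW
# 1 um x 1 um -> 0.1 mW
```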

Note that this is based on a particular set of assumptions and is *not* a general case which applies to all chips, but it's also not an uncommon one in high-performance devices. Also note that this does not include the very local self-heating down at the gate-stripe level (<0.1um) and in elevated fin/nanosheet devices (which can also be up to 20C or so), which comes on top of this...

[Attached image: hotspots.png -- hotspot temperature calculations for different processes at 100W/mm2]
 
A few questions for my education -

How accurate are typical on-die temperature sensors? (just curious how much this affects variability, too)

Do modern CPUs / SoCs typically throttle portions of the chip (i.e. temperature sensors all over driving thermal decisions), or do they typically still lean towards looking at the hottest spot(s) and slowing down the entire chip accordingly?

..

The piece I found most interesting in the comparison was that the CPUs still appeared to be capable of sustaining roughly the same overall wattage in roughly the same form factor (i.e. the laptops should be very similar in cooling and chassis). I definitely appreciate that local hotspots can differ significantly, and I also did not look up or include die sizes for reference, which is yet another variable that makes the comparison less accurate.

The power density of the local hotspots in your writeup is very interesting (denser in places than I would have expected) - thanks for that! I'm curious whether GAAFET transmits heat between transistors/areas better than FinFET, and how that compares to planar. Is that sort of implied in the data there, or are there too many variables because of top/bottom cooling and insulation?
 
On-die sensors can be pretty accurate (a degree or two), especially if calibrated at production (some are) -- but they only measure the temperature where they are... :-(

Modern CPUs have lots of sensors all over the place, and in many cases can do individual throttling/dynamic supply voltage control for different blocks, but this completely depends on the design -- in a lot of cases two blocks which communicate have to run at the same clock rate. ASICs which just continuously stream data at a fixed rate (like comms devices) can't use this trick, they have to run at a fixed rate/voltage.
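
Purely as an illustration of that per-block approach (a toy sketch -- the block names, temperature limit, and frequency-step policy are all invented, and no real CPU's firmware works exactly like this):

```python
# Toy per-block thermal throttling: each block is slowed based on its own hottest
# sensor, rather than throttling the whole chip on the single global maximum.
# All names and numbers below are invented for illustration.

BLOCK_SENSORS = {
    "p_cores":  [88.0, 93.0, 90.5],   # hypothetical sensor readings, degrees C
    "e_cores":  [71.0, 69.5],
    "gpu_tile": [82.0, 84.5, 83.0],
}
TEMP_LIMIT_C = 90.0   # assumed throttle threshold
STEP_MHZ = 100        # assumed clock reduction per degree of overshoot

def throttle_decisions(block_sensors, limit_c, step_mhz):
    """Return a per-block clock reduction driven by that block's hottest sensor."""
    decisions = {}
    for block, temps in block_sensors.items():
        overshoot = max(0.0, max(temps) - limit_c)
        decisions[block] = int(overshoot * step_mhz)   # MHz to shave off this block
    return decisions

print(throttle_decisions(BLOCK_SENSORS, TEMP_LIMIT_C, STEP_MHZ))
# {'p_cores': 300, 'e_cores': 0, 'gpu_tile': 0} -- only the hot block slows down
```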

The plots I showed are for circuit/block-level self-heating (SHE), which is similar for all FSPD processes (but BSPD is a lot worse). On top of this you have device-level SHE within individual transistors, especially if they're made out of multiple stripes as many are -- for example, in this case the middle gates tend to run hotter than the end ones. You can reduce this effect by splitting the transistors up into paired gates with dummies in between, or even individual gates, but this increases area and parasitic capacitance (meaning power consumption), so it's not a free lunch. BSPD has the additional problem that lateral heat-spreading away from hot transistors is also not as good as with FSPD, so these gate-level temperature differences are also bigger.

GAAFET has worse local hotspots (at the transistor level) than FinFET because the thermal connection to the substrate is poorer. FinFETs are a bit better, but still considerably worse than planar devices due to the tall thin fins, especially if the PMOS uses SiGe fins, which have lower thermal conductivity than silicon -- a nasty surprise if you're not expecting it; we saw PMOS SHE that was about twice as big as NMOS. GAAFETs are worse still, especially NMOS (the gap to PMOS is smaller since there are no SiGe fins).

The power densities I quoted are not unusual for high-speed circuits, or heavily-loaded ones like clock drivers -- in the worst cases we've seen even higher numbers, to the point where there was no choice other than to divide the circuit up into multiple parallel copies and spread these out, even though this was undesirable (high-speed VCO)... :-(

Of course this doesn't apply to *all* chips, but the TSMC recommendation for A16 is telling: "Suitable for HPC devices with dense power grids and active cooling"... ;-)
 
Thanks @IanD, this also explains a lot about the hurdles Intel faces in Foundry vs. also serving its own needs.

It seems a little bleak that many of the new technologies (new transistor types, BSPD), combined with the lack of SRAM scaling lately, seem to be decreasing the reusability of newer nodes across multiple product segments -- and also reducing the cost benefit of (any remaining) scaling.

..

FWIW, the 100W/mm2 you're working with is pretty far into "insane engineering territory". The chart below is in W/cm2 -- the very top of it is what you described :).

[Attached image: chart of power density in W/cm2]
 
The converse should apply to TSMC BSPD. They are a year behind Intel at 18A and will be a year behind at 14A, unless I misread the timeline. That would imply that once Intel gets the whole how-to-do-foundry thing sorted out, they should have the inside track on BSPD.

I also think you are too focused on early adopters. Foundry processes run for 10+ years. Who is to say that 6-7 years from now Intel FSPD won't be a viable alternative for someone looking to upgrade from an older node? Not every device out there needs to run on the latest and greatest silicon. In fact, a fairly large percentage of TSMC revenue comes from their legacy foundry processes.
 
I'm not convinced that TSMC A16 will be any later than Intel 14A in reality, as opposed to on PowerPoint presentations, especially given recent timescale announcements by Intel. And the difference is that TSMC have huge customers (including Nvidia) locked into and committed to A16 -- indeed it's been developed to match their requirements, because that's how TSMC works.

Being a follower node means less revenue and lower yield for the foundry, and the simple problem for Intel is -- why should anybody choose to use their proposed (not committed, not "almost-ready") FSPD process? I think there's zero doubt that TSMC FSPD will be cheaper and better yielding (partly because it'll be in *way* higher volume, and pushing wafers through is what gets D0 down), and being available earlier means it'll always be further down the D0 curve, and the huge TSMC ecosystem means that much more IP is and will be available for it.
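
To put a rough number on the "further down the D0 curve" point, here's a minimal sketch using the standard Poisson yield model Y = exp(-A*D0); the die area and defect densities are round illustrative assumptions, not figures for any real process:

```python
import math

# Poisson yield model: yield = exp(-die_area * defect_density).
# Die area and D0 values are illustrative assumptions only.
DIE_AREA_CM2 = 1.0   # a ~100 mm^2 die

def die_yield(d0_per_cm2, die_area_cm2=DIE_AREA_CM2):
    return math.exp(-die_area_cm2 * d0_per_cm2)

# Being a year further along the learning curve usually means a lower D0,
# so the same wafer yields more known-good die at lower effective cost.
for label, d0 in [("early ramp, assumed D0 = 0.30/cm^2", 0.30),
                  ("a year later, assumed D0 = 0.10/cm^2", 0.10)]:
    y = die_yield(d0)
    print(f"{label}: yield ~{y:.0%}, relative KGD cost ~{1 / y:.2f}x")
# early ramp, assumed D0 = 0.30/cm^2: yield ~74%, relative KGD cost ~1.35x
# a year later, assumed D0 = 0.10/cm^2: yield ~90%, relative KGD cost ~1.11x
```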

The only reason I could see for customers going with Intel FSPD is either they can't get a slot with TSMC (unlikely) or that they really *really* want US onshore manufacturing for strategic/security of supply reasons, and are willing to compromise on everything else for this. I can't see there being enough customers like this to justify the process, especially when the same fabs can make the more lucrative and more in demand (because it's pretty good!) BSPD process -- a few DoD/other strategic customers simply won't provide enough volume, no matter how much Trump demands that Intel support such customers (and also they'll probably prefer BSPD...).

A large (but actually falling...) percentage of TSMC revenue *does* come from their legacy processes -- but those processes only got that way by being successful bleeding-edge processes when they were introduced, pulling in lots of customers then and ever since, which is not going to work for Intel with 14A FSPD... :-(
 
I know; as I said, this is *not* the average power across the chip, which is much lower, but the maximum power density in the hottest bits, which are a small fraction of the total area -- and you have to zoom right in to the small scale to see them; we're not even talking about large circuit blocks visible on a floorplan.

IIRC large high-power chips like leading-edge GPU/NPU are currently running around 2W/mm2 *average* power (~1400W for a ~700mm2 reticle-sized chip), because most gates on a chip do nothing most of the time, even on devices designed to minimise idle time -- that's the nature of chips. But on these chips there will be some small blocks with higher activity/power density, and inside these blocks there are smaller circuit areas with higher power density still -- especially in essential functions like high-speed SERDES (vital for GPUs) or memory interfaces. And inside these hot circuit blocks are even hotter transistors doing things like driving high-speed clocks -- and then inside those transistors some individual gates are hotter still.

100W/mm2 does sound insane, but if you rewrite it as 100uW/um2 it isn't -- in modern processes it's easy to dissipate this much power in a loaded CMOS gate (or a high-speed analog circuit) if it's clocked really fast and running at a highish voltage (to get the speed). So yes, this is circuit-dependent, but more and more devices nowadays include such circuits.
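
A minimal sketch of the numbers in the two paragraphs above -- only the ~2W/mm2 average, ~700mm2 die size, and 100W/mm2 peak quoted there are used; the two zoom-in multipliers are invented purely to show how the hierarchy compounds:

```python
# Whole-die power from the quoted average density and die size.
avg_w_per_mm2 = 2.0       # quoted average for a leading-edge GPU/NPU
die_area_mm2 = 700.0      # quoted reticle-sized die
print(f"Whole-die power: ~{avg_w_per_mm2 * die_area_mm2:.0f} W")   # ~1400 W

# 100 W/mm^2 rewritten per square micron: W -> uW multiplies by 1e6 and
# mm^2 -> um^2 divides by 1e6, so the number itself is unchanged.
peak_w_per_mm2 = 100.0
print(f"Peak density: {peak_w_per_mm2 * 1e6 / 1e6:.0f} uW/um^2")    # 100 uW/um^2

# Hierarchical compounding with invented multipliers (assumptions, not data):
block_over_avg = 5.0       # a busy block vs. the chip-wide average
circuit_over_block = 10.0  # a hot circuit (e.g. clock driver) vs. its block
print(f"Compounded peak: {avg_w_per_mm2 * block_over_avg * circuit_over_block:.0f} W/mm^2")
# Compounded peak: 100 W/mm^2
```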

This is all hierarchical: the closer you zoom in, the worse the self-heating you find, because all these factors multiply up -- and that's speaking as someone intimately involved with designs like this, right down to the gate-stripe level. Even at average chip power levels rather lower than the 2W/mm2 noted above -- because efficiency in pJ/b matters more, and you don't get that by running digital logic at high voltage and high clock rates -- we regularly see peak power density and SHE at the levels I mentioned; in fact we've even seen rather higher ones, which meant the design/layout (full-custom IP) had to be changed specifically to reduce SHE... :-(

And it's all getting gradually worse as processes evolve, because power/current density per mm2 keeps increasing as more functionality clocked faster is squeezed into a smaller and smaller area. FinFET had worse SHE than planar, GAA is worse again -- and BSPD throws more fuel onto the fire, since it increases both density and clock speeds, as well as adding extra thermal resistance to the heatsink and poorer heat-spreading, which makes hotspots worse.

It's one of the dirty little secrets that gets swept under the rug, and it's increasingly one of the big design problems -- especially for chips which run flat-out 24/7 for years, which of course may exclude a lot of designs, including many CPUs...
 