Updating our current logic density benchmarking methodologies

How do we measure the maximum library density for nodes that support mixed rows?

  • Based on the density of the highest-density library that is usable in a standalone fashion
    Votes: 3 (75.0%)
  • The geo mean of the libraries that make up the highest-density configuration
    Votes: 1 (25.0%)

  Total voters: 4

nghanayem

Well-known member
As many on this forum are aware, maximum theoretical logic density is often calculated by taking (M2 pitch) × (M2 tracks for a four-transistor NAND gate) × (CPP). From there we try to use correction factors to account for any boundary scaling (for example, Scotten using a 10% area reduction for going from DDB to SDB, since xDB doesn't show up at the level of individual cells) and for uHD SRAM bit density.
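
For anyone who wants the arithmetic spelled out, here is a minimal sketch of that calculation in Python. Every number is a placeholder I picked for illustration rather than any real node's design rules, and the 3-CPP NAND2 footprint is my own assumption:

```python
# Rough sketch of the max-theoretical-density arithmetic described above.
# Every number below is an illustrative placeholder, not any real node's rules.

M2_PITCH_NM = 40.0   # assumed M2 routing pitch (nm)
TRACKS      = 6.0    # assumed cell height in M2 tracks
CPP_NM      = 50.0   # assumed contacted poly pitch (nm)
NAND2_CPPS  = 3.0    # assumed NAND2 width in CPPs (incl. a shared diffusion break)
NAND2_XTORS = 4      # a 2-input NAND is four transistors

cell_area_mm2 = (TRACKS * M2_PITCH_NM * 1e-6) * (NAND2_CPPS * CPP_NM * 1e-6)
density_mtr   = NAND2_XTORS / cell_area_mm2 / 1e6   # million transistors per mm^2
print(f"Theoretical NAND2 density ~ {density_mtr:.0f} MTr/mm^2")

# A boundary-scaling correction of the kind mentioned above (e.g. SDB vs DDB)
# would then just be an area multiplier; the SRAM term is left out here.
sdb_area_factor = 0.90   # assumed 10% cell-area reduction, purely illustrative
print(f"With boundary correction ~ {density_mtr / sdb_area_factor:.0f} MTr/mm^2")
```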

With the advent of mixed-row logic libraries, I think it is best that we decide what methodology we want to use for measuring the cell height of mixed-row libraries.

Using N3B as our example case (since, to my knowledge, that is the first node to ever try mixed rows), we could either take a geo mean of the cell heights (CHs) for the 2-1 library or use the CH of the 2-fin library. I do not think using the CH of the 1-fin device makes any sense, since it only has about 3 M0 tracks and is incapable of making logic circuits without siphoning off wiring resources from nearby 2-fin devices. Conversely, we could use only the 2-fin device, with the caveat that you can go denser if you use "FinFlex" (in a similar manner to the correction factors applied for SDB). However, I feel it would be easier to go with the geo-mean method, as we won't need to muck around trying to compute what a good correction factor for different nodes and/or products would be.
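
To make the two options concrete, here is a toy comparison; the cell heights are invented round numbers, not actual N3B values:

```python
from math import sqrt

# Toy illustration of the two proposed methodologies for a mixed-row library.
# Cell heights are made-up round numbers, NOT real N3B values.
ch_2fin_nm = 200.0   # assumed cell height of the standalone 2-fin row
ch_1fin_nm = 130.0   # assumed cell height of the 1-fin "filler" row

# Option A: quote the standalone-usable library only.
ch_option_a = ch_2fin_nm

# Option B: geometric mean of the rows in the densest mixed configuration.
ch_option_b = sqrt(ch_2fin_nm * ch_1fin_nm)

print(f"Option A (2-fin only)  : {ch_option_a:.0f} nm cell height")
print(f"Option B (geo mean 2-1): {ch_option_b:.0f} nm cell height")
print(f"Implied density gain of B over A: {ch_option_a / ch_option_b - 1:.1%}")
```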

Any and all additional thoughts and discussions are welcome!
 
Noble effort, but I would question the value of trying to refine the theoretical max when the real density can vary so widely based on the entire process/design rule/IP/EDA tool/methodology/design style stack. All of those are heavily interrelated, and unless they are all dialed in (optimized) together, a user is going to end up very far from their theoretical entitlement. Kind of like max MIPS or FLOPS ratings for a processor - not particularly useful except for specsmanship. Most people use real code benchmarks today for everything from video encoding to machine learning (MLPerf). One of the things that got Intel into trouble is that their leading edge processes had good theoretical density, but performed poorly on anything other than their entire stack. Guess why they go outside today for GPU-oriented design? I'm guessing that's going to change as part of Pat's new strategy.
 
Noble effort, but I would question the value of trying to refine the theoretical max when the real density can vary so widely based on the entire process/design rule/IP/EDA tool/methodology/design style stack. All of those are heavily interrelated, and unless they are all dialed in (optimized) together, a user is going to end up very far from their theoretical entitlement.
A fair question. As you point out, chip density often trails theoretical density, but as you also point out, this is a design decision. There is nothing physically stopping an N7 chip from hitting the 9x MTr/mm^2 mark. Of course an array of HD cells or SRAM is not useful, and two blocks might not be able to sit next to each other because their metal routing would conflict. While this is relevant for chip designers, it isn't as relevant when comparing the processes themselves.

As professionals on a public forum, this is as close as we can get to an apples-to-apples comparison of chip density without something like a die shrink or a dual-sourcing situation. N7 and 7LPP have extremely similar theoretical densities. The slight pitch and/or layer-count differences across the metal stack might slightly impact final chip die size, but this shouldn't be substantial. Conversely, say you wanted to make a chip on N7 with only the HD cells, or on N5 with only the HP cells. Theoretically they should match up pretty similarly, but the larger wires on N7 might lead to die sizes being a good bit bigger (even if we ignore the denser SRAM/analog on N5).
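
To put the "design decision" point in rough numbers, here is a deliberately crude sketch; every factor in it is an assumption picked for illustration, not measured data:

```python
# Deliberately crude illustration of why a real chip sits well below the
# theoretical peak even with nothing wrong with the process.
# Every factor below is an assumption picked for illustration only.

theoretical_mtr = 90.0   # assumed HD-library theoretical peak (MTr/mm^2)
utilization     = 0.70   # assumed placement utilization of a routable block
hd_area_frac    = 0.60   # assumed fraction of placed cell area using the HD library
hp_rel_density  = 0.65   # assumed HP-library density relative to HD

avg_rel_density = hd_area_frac * 1.0 + (1 - hd_area_frac) * hp_rel_density
effective_mtr   = theoretical_mtr * utilization * avg_rel_density
print(f"Effective logic density ~ {effective_mtr:.0f} MTr/mm^2")   # ~54 here
```
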
One of the things that got Intel into trouble is that their leading edge processes had good theoretical density, but performed poorly on anything other than their entire stack.
This was always something that baffled me to see. The following isn't something I saw you say, but it is an extreme example of things I saw while scraping old process tech articles: Apple's Samsung 28LP SoCs had a higher total Xtors/die size than Intel's 22nm chips, supposedly because "Intel's yields were terrible and they had to relax the density". As a side note, I have seen similar things said about 10nm SF and TSMC N7, and I remember SA writing something of an N5 hit piece about how "TSMC lied about their density and had to relax it". Back to 28LP: any fab-side person could and would tell you Samsung 28LP wasn't in the same zip code as Intel 22nm when it came to density, so what gives? Intel was doing a high-frequency design is what happened. I can't speak to back then, but I know that, at least with their N7 and N5 SoCs, Apple never used anything besides TSMC's shortest library. Meanwhile, the Intel short libraries were mostly (if not entirely?) relegated to the iGPU. That isn't some "fatal flaw of 22nm"; it is because Apple and Intel had different design points.

Another couple of sources I can think of: in a higher-frequency design, blocks are probably placed to minimize distance to other parts of the circuit rather than putting extra emphasis on minimizing die size. Finally, more widely spaced transistors lower heat density and should allow for a greater Fmax. Great examples of what I am talking about can be found in many of the big cores in mobile SoCs. While Apple has two separate core types, my understanding is that QCOM uses the same core but with a frequency-tuned RTL, rather than trying to minimize Cdyn and/or maximize density like their little cores. AMD does the opposite, with an RTL tweaked to lower die size to its minimum.
Guess why they go outside today for GPU-oriented design? I'm guessing that's going to change as part of Pat's new strategy.
If memory serves from the old announcements, graphics was originally supposed to be the lead product for Intel 7"nm" before the 2020 announcement of a one-year delay and the subsequent announcement that Intel would start making products externally. Graphics going external looks like the understandable product of a complete lack of confidence in Intel process technology during the BS era.
 
The hybrid libraries in N3 make things really interesting: from the benchmarks we've seen, they decrease power and area but don't increase speed, for obvious reasons (H169 is 2-fin, M143 is mixed 1-fin/2-fin).

[Attached image: N3_power_speed.png]
 
"As many on this forum are aware, maximum theoretical logic density is often calculated by taking the (M2 pitch) X (M2 tracks for a four transistor NAND gate) X (CPP). From there we try to use correction factors to account any boundary scaling (for example Scotten using 10% area reduction from going to SDB from DDB, as xDB doesn't show up on the level of individual cells) and uHD SRAM bit density."

I don't use a 10% area reduction for SDB vs DDB, and the whole description you have here isn't how I calculate density. I think I published my method a few years ago; I will post the link if I can find it.
 
Graphics going external looks like the understandable product of a complete lack of confidence in Intel process technology during the BS era.
My take is that people focus way too closely on process technology alone. From what I saw, the big issue was that the Intel-evolved process/design rule/IP/EDA tool/methodology/design style stack entirely was unsuitable for dense GPUs which really needed a very different "everything" vs. the x86 performance CPUs Intel was cranking out (as you also allude to). But the stack is very difficult to change because the entire stack requires design technology co-optimization for the design points and design styles targeted by that process.

Your methodology only gives a very rough view of what might be achievable plus relative comparisons between processes. But those comparisons can be far off from post layout PPA comparisons done with the whole stack.

 
"As many on this forum are aware, maximum theoretical logic density is often calculated by taking the (M2 pitch) X (M2 tracks for a four transistor NAND gate) X (CPP). From there we try to use correction factors to account any boundary scaling (for example Scotten using 10% area reduction from going to SDB from DDB, as xDB doesn't show up on the level of individual cells) and uHD SRAM bit density."

I don't use a 10% area reduction for SDB vs DDB, and the whole description you have here isn't how I calculate density. I think I published my method a few years ago; I will post the link if I can find it.
Sorry for butchering the memory of your methodologies
The hybrid libraries in N3 make things really interesting: from the benchmarks we've seen, they decrease power and area but don't increase speed, for obvious reasons (H169 is 2-fin, M143 is mixed 1-fin/2-fin).

[Attached image: N3_power_speed.png]
While the freq bump looks about right, I must say I am surprised how little the dynamic and leakage power improved for the 2-2 config over its N4P competition, doubly so since this is N3E, not N3B. The 2-1 power results at least look pretty good, while also providing a small freq bump over N4P 2-fin.
My take is that people focus way too closely on process technology alone.
I do work on the process side after all, so I think you can guess where my biases lie ;). Besides that, it also does matter, because process is a tide that raises all boats, whereas NVIDIA having a good GPU design only raises the boats of the products that use that die.
From what I saw, the big issue was that the Intel-evolved process/design rule/IP/EDA tool/methodology/design style stack entirely was unsuitable for dense GPUs which really needed a very different "everything" vs. the x86 performance CPUs Intel was cranking out (as you also allude to).
I can't really speak for the design side, but my point is that there is nothing physically stopping you from making a good GPU on, say, 14nm. TSMC nodes aren't magically denser, and using tall cells and lower std cell utilization is a choice, not a limitation. Intel has made GPUs on their process for something approaching two decades. They've made cost-optimized parts, chipsets, and handheld CPUs. So if Intel technology was chronically bad for HD designs, wouldn't Intel have taken these parts external from the beginning, rather than only switching once they gave up hope of ever having a process lead? I guess it isn't something I can give any proof for, but if NVIDIA ported Pascal to Intel 14nm, I have no reason to doubt that it would be smaller, or that they could achieve a higher chip density than what is found within a Skylake core or by taking total Xtor count/die size.
But the stack is very difficult to change because the entire stack requires design technology co-optimization for the design points and design styles targeted by that process.

Your methodology only gives a very rough view of what might be achievable plus relative comparisons between processes. But those comparisons can be far off from post layout PPA comparisons done with the whole stack.
I don't disagree, and I don't claim it as an end-all, be-all. A phrase I have heard and love is that "silicon talks and silicon never lies". But analyzing a process from a finished chip requires a lot of testing, specialized tools, and expertise. I mostly view looking at the theoreticals as an imperfect shorthand. I suppose it is also nice for designers to know what is theoretically possible, so they can better understand the tradeoffs they are making and what a perfectly routed design could look like.

And Kevin, I can't shake the feeling that I am not totally understanding your point. So if this response doesn't really get to the crux of your issue or is kind of circular, let me know, because I do want to understand.
 
My take is that people focus way too closely on process technology alone. From what I saw, the big issue was that the Intel-evolved process/design rule/IP/EDA tool/methodology/design style stack entirely was unsuitable for dense GPUs which really needed a very different "everything" vs. the x86 performance CPUs Intel was cranking out (as you also allude to). But the stack is very difficult to change because the entire stack requires design technology co-optimization for the design points and design styles targeted by that process.

Your methodology only gives a very rough view of what might be achievable plus relative comparisons between processes. But those comparisons can be far off from post layout PPA comparisons done with the whole stack.

The power comparisons I gave were for a complete CPU core, not low-level gates. Here's the area benchmark for the same case:
[Attached image: N3_area.png]
 
While the freq bump looks about right, I must say I am surprised how little the dynamic and leakage power improved for the 2-2 config over its N4P competition, doubly so since this is N3E, not N3B. The 2-1 power results at least look pretty good, while also providing a small freq bump over N4P 2-fin.

I think this is showing how little the improvement from the raw process is nowadays, which is not that surprising when you look at how small the layout/pitch differences are between N5/N4 and N3. There's quite a big density (area) improvement (not so much from just pitches, also from DTCO) and the hybrid library only gives a little extra, but power is the other way round: most of the improvement comes from the hybrid library and other DTCO improvements.

Which means if you compare "N5/N4" with "N3", power and area are both considerably improved, but the power saving mostly comes from DTCO, i.e. the hybrid library. This is only available in N3 and there's no reason not to use it, but you could say this is not being entirely transparent about where the PPA improvement is coming from -- though the other point of view (TSMC's) is presumably that "process technology" nowadays means [raw process + DTCO libraries] and N3 is delivering as promised... ;-)
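
One simple way to picture that split is to treat the node-to-node gain as the product of a raw-pitch term and a DTCO/library term; the factors below are placeholders I made up, not TSMC's figures:

```python
# Toy multiplicative decomposition of a node-to-node gain into a raw-pitch
# term and a DTCO/library term. All factors are invented placeholders.

total_area  = 0.70   # assumed overall block area vs previous node
raw_area    = 0.78   # assumed area factor from pitch scaling alone
dtco_area   = total_area / raw_area          # ~0.90: the "little extra"

total_power = 0.75   # assumed overall power vs previous node at iso-speed
raw_power   = 0.95   # assumed power factor from the raw process alone
dtco_power  = total_power / raw_power        # ~0.79: most of the power saving

print(f"Area : raw x{raw_area:.2f} * DTCO x{dtco_area:.2f} = x{total_area:.2f}")
print(f"Power: raw x{raw_power:.2f} * DTCO x{dtco_power:.2f} = x{total_power:.2f}")
```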
 
And Kevin, I can't shake the feeling that I am not totally understanding your point. So if this response doesn't really get to the crux of your issue or is kind of circular, let me know, because I do want to understand.
I think Ian's PPA graphs highlight my point for leading edge logic processes. Raw dense silicon and metal geometries, without rigorous PPA analysis and DTCO across the entire stack, can't deliver the same capabilities vs a foundry that does the work. I've seen a 2x difference in density between HD processes with the same rough geometries, due to lack of DTCO. Serious foundries will have these graphs ready against standard RTL designs for prospects, to supplement and often supplant the raw process geometry comparisons.
 
I agree with you. I guess my point was, as you say, that DTCO occupies a zone that is not quite process and not quite design. Bad DTCO doesn't mean a bad process, but bad collaboration between the fab team and the design team. Now, whether Intel is bad at DTCO, I don't know. If they were bad at DTCO, I assume that would mean all designs are, I don't know, 20% larger than with good DTCO at iso frequency/power targets, rather than this just impacting GPUs? Or maybe you mean that Intel was having a hard time getting the area reductions you would expect when setting the design point to a lower frequency, where you would normally expect to see bigger area reductions?

Either way, back to theoretical density. To steal a metaphor from the CPU side of things: the value of a CPU is the speed at which it runs programs. Nobody cares which CPU has the higher SPEC-INT score. However, I assume SPEC-INT is a common thing CPU designers think about when they are making a new u-arch? I view these on-paper process details in a similar manner. Like SPEC-INT, nobody will buy a wafer for the on-paper PPA characteristics (they buy for the PPAC of their design on real test chips optimized for the node). But this data does serve as one data point for process people to use for comparing nodes in something of a sterile environment. I won't dispute that it is a worse method than taking one ARM core IP and making plots similar to Ian's. But getting that data across nodes is kind of hard, and when manufacturers do release that data, it is almost always data-light.
 
The hybrid libraries in N3 make things really interesting: from the benchmarks we've seen, they decrease power and area but don't increase speed, for obvious reasons (H169 is 2-fin, M143 is mixed 1-fin/2-fin).

[Attached image: N3_power_speed.png]
Are the 'C' metrics referring to degrees Celsius? Curious why it would mix 125C Y axis with -40C X axis?
 
There is a lot written about the transistors, but the wiring is at least equally important. As the EDA tools have difficulty wiring, it becomes harder to keep gates close together, the drive has to go through more vias to reach higher layers, more buffers get added, etc. I do wonder how much performance is left on the table by difficulties in layout algorithms. And how much do we need density vs. low power; are they necessarily the same thing? For example, if there are 2 fins in the same track height as a 3-fin cell, so that wiring density is flexible but gate capacitance (and drive) is 2/3rds, would the layout be better than if the track height is squashed and we lose wiring tracks as well as fins? If wiring is harder then those 2 fins don't fan out as well; maybe the bigger cell actually would have fewer joules per unit of work and better battery life, while running just as fast for anything the user needs.

It is interesting that transistor scaling has serious limits in physics due to leakage, channel length minimum, and doping, while wiring is nowhere near its limits except for lithography. Sure, wiring has issues of increasing resistance but it does not simply fail to work if it gets, say, 2x as small whereas even with 2D materials we are not seeing channel length cut in half. And even when we do advance transistors some things do not improve - last I saw, ribbons do not have lower capacitance than fins. Yet we do not see a lot written about advanced processes which highlight how much the wiring might improve independently of the transistors. Seems like there is potential for advances there.

I think Intel's backside power is an example of advances coming from the wiring. Very promising, not just reducing crowding and stacking on the signal wiring but also likely to reduce droop and support lower voltage operation. It will be interesting to watch this come into full production and see if it really works as well as it pencils out. Aspects like thermal management, stress on the super-thin Si remnant layer, and long term reliability or tolerance of wide temperature ranges will be things to look at.
 
I do think the coming era is one of searching for lower energy per function, not simply cramming elements into smaller size. An H100 GPU rarely runs at full clock because it generates too much heat. The AI revolution seems constrained by heat, not size. It is an echo of the heat death of Dennard scaling.

There are other practical limits around memory, too. There simply is no memory solution for moving LLMs to mobile at reasonable performance and energy per operation. SRAM is two orders of magnitude too low in capacity per dollar to hold the models, and DRAM is two orders of magnitude too high in energy per bit to transfer parameters into a GPU for efficient single-query, batch-1 operation, which is what you really want for mobile devices. So you hit those limits before you even worry about the GPU power. Something like 70% of semiconductor production, by wafer area, is memory, and it needs a revolution much bigger than what is needed for logic.
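
To give a feel for the orders of magnitude being claimed, here is a back-of-the-envelope sketch; the model size and the per-bit DRAM energy are rough assumptions on my part, not measured figures:

```python
# Back-of-the-envelope for the DRAM energy-per-bit argument above.
# The model size and the per-bit DRAM energy are rough assumptions, not data.

params          = 7e9    # assumed 7B-parameter model
bytes_per_param = 1      # assumed 8-bit quantized weights
dram_pj_per_bit = 5.0    # assumed LPDDR access+transfer energy, order of magnitude only

bits_per_token   = params * bytes_per_param * 8   # batch-1 decode streams ~all weights/token
joules_per_token = bits_per_token * dram_pj_per_bit * 1e-12
tokens_per_sec   = 10
watts            = joules_per_token * tokens_per_sec

print(f"Weights streamed per token: {bits_per_token / 8 / 1e9:.1f} GB")
print(f"DRAM energy per token     : {joules_per_token:.2f} J")
print(f"DRAM power at {tokens_per_sec} tok/s     : {watts:.1f} W")   # a few watts for DRAM traffic alone
```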
 
Are the 'C' metrics referring to degrees Celsius? Curious why it would mix 125C Y axis with -40C X axis?
Because they're the worst cases for speed (X) and power (Y) -- X axis is slowest (slow transistors, low voltage and temperature), Y axis is highest power (fast transistors, high voltage and temperature).
 
I think Intel's backside power is an example of advances coming from the wiring. Very promising, not just reducing crowding and stacking on the signal wiring but also likely to reduce droop and support lower voltage operation. It will be interesting to watch this come into full production and see if it really works as well as it pencils out. Aspects like thermal management, stress on the super-thin Si remnant layer, and long term reliability or tolerance of wide temperature ranges will be things to look at.

You need to be careful about making assumptions like "backside power is best" -- TSMC's recommendation for N2 is that the BPD process is targeted at applications like CPUs, where flat-out speed at elevated voltages (high current density) matters more than cost (the BPD process will be more expensive), gate density, and power efficiency. For many devices there is also the *huge* issue of IP availability/compatibility, since layouts are not compatible between the two processes. So my guess is that for N2, BPD will be a specialty process for a relatively small number of applications (AMD? Nvidia? ARM CPUs?), though this may well change in later nodes when it gets adopted more generally.

Of course it may well be a very good choice for Intel, because their major application -- i86 CPUs -- is exactly the one which BPD is good for, even if it's less well suited for foundry CMOS in general, which is of course TSMC's biggest driver.
 
I'm kind of the opposite on BPD: I think normal N2 might be less popular than N2+BSPD in 2027+. You have lower resistive power losses, and can use either a lower supply voltage or a higher Vt at the same speed to lower dynamic or leakage power, thanks to the higher frequency offered by BSPDNs. That alone should make a strong case for mobile as well as the previously mentioned HPC. I also doubt that the cost adder is more than the density improvement TSMC claims (1.1-1.5x), so it should be cost-per-FET neutral or even a reduction. The two points above should also make N2+BSPD a large enough uplift to finally get folks to bite the bullet rather than just sticking with N3P.

As for the IP, people already have to redo everything for GAA, so I see that as less of a problem. As a process guy, I wonder why folks would want to drag their feet and take a half step rather than moving to the new standard beyond N2 as fast as they are able to. Lastly, once the EDA tools are fully set up, wouldn't it also be easier to get convergence (don't know if that is the right phrase) vs FSPD, where you have more routing conflicts?
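
As a rough illustration of the supply-voltage argument: dynamic power goes roughly as CV^2f, so even a modest recovery of IR drop shows up as a visible power saving at iso-speed. All numbers here are invented for illustration:

```python
# Illustrative CV^2*f arithmetic for the "lower supply voltage at iso-speed"
# argument. The voltages and the IR-drop recovery are invented numbers.

vdd_frontside = 0.75          # assumed nominal supply (V)
ir_drop_saved = 0.03          # assumed IR drop recovered by a backside network (V)
vdd_backside  = vdd_frontside - ir_drop_saved   # run at lower Vdd for the same speed

dyn_power_ratio = (vdd_backside / vdd_frontside) ** 2   # C and f held constant
print(f"Dynamic power at iso-frequency: {dyn_power_ratio:.1%} of the front-side case")
# -> ~92% here, i.e. a high-single-digit % dynamic power saving from ~30 mV alone.
```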
 

I was simply relaying what TSMC told us when we asked which process we should be targeting after N3... :)

At the IP level BPD is more disruptive to layout -- especially things like high-speed SERDES -- than going to GAA, which just swaps in a different transistor at the bulk level. With BPD the entire interconnect hierarchy is different, especially connections from transistors to the thick metals which also connect to I/O and power as well as components like inductors and transmission lines, and this means a rethink of macro floorplans and supply/signal connections -- which are actually one of the hardest things to do, not the low-level transistor connections, which are pretty much fixed by the foundry.

So going from N3 to N2 is pretty much a shrink/porting exercise; going to BPD needs a lot of rethinking and relayout -- and the tool chain is different too. It's a lot of effort, and risky if anything goes wrong with BPD.

There's no doubt BPD will come, but I don't think it will be mainstream in N2 for these reasons -- it's not proved yet, and there are other issues like heat dissipation from power-dense circuits. Once a few users have proved it works OK with no major stumbling blocks then it'll be more widely adopted, but I think this will be at the next node after N2 -- a bit like what happened with EUV: N7E was used as a pipecleaner for N5, but not many people used it, partly because the design rules were not compatible with DUV N7 so relayout was needed.
 
Also:


"Logic transistor costs go up at 2nm the first TSMC HNS sheet node where the shrink is modest. We expect the shrink at 14A to be larger as a second-generation HNS node (this is similar to what TSMC did with their first FinFET node)."

Doesn't compare costs of N2 and N2 BPD, but TSMC said any density increase (about 10% in the tables shown) would probably be cancelled out by higher wafer costs.

If N2 cost per transistor is similar to or even a little higher than N3, that removes one big incentive for many customers to shift to the next node -- especially given that design and mask costs will be significantly higher...
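
For what it's worth, the cost-per-transistor comparison is simple arithmetic once you pick a density gain and a wafer-cost increase; the percentages below are placeholders, not quoted prices:

```python
# Cost-per-transistor sanity check: a density gain is cancelled out when the
# wafer cost rises by the same ratio. Percentages are placeholders only.

density_gain     = 1.15   # assumed transistors/mm^2 vs. previous node
wafer_cost_ratio = 1.20   # assumed wafer price vs. previous node

cost_per_transistor_ratio = wafer_cost_ratio / density_gain
change = cost_per_transistor_ratio - 1
print(f"Cost per transistor vs. previous node: {change:+.1%}")
# -> +4.3% with these made-up inputs, i.e. slightly worse, matching the
#    "similar to or even a little higher" scenario above.
```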

BPD is a really sexy new technology with lots of advantages (and a few disadvantages), but "sexy new" doesn't sell chips, lower cost and power does... ;-)
 
I was simply relaying what TSMC told us when we asked which process we should be targeting after N3...
Well, they do have a track record of talking down their competition. Rather like how Steve Jobs swore that humans only needed 72dpi screens when Windows supported high res for years, right up until Apple shipped Retina. FWIW, Intel claims it reduces cost.
especially connections from transistors to the thick metals which also connect to I/O and power as well as components like inductors and transmission lines, and this means a rethink of macro floorplans and supply/signal connections -- which are actually one of the hardest things to do, not the low-level transistor connections, which are pretty much fixed by the foundry.
You have pretty much unobstructed access to power with the Intel approach (which is not the one IMEC has published), with very low-resistance vias. Any source/collector can connect without fuss, from the look of it. The supply connections look very regular, leaving freedom for the signal lines. Hard to see why that makes things more difficult for analog, or for digital.
So going from N3 to N2 is pretty much a shrink/porting exercise; going to BPD needs a lot of rethinking and relayout -- and the tool chain is different too. It's a lot of effort, and risky if anything goes wrong with BPD.
Emphasis on working with the tools vendors. Intel have likely learned that lesson.
it's not proved yet, and there are other issues like heat dissipation from power-dense circuits.
I agree, it has risks. I see those mostly around the integrity and reliability of the very thin remaining silicon layer. Heat removal may be a risk too, though there will be a lot of copper in the backside and a short distance to the heat-removal solution.

Overall I think it is a smart bet for Intel. One of the few things that could conceivably put them out front again, after years. Shows their engineers - and their managers - still have gumption.
 