
ISSCC: N2 and 18A have the same SRAM density.

Yes, I saw that afterwards. Makes me wonder where else they would exclude PowerVia.
Nothing else comes to mind, but SRAM didn't come to mind either until I saw it... Logic only sees benefits: unlike the SRAM bitcells inside the array, logic cells have power rails that can be deleted to save space. So I have to assume all logic cells will have it. I guess passives/some analog devices might not have it, but capacitors, inductors, and resistors aren't transistors. Presumably the transistors and logic inside the "analog" parts of the chip all REALLY want PowerVia for the lower voltage droop. For long-channel devices the density boost from removing power rails would definitely be less impressive, because those devices have very large cell widths/poly pitches.

Speaking of which, I wonder if PLLs or parts of the clock tree will just completely live on the backside of the wafer, or if that is something that comes later once you start seeing fully functional backsides?
 
I understand that metal on the backside automatically doubles as a really large MIM capacitor
 
So Vss, Vdd connections for SRAM array are frontside, can't be backside?
 
I don't think so? But my understanding is a bit shaky. My understanding was that for a 6T SRAM, the Vdd and Vss connections just tie together transistors within the bitcell, and all current into and out of the bitcell flows through the BL and WL. If that is the case, then only the bitcells at the array edge need direct power delivery, since power flows along the bit/word lines to any particular bitcell inside the array. If my understanding is correct, it isn't an issue of "can't" but rather that the nature of an array means power is only delivered at the array edges, from which it flows to all bitcells along the string. I know you are a DRAM guy, but my understanding was that if you abstract an SRAM bitcell to the bitcell level rather than the transistor level, SRAM and DRAM bitcells are wired up to the array and the periphery in a similar manner. With that said, if you feel my understanding of how a bitcell works is wrong, please feel free to correct me, because I don't want to be spreading incorrect information.
 
Looking at some recent prior generations' SRAM design examples: of course they were frontside, but they had Vss and Vdd lines on different metal layers, unlike the standard logic arrangement of both Vss and Vdd on the same layer. I wonder if backside routing couldn't support that many metal layers?
 
I forgot about the nanowire count.... excellent point.

So between @nghanayem , @Scotten Jones @IanCutress @IanD

What is the performance and density impact of BSPD modelled to be? I think I heard a density number like 7% discussed at IEDM.
Intel answered on their home page.

"
  • Industry-first PowerVia backside-power delivery technology, improving density and cell utilization by 5 to 10 percent and reducing resistive power delivery droop, resulting in up to 4 percent ISO-power performance improvement and greatly reduced inherent resistance (IR) drop vs. front-side power designs.
"
 
I have a bunch of comments on this thread I am going to roll into one big comment.

With respect to the Fmax Shmoo plots, what isn’t obvious until you read the papers is TSMC’s array is HD cells (plus double pumped although I don’t think that matters for clock speed). Intel’s array is HP cells (what Intel calls HCC). 5.6GHz for Intel versus 4.2GHz for TSMC isn’t an apples-to-apples comparison. I don’t know how different the clock speeds typically are for HD versus HP SRAM on the same process, I would love to hear from anyone here who does.

Xebex asked about comparing SRAM speed versus node, in the TSMC paper they gave normalized values of 1.00 for 5nm, 1.10 for 3nm, 1.17 for 2nm for SRAM Fmax.

Nghanayem was talking about iso-area math for Horizontal Nanosheets (HNS) and FinFETs. IBM did some of this in their early HNS paper, and I have also done a similar analysis. What I get is that a single wide nanosheet with 3 sheets has 1.26x the Weff of 3 FinFETs in the same area. Adding a fourth sheet, as Intel is believed to be doing, adds Weff but also adds capacitance. As Nghanayem also noted, the stack performance is limited by the bottom sheet due to a parasitic mesa device under the stack. You can reduce the mesa device impact with an implant or, better yet, partial or full dielectric isolation under the stack.
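The iso-area Weff comparison can be sketched with back-of-envelope arithmetic. All dimensions below (fin height/width/pitch, sheet width/thickness) are illustrative assumptions, not published process values, so the ratio comes out near but not exactly the 1.26x figure Scotten quotes:

```python
# Back-of-envelope Weff comparison: 3 fins vs. a 3-sheet nanosheet stack
# in the same footprint. All dimensions are illustrative assumptions.

fin_height = 50   # nm, assumed
fin_width = 6     # nm, assumed
fin_pitch = 26    # nm, assumed

# Effective width of one fin: two sidewalls plus the top surface.
weff_fin = 2 * fin_height + fin_width     # 106 nm
weff_3fins = 3 * weff_fin                 # 318 nm

# One wide sheet spanning the 3-fin footprint (~2 pitches + a fin width).
sheet_width = 2 * fin_pitch + fin_width   # 58 nm
sheet_thickness = 6                       # nm, assumed
n_sheets = 3

# Gate-all-around: each sheet conducts on top, bottom, and both edges.
weff_sheets = n_sheets * 2 * (sheet_width + sheet_thickness)  # 384 nm

print(f"3 fins:   Weff = {weff_3fins} nm")
print(f"3 sheets: Weff = {weff_sheets} nm, "
      f"ratio = {weff_sheets / weff_3fins:.2f}x")
```

With these assumed dimensions the ratio lands around 1.21x; the exact number depends entirely on the fin and sheet geometries plugged in.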

There is an issue where electron mobility is better in HNS than in FinFETs, but hole mobility is worse.

Another trade-off is inner spacers; they reduce capacitance but also reduce drive current.

In the Samsung SF3E and SF3 processes they don’t use inner spacers, and they don’t have any dielectric isolation under the nanosheet stack. Samsung is currently using a 3-sheet stack, but they have announced they will go to 4-sheets at 1.4nm.

It will be interesting to see if Intel or TSMC adopt inner spacers, dielectric isolation or one of the techniques to increase hole mobility. Intel has published some interesting work with SiGe pFETs and Si nFETs on a strain relaxed buffer. There is also SiGe cladding of the pFET sheet but that has a bunch of issues.

Intel catching TSMC for SRAM cell size is impressive. Amazingly TSMC N5, N3E, and N2 all have the same SRAM cell size, no shrink! N3 was a little smaller but they had yield issues. The real solution to SRAM cell size scaling will be CFETs that could cut the cell size nearly in half.

In terms of Backside Power Delivery (BPD):

With HNS it is hard to get below a 6-track logic cell height due to the power rails. Intel’s PowerVia and other backside power delivery solutions enable 5-track cells and TSMC’s backside solution in 2027 (maybe 2026 now) offers direct connections for a possible 4-track logic cell height.
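The density win from track-height reduction is simple to quantify: standard-cell height is track count times metal pitch, and at a fixed cell width, area (and hence density) scales linearly with height. The pitch value below is an assumed illustrative number, not a published one:

```python
# Rough density scaling from standard-cell track-height reduction.
# Cell height = tracks * metal pitch; density scales inversely with height
# at a fixed cell width. The pitch is an illustrative assumption.

m2_pitch = 23  # nm, assumed

for tracks in (6, 5, 4):
    height = tracks * m2_pitch
    gain_vs_6t = (6 * m2_pitch) / height
    print(f"{tracks}T cell: height = {height} nm, "
          f"density vs 6T = {gain_vs_6t:.2f}x")
```

So a 5-track cell is roughly a 1.2x density gain over 6-track, and a 4-track cell roughly 1.5x, which is why direct backside connections are such a big deal.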

Another advantage of BPD is that bringing power through the front of the die means the power must go through the entire via stack to get to the devices. Imec has shown BPD through nano-vias can reduce static power drop by 95% and dynamic power drop by 75%. I estimated a 15 via chain in TSMC 3nm has 560 ohms of resistance (estimated from a plot in TSMC's N3 paper), a nano-via has ~50 ohms (per Imec).
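Using the resistance numbers above (an estimated ~560 ohm frontside via chain vs. ~50 ohms for a backside nano-via) and an assumed per-connection current, the IR-drop difference is easy to illustrate; the current value is an arbitrary assumption for illustration only:

```python
# Illustrative IR-drop comparison: frontside 15-via chain vs. backside
# nano-via, using the resistance estimates quoted in the post above.

r_front_chain = 560.0  # ohms, estimated 15-via frontside stack (TSMC N3 plot)
r_nano_via = 50.0      # ohms, backside nano-via (per Imec)

i_conn = 20e-6         # A, assumed current per power connection (illustrative)

drop_front = i_conn * r_front_chain  # V = I * R
drop_back = i_conn * r_nano_via

print(f"frontside drop: {drop_front * 1e3:.1f} mV")
print(f"backside drop:  {drop_back * 1e3:.1f} mV")
print(f"reduction:      {(1 - drop_back / drop_front) * 100:.0f}%")
```

With these inputs the backside path cuts the resistive drop by roughly 90%, in the same ballpark as the Imec static-drop figure, though the real benefit depends on the full power delivery network, not a single via chain.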

You can also put MIM caps on the backside and as Intel notes in their paper you don’t use PowerVia in the SRAM cells (although you do in the periphery) and you can use that entire backside area under the cell array for a giant negative bit line capacitor without taking up any otherwise useful area.

Backside processing can also enable through-wafer ESD diodes, better latch-up immunity, and even LDMOS devices so you can bring a higher voltage to the backside and regulate it down there for really high power delivery.

HNS and BPD both open up a whole host of scaling options that will drive logic scaling for at least another decade (although more slowly).

The Intel and TSMC SRAM papers are really interesting and this is a great discussion.
 
With respect to the Fmax Shmoo plots, what isn’t obvious until you read the papers is TSMC’s array is HD cells (plus double pumped although I don’t think that matters for clock speed). Intel’s array is HP cells (what Intel calls HCC). 5.6GHz for Intel versus 4.2GHz for TSMC isn’t an apples-to-apples comparison. I don’t know how different the clock speeds typically are for HD versus HP SRAM on the same process, I would love to hear from anyone here who does.
TSMC said that 4.2 GHz speed was for the HC array, not the HD array. The major difference between Intel and TSMC methodologies seems to be the different operating temperatures for the testing. Also, from what I have seen, SRAM performance and efficiency are not 1:1 with logic, so I am not taking this one metric to the bank as 18A being WAY faster than N2 in all aspects. Based on how Intel's Vmin reduction was smaller even at a lower temperature, and N3E presumably having lower Vmin than Intel 3, I suspect N2's logic will be faster HP logic to HP logic. It would be funny if we got another Intel 4/3 situation where Intel has better HP logic density and better HD logic performance, but TSMC leads in HD density and HP performance.


Another trade-off is inner spacers; they reduce capacitance but also reduce drive current.
I thought not having an inner spacer was a performance inhibitor due to how it forces you to make the device with less isolation?
In the Samsung SF3E and SF3 processes they don’t use inner spacers, and they don’t have any dielectric isolation under the nanosheet stack. Samsung is currently using a 3-sheet stack, but they have announced they will go to 4-sheets at 1.4nm.
I was kind of shocked by the lack of inner spacer on SF3(E). I didn't even know it was possible to do a proper nanowire release without it TBH.
It will be interesting to see if Intel or TSMC adopt inner spacers, dielectric isolation or one of the techniques to increase hole mobility. Intel has published some interesting work with SiGe pFETs and Si nFETs on a strain relaxed buffer. There is also SiGe cladding of the pFET sheet but that has a bunch of issues.
Doesn't BSPD throw a wrench into that, as you are removing the bulk Si? Something I also wondered but never dug deep enough to find out: does removing the bulk Si and subfin completely remove leakage through the bulk? I would assume it does, but I don't see a ton of academic attention on it, and you would think it would be a bigger deal if it works the way I imagine.
Intel catching TSMC for SRAM cell size is impressive.
Yeah... That was certainly not on my bingo card
Amazingly TSMC N5, N3E, and N2 all have the same SRAM cell size, no shrink! N3 was a little smaller but they had yield issues.
Is it that crazy? Those infernal Self-Aligned-Gate-Endcaps blew out their cell heights from all that extra spacer between the fins/polycuts. And N2 is seemingly following the 20nm 16FF playbook of no litho shrink/minimal density improvement and a new much higher PPA device.
The real solution to SRAM cell size scaling will be CFETs that could cut the cell size nearly in half.
Do you not think that BS signal routing will come before CFET, because that seems like a sizable opportunity for improvement?
In terms of Backside Power Delivery (BPD):

With HNS it is hard to get below a 6-track logic cell height due to the power rails. Intel’s PowerVia and other backside power delivery solutions enable 5-track cells and TSMC’s backside solution in 2027 (maybe 2026 now) offers direct connections for a possible 4-track logic cell height.
This is why I am bummed that A16 is only offering such a small density improvement, and by TSMC's comments on easy porting from N2. Their standard cell seems to still be 6 or 7 M0 tracks tall when it could be 4! Hopefully the mobile customers don't demand a FSPDN version of A14, because that would be LAME.
Another advantage of BPD is that bringing power through the front of the die means the power must go through the entire via stack to get to the devices. Imec has shown BPD through nano-vias can reduce static power drop by 95% and dynamic power drop by 75%. I estimated a 15 via chain in TSMC 3nm has 560 ohms of resistance (estimated from a plot in TSMC's N3 paper), a nano-via has ~50 ohms (per Imec).

You can also put MIM caps on the backside and as Intel notes in their paper you don’t use PowerVia in the SRAM cells (although you do in the periphery) and you can use that entire backside area under the cell array for a giant negative bit line capacitor without taking up any otherwise useful area.
You could already put MIM caps in the BEOL before BSPDN. Intel mentions a 3D MIM cap for 18A, but teardowns have already shown 3D MIM caps in Intel 4. That certainly explains how they doubled capacitance over Intel 7 with its many capacitor plates (which itself somehow has similar capacitance to what TSMC claims for their new and improved N2 MIM cap). Unless what you meant is that it's easier to put bigger and bigger MIM caps on the shorter BEOL stack on the backside of the wafer?

Great post BTW!
 
Amazingly TSMC N5, N3E, and N2 all have the same SRAM cell size, no shrink!

For an HP process, making the cell smaller just by trimming things a bit and accepting more leakage was possible, I believe. Lithography and multi-patterning techniques are not at their fundamental limits, neither with N2 nor with Intel's process. The question is how good their cell is while being smaller. For desktop CPUs, a slightly leakier SRAM is easily acceptable.

The latest nodes offer different SRAM types, from the smallest and fastest to the more efficient "storage 6T SRAM".

What I looked into many years ago was whether it's possible to make a specialty node just for SRAM, and it turned out it's easily possible to have a device that is both much smaller and more performant if you don't need to care about the rest of the logic.
 
“TSMC said that 4.2 GHz speed was for the HC array not the HD array.”

I looked at the paper specifically to determine whether it was HD or HP/HC. On slide 27 they show the test chip and specifically say it is HD, slides 28 and 29 show Vmin plots and again say HD, slide 30 is the Shmoo plot and doesn't say, but based on the previous 3 slides I thought it must be HD. I missed that line in the summary slide saying it is HC. Thanks!

Inner spacers are created by a recess etch of the SiGe layers once the stack is etched and then an oxide refill. They reduce channel to contact capacitance but move the embedded source/drain away from the channel in some places lowering strain and therefore drive current. They are an optional module; they aren’t needed for release. I am pretty sure Imec made lots of HNS before adding inner spacers to reduce capacitance.

With respect to SRB, BPD would definitely make it challenging; I am not sure whether an SRB could be accommodated. It's an interesting thing to look into.

“Do you not think that BS signal routing will come before CFET, because that seems like a sizable opportunity for improvement?”

I have never heard that one; it's something to look into.

“Hopefully the mobile customers don't demand a FSPDN version of A14 because that would be LAME.”

It is much worse than that! The mobile customers don't want BPD, and they also don't want the molybdenum (Mo) vias and, eventually, critical Mo interconnects that are coming. My understanding is the foundries may have to create two process versions for several nodes: no BPD with Cu interconnect, and BPD with Mo interconnect. BPD adds cost and Mo is more expensive than Cu; the mobile guys don't want to pay for performance they don't need.
 
For an HP process, making the cell smaller just by trimming things a bit and accepting more leakage was possible, I believe. Lithography and multi-patterning techniques are not at their fundamental limits, neither with N2 nor with Intel's process. The question is how good their cell is while being smaller. For desktop CPUs, a slightly leakier SRAM is easily acceptable.

The latest nodes offer different SRAM types, from the smallest and fastest to the more efficient "storage 6T SRAM".

What I looked into many years ago was whether it's possible to make a specialty node just for SRAM, and it turned out it's easily possible to have a device that is both much smaller and more performant if you don't need to care about the rest of the logic.
It can also be dramatically cheaper because you don't need 17+ interconnect layers. I haven't seen any of the big logic companies talking about dedicated SRAM processes; maybe as chiplets catch on it will generate interest.
 
Crap. I looked at the wrong chart. Thanks.

There is still something holding back scaling when looking at 2P vs 1P for GNR vs Turin.

That alone makes it impossible to judge the difference in node performance. Hopefully you guys read the rest of the post after that mistake.
GNR was released Sep 2024. Considering the catastrophic performance numbers Intel put up in 2P, I would have thought they would have released a fix by now if it was something easily fixed.

Even for 1P, GNR is trailing by ~20%, which is still quite a difficult position for Intel in the DC market, where they are bleeding market share and profit badly.

I was personally thinking that CWF on 18A might make Intel a serious competitor, but much depends on Intel figuring out the tile-to-tile communication issues that appear to cripple ARL. In DC it seems like these problems would cause even more issues than they do in desktop/laptop.
 
This is why I am bummed that A16 is only offering such a small density improvement and TSMC's comments on easy porting from N2. Their standard cell seems to still be 6 or 7 M0 tracks tall when they could have it be 4! Hopefully the mobile customers don't demand a FSPDN version of A14 because that would be LAME.
I kinda get where they are coming from though. It is my understanding that the libraries between A16 and N2 will be compatible. Customers can then easily make test chips on N2 and also on A16 and determine if their designs are better served on one or the other.

I believe that BSPDN isn't a free lunch. In other words, you don't get something for nothing. It likely has disadvantages compared to N2 without BSPDN in some situations, and certainly in price, I would think.

Does that make sense, or do you think that there are no reasonable use cases where FSPDN with GAA would be better than BSPDN?
 
See my post above, BPD is more expensive and the mobile guys don't need it and don't want to pay for it. I am hearing the foundries will have to offer with and without BPD plus different metallization schemes.
 
So then the question I would have is does it make sense to utilize the same libraries for both, or would there be an advantage to making a FSPD library and a BSPD library so you could optimize for each process better?

It just seems that the more generic you try to make a circuit, the more inefficient it becomes vs. close to metal thinking (so to speak) where the transistor design and layout is customized for the process as much as possible.

Of course, as with all things in engineering, you don't get something for nothing. Such process specific optimization results in very poor portability to another process, or even sensitivities in the design to even mild changes in the process.
 
I am not a design guy but my guess is the libraries will have to be different.
 