The SRAM macro shrink is not super surprising, since TSMC is always good about finding clever design tricks to squeeze out more density, but the bitcell shrink in spite of the suspected lack of CD shrink is more so. I have only seen the bitcell number in the Tom's Hardware article. Is that bitcell value something TSMC has quoted, or is it just people reverse engineering the bitcell shrink from the SRAM macro density improvement? If that bitcell value is correct, I hope TSMC elaborates on how they got there. But I suspect that in true TSMC fashion they won't, and I will just have to wait until fall 2026 to look at A20 Pro teardowns.
I am kind of shocked that TSMC is finally doing a Cu top metal (not because it is hard, just because they haven't for decades). It makes me genuinely curious as to why NOW?! This is early-2000s technology. My understanding is that wirebond packages need an Al top metal, but not every die TSMC makes goes into a wirebond package. For all of the guys out there using BGAs, you would think the small cost adder for Cu would be a no-brainer given the extra performance it gives. But alas, be it Apple, NVIDIA, or AMD, you would see Al top metal even on N3/N3E. So it again raises the question: if M4 couldn't justify Cu for the top metal, why is Cu now the default?
It may have taken 6 years, but huzzah, they have finally matched Intel's 10nm SuperFin MIM cap! I was wondering if their SHP-MIM would be TSMC's first 3D MIM cap on an active die (as they have shown off 3D MIM caps for passive CoWoS interconnect bridges). But that would be a no. I do wonder if A14 (or whatever they end up calling it) will finally integrate a 3D MIM cap?
That is a complex question, and it deserves a complex answer! All else being equal, HNS should reduce SRAM density rather than increase it (assuming bitcell drive degradation is unacceptable). uHD SRAM bitcells use single-fin transistors, and an HNS Xtor is basically a finFET fin sliced into horizontal sheets. As a result it has less drive area in the same lateral space as a one-fin finFET device. The real benefits of HNS are the slightly better electrostatic control, not being forced to increase transistor width in discrete increments, and being able to get the drive area of multiple fins in a smaller footprint (on account of the space between the fins now being one continuous sheet of varying width).
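To put some toy numbers on that drive-area point (all dimensions below are illustrative assumptions of mine, not any foundry's real geometry):

```python
# Toy comparison of effective channel width: one finFET fin vs. a stacked
# nanosheet (HNS) device in the same lateral footprint. All dimensions are
# illustrative assumptions, not any foundry's real numbers.

fin_height = 50.0  # nm, assumed
fin_width = 5.0    # nm, assumed

# A fin conducts on both sidewalls plus the top.
w_eff_fin = 2 * fin_height + fin_width  # 105 nm

# Each sheet conducts on its top and bottom faces (sidewall contribution is
# negligible for wide, thin sheets). Sheet width is capped by the lateral
# space a single fin plus its spacing would have occupied.
num_sheets = 3
sheet_width = 15.0  # nm, assumed
w_eff_hns = num_sheets * 2 * sheet_width  # 90 nm

print(f"fin W_eff ~ {w_eff_fin:.0f} nm, HNS W_eff ~ {w_eff_hns:.0f} nm")
# -> less drive in the same footprint, which is why a 1-fin uHD bitcell
#    doesn't automatically translate to a denser HNS bitcell.
```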
In the specific case of the N2 SRAM bitcell being denser than N3 and N3E, it is strongly suspected that the better channel control is not being used to scale the poly pitch (cell width). There are two opportunities I can think of that would enable this scaling.

Theory one: N3 has a dielectric wall that is formed in every space larger than one fin pitch. For uHD SRAM (and to a lesser degree the logic cells) this blows the cell height out way larger than N5 despite the aggressive pitch scaling (see fig 1). N3E got rid of this wall and went back to cutting the metal gate to isolate devices from each other (hence why N3E is not less dense than N3 despite having a relaxed poly pitch). It wouldn't shock me if on N2 they cleaned this up further and, for lack of a better phrase, reclaimed some space they weren't able to squeeze out on N3E.

Theory two: they are moving the spacing between diffusions (i.e. the N-P space) from 2 fin pitches to 1 fin pitch. Intel showed it could be done on Intel 4, and I would be shocked if TSMC was far behind in reclaiming this "dead" space.

Bonus round: while it wouldn't account for the whole 20% bitcell density boost, I suspect that N2 simply has the optical shrink built in rather than it being an optional N2P feature. Originally TSMC said N2 would be a >10% density boost vs N3E, and that N3P would be a 4% boost over N3E (likely due to a 2% optical shrink like they did with N5 -> N4). Then after a little while they said N2 would be a 15% boost vs N3E. Recently they also disclosed that N2P would not have any density uplift vs N2. To me that reads as: N2 was originally going to unwind the optical shrink of N3P, with N2P reintroducing it as an option. However, N2 was healthy enough that reintroducing the optical shrink wouldn't have negatively impacted TSMC's ability to hit Apple's schedule of launching N2 products in 2026. Reintroducing the optical shrink would also pull double duty by addressing customer concerns over the small PPA uplift/cost-per-FET increase over N3P (assuming customers "cross shopping" against N3P are willing to take whatever the D0 hit of N2-with-optical-shrink vs N3P is).
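As a quick sanity check on that optical-shrink arithmetic (a sketch with no inputs beyond the ~2% linear shrink figure quoted above; `density_gain` is just my helper name): a linear shrink applies in both x and y, so density scales as 1/s².

```python
# Density gain from a linear "optical shrink": a linear scale factor s
# shrinks both x and y, so area scales as s^2 and density as 1/s^2.

def density_gain(linear_shrink: float) -> float:
    """Density multiplier for a given linear shrink factor (e.g. 0.98)."""
    return 1.0 / linear_shrink**2

# ~2% linear shrink (N5 -> N4 style) -> ~4% density boost, matching the
# N3E -> N3P delta quoted above.
print(f"{density_gain(0.98):.3f}x")  # 1.041x
```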
As for further scaling opportunities beyond N2, the rules stay the same: more lateral shrink, or go 3D. Post-N2 I would assume TSMC will start scaling poly pitch again. The question is whether they will continue with metal gate cut and find a different way to do their contacts in an MGC-friendly way, or go back to self-aligned contacts with self-aligned gate endcaps/dielectric wall. Since GAA is by definition a smaller electrostatic improvement over finFET than finFET was over planar (wrapping the fourth side of the channel, rather than going from one gated side to three), the headroom for additional Lg shrink is not likely to be particularly impressive.

IMO the main avenue for further SRAM and logic shrink is reducing cell height. Yes, there is the obvious avenue of shrinking feature sizes, but that comes at the cost of worse performance, capacitance, and wafer cost; finding ways to squeeze your existing transistors closer together seems like the better bet. The most common way to go about that is reducing the number of metal tracks needed for a given function. TSMC would get a large one-time benefit if on A14 they committed to only using BSPDN going forward, as they could remove the spacing dedicated to frontside power rails from their cell heights (rough illustration below). From there, the next obvious route for improvement is moving some of the signal lines to the backside (if my understanding of SRAM bitcell design is correct, moving the word line down with your VSS/VDD would provide another big one-time scaling benefit). Backside signaling would also become an issue of greater importance once logic makes the jump to 3D.
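To put rough numbers on the power-rail point (track counts and pitch below are my own illustrative assumptions, not TSMC's actual design rules):

```python
# Standard-cell height ~= routing-track count * metal pitch. Frontside power
# rails consume part of that track budget; BSPDN gives it back. All numbers
# below are illustrative assumptions, not TSMC's actual design rules.

m2_pitch = 23.0  # nm, assumed

tracks_fspdn = 6.0  # assumed cell: signal tracks + frontside VDD/VSS rails
tracks_bspdn = 5.0  # assumed: rail track budget reclaimed by backside power

height_fspdn = tracks_fspdn * m2_pitch  # 138 nm
height_bspdn = tracks_bspdn * m2_pitch  # 115 nm

print(f"{1 - height_bspdn / height_fspdn:.0%} shorter cell")  # ~17%
```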
While doing all of this cell height scaling, a lot of work would need to be done to shave off every last bit of parasitic capacitance (as devices would now be closer together, and you would see power AND performance degradations if things remained otherwise unchanged), and a lot of work would be needed to maximize drive per unit area. On N2 TSMC will be able to have thick nanosheets, but to make use of all of that BEOL cell height reduction the device width will also need to shrink. In theory IMEC's forksheet concept allows you to further shrink the N-P space, allowing for a wider nanosheet in a given area. But that comes at the cost of some of your electrostatic control. I don't think I have seen TSMC write any papers on forksheets yet, so it would seem they are trying to rush to CFET and skip over forksheet-FET.

If TSMC insists on continuing to have two TD teams as development times extend from 2-3 years to 5-7 years (per Dr. Y. J. Mii), that probably makes sense. Assuming Intel does hit a 2-year cadence between "full" nodes post-18A, TSMC will need to make sure that their average full-node PPA uplift is at least 1.5-2x Intel's average full-node PPA uplift as they formalize a cadence of 3-4 years between full nodes (assuming the aspiration is to consistently stay ahead of Intel).
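The cadence arithmetic behind that 1.5-2x figure works out if you equate annualized improvement rates (a toy model; the 1.2x-per-node competitor uplift is a placeholder assumption of mine):

```python
# Toy cadence model: equate annualized PPA improvement rates to see how much
# bigger each node's uplift must be on a slower cadence. The 1.2x per-node
# competitor uplift is a placeholder assumption.

competitor_cadence = 2.0   # years between full nodes
slower_cadence = 3.5       # midpoint of a 3-4 year cadence

competitor_uplift = 1.20   # assumed PPA uplift per competitor full node

annual_rate = competitor_uplift ** (1.0 / competitor_cadence)
required_uplift = annual_rate ** slower_cadence

gain_ratio = (required_uplift - 1) / (competitor_uplift - 1)
print(f"required per-node uplift: {required_uplift:.2f}x "
      f"({gain_ratio:.1f}x the competitor's per-node gain)")
# -> ~1.38x per node, i.e. ~1.9x the gain: consistent with the 1.5-2x figure.
```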
fig 1.
[image: attachment 2434]
Edit:
Another idea just came to me for how TSMC could get such a large SRAM bitcell shrink despite not having a feature size reduction, and despite HNS-FET on paper requiring Xtors that are wider than a single-fin device to deliver the same drive. My off-the-wall idea is that TSMC's uHD SRAM could be going from 6 transistors to 4. In theory it can be done, but my very rudimentary understanding is that those extra 2 transistors help minimize bitcell leakage and improve important memory figures of merit like retention time. Maybe TSMC has gotten their SRAM Vmin and transistor leakage so low that they can get away with a 4T SRAM bitcell having acceptable characteristics? However, I would file this idea as less likely than the above theories. 6T bitcells have been the norm for many decades at this point because they hit a nice sweet spot: about as dense as you can go without ending up with a bitcell that isn't a very good bitcell. For kicks and giggles, I did some back-of-the-envelope calculations to reverse engineer the reported values under the assumptions of a 4T SRAM bitcell and a wider N2 device, and the resulting numbers at least seem potentially plausible.
Some VERY ROUGH napkin math that shouldn't be taken with any degree of seriousness:
N2 bitcell area * 1.2 = N3E bitcell area
1.2 * (4 * A2) = 6 * A3 (where A2 and A3 are the per-transistor areas on N2 and N3E respectively)
A2 / A3 = 1.25
In other words, an N2 uHD SRAM device could be 25% wider than a single-fin N3E Xtor while still allowing for an SRAM bitcell that is 0.83x the size. Of course this is a very simplistic approximation and doesn't account for any potential BEOL scaling bottlenecks.
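And the same napkin math as a script, with the assumptions pulled out as named variables (4T on N2 is pure speculation, as stated above):

```python
# The napkin math above, parameterized. 4T on N2 is pure speculation.

bitcell_area_ratio = 1.2  # N3E bitcell area / N2 bitcell area (reported ~20%)
xtors_n3e = 6             # conventional 6T bitcell
xtors_n2 = 4              # speculative 4T bitcell

# bitcell_area_ratio * (xtors_n2 * A2) = xtors_n3e * A3  =>  solve for A2/A3
device_area_ratio = xtors_n3e / (bitcell_area_ratio * xtors_n2)

print(f"A2/A3 = {device_area_ratio:.2f}")                 # 1.25
print(f"N2 bitcell = {1 / bitcell_area_ratio:.2f}x N3E")  # ~0.83x
```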