Yeah, I can add some color to my statement. TSMC said SPR is optional on A16 and that N2 IPs are drop-in compatible. This indicates that A16 uses standard cells with the same size as N2, and that the M0/M2 power rails are still there (hence preventing any compaction of the cell height). This is why I call it a partial implementation, as opposed to 18A, which is BSPDN only and so gets all the benefits rather than just a good chunk of them. My observation that A16 doesn't have shrunken cells from N2 is further backed up by their statement of a 7-10% density boost, which is right in line with intel's finding of a 10% utilization uplift for a Crestmont E core. Of course, as you indicated, the exact benefit will depend on your power grid. A hypothetical chip with practically no PDN would obviously get little to no benefit.
But the larger PPA upside shows up when you look at things from the perspective of individual transistors. As you said, SPR is technologically more advanced than PowerVIA gen 1. SPR won't intrude on the amount of space you have for wider nanosheets, the resistance should be lower, as should the parasitic capacitance. The major impediment native to A16's implementation is not being able to delete the power rails due to the need to support FSPDN and be backwards compatible with N2. Making A16 BSPDN only would have provided a cell compaction for every transistor, one that does not depend on the exact chip or on how routing limited the design is. There is nothing wrong with TSMC doing it this way, and for their business it is almost assuredly the best way to go about things. But it does hurt their area scaling and cost per FET. You don't need to take IMEC's or my word on the matter, though. Let's look at some real examples to illustrate my point:
N5 short NAND cell: 210nm with a 28nm M0 pitch and 7.5 M0 tracks
N3 short NAND cell: 162nm with a 23nm M0 pitch and 7 M0 tracks (2 1.5x M0 tracks for power)
i3 short NAND cell: 210nm with a 30nm M0 pitch and 7 M0 tracks (2 1.5x M0 tracks for power)
i4+20A BEOL cell: 210nm with a 36nm M0 pitch and 5.8 M0 tracks
On i4+powerVIA intel can hit the same cell heights as N5's HD cells even after walking back M0 pitch from intel 4's 30nm to an intel 7-like 36nm. Put the other way around, N5 needed a 22.2% feature shrink (28nm vs 36nm M0 pitch, not far from the 30% needed for a 2x lithographic shrink) to have the same cell height as intel 4 + powerVIA. The relaxed pitch allows for 2D direct print EUV instead of 1D EUV-assisted quad patterning. Intel claims the cost saving was great enough to make the BSP process cost neutral. Put another way, when you use BSP like this you can reset your BEOL lithographic requirements by a full lithographic node of scaling. But you don't need to use it that way either. If you wanted, you could also keep M0 pitch the same and get to smaller cell heights. With the same 5.8 tracks and 30p you could have a 174nm height, and with 23p you could do about 133nm. Given the slowing of pitch scaling and the cost associated with it, I can't overstate how large of a boon this is for the AC part of PPAC. IMEC, intel, and Cadence all claim that a BSP only process will shrink cell heights by 20%, and AMAT claims a 20-30% cell height reduction. Note: this is NOT the 10% or so utilization increase reported by TSMC and intel for A16/i4+powerVIA. That "up to 10% utilization improvement" is extra gravy on top of the flat 20% area reduction from removing the power rails from the cell (or the removed rails can of course instead be used to relax metal pitches and reset metal pitch scaling by a full lithographic node).
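For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes cell height is simply M0 pitch times M0 track count, which is a simplification of my own for illustration (real cells have 1.5x power tracks and other details, so the quoted heights only match approximately). The function names are hypothetical helpers and the inputs are just the figures quoted above.

```python
# Simplifying assumption: cell height = M0 pitch * number of M0 tracks.
def cell_height_nm(m0_pitch_nm: float, tracks: float) -> float:
    return m0_pitch_nm * tracks


# Percentage linear shrink going from one pitch to a smaller one.
def shrink_pct(from_nm: float, to_nm: float) -> float:
    return (1 - to_nm / from_nm) * 100


# (M0 pitch in nm, M0 tracks, cell height in nm as quoted in the post)
cells = {
    "N5 short NAND": (28, 7.5, 210),
    "N3 short NAND": (23, 7, 162),
    "i3 short NAND": (30, 7, 210),
    "i4+20A BEOL": (36, 5.8, 210),
}

for name, (pitch, tracks, quoted) in cells.items():
    modeled = cell_height_nm(pitch, tracks)
    print(f"{name}: modeled {modeled:.1f} nm, quoted {quoted} nm")

# Pitch shrink N5 needed relative to the 36nm pitch of i4 + powerVIA.
print(f"36nm -> 28nm pitch shrink: {shrink_pct(36, 28):.1f}%")

# What-if: keep the 5.8-track BSP cell and tighten the M0 pitch instead.
for pitch in (36, 30, 23):
    print(f"5.8 tracks at {pitch}nm pitch: {cell_height_nm(pitch, 5.8):.1f} nm")
```

The point of running the what-if loop is that the same 5.8-track cell can be spent either on a relaxed pitch at an unchanged height or on a shorter cell at an unchanged pitch.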
Supporting diagrams and illustrations below courtesy of Cadence, IMEC, and AMAT:
View attachment 2503
View attachment 2504
View attachment 2506
View attachment 2505