Are you referring to, for example, what Apple did with a silicon interposer to create the M1 Ultra CPU, with the UltraFusion architecture? Is it still a chiplet if you tie together two or more "near reticle limit" dice?
> a 33% increase in frequency at iso voltage, that is a ridiculous difference

Besides the test conditions making 18A look better than it would in an iso-scenario environment, there is one important difference that I believe is working as a tailwind for Intel: 4 nanosheets instead of 3. In the diagrams Intel showed of their HDC and HCC SRAM bitcells, you can see very thin nanosheets being used. SRAM devices are often single-fin, and when your nanosheets approach the width of a single finFET fin, the finFET device starts having greater drive per unit area due to the lack of vertical gaps between each of the nanosheets. Thanks to the extra nanosheet, a 4-sheet device at iso-width has more drive area and performance. For logic, the math is kind of fuzzy to me for when 4 sheets are better than 3 (or vice versa). More nanosheets add parasitic capacitance (the third sheet now has another sheet above it, and the fourth has one below it), and the bottom ribbon also gets less current in a 4-sheet device because the current needs to travel further through the S/D epi to reach the bottom nanosheet when there is extra height to go up and down past. So 4 sheets means more drive at iso-area, but you also pay for more capacitance, and that first nanosheet becomes even weaker (and it will always be pretty weak). Which is better is probably highly dependent on what chip you are designing and what the rest of the process node looks like. For SRAM I suspect the calculus is much easier. SRAM likes the narrowest nanosheet widths to maximize density, but more importantly this increases short-channel control and lowers Vmin. In theory, the extra nanosheet lets you use a narrower sheet without a drive regression versus your finFET SRAM bitcell, and the narrower sheet also lessens the pain from the extra capacitance the 4th nanosheet adds.
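The drive-area argument above can be checked with a back-of-envelope effective-width (W_eff) comparison: a tri-gate fin is gated on three sides, a GAA sheet on all four, and the 4th sheet adds one more perimeter in the same footprint. All dimensions below are illustrative assumptions, not figures from any foundry disclosure:

```python
# Back-of-envelope effective channel width (W_eff) per device footprint.
# W_eff ~ gated perimeter: a fin is gated on 3 sides, a nanosheet on all 4.
# All dimensions (nm) are illustrative assumptions, not foundry data.

def fin_weff(fin_height_nm: float, fin_width_nm: float) -> float:
    """Tri-gate fin: two sidewalls plus the top."""
    return 2 * fin_height_nm + fin_width_nm

def nanosheet_weff(n_sheets: int, sheet_width_nm: float, sheet_thickness_nm: float) -> float:
    """Gate-all-around: full perimeter of each stacked sheet."""
    return n_sheets * 2 * (sheet_width_nm + sheet_thickness_nm)

# A narrow "SRAM-like" sheet, comparable to a single fin's footprint
three_sheet = nanosheet_weff(3, sheet_width_nm=10, sheet_thickness_nm=5)  # 90 nm
four_sheet = nanosheet_weff(4, sheet_width_nm=10, sheet_thickness_nm=5)   # 120 nm
single_fin = fin_weff(fin_height_nm=50, fin_width_nm=5)                   # 105 nm

print(three_sheet, four_sheet, single_fin)
```

At these assumed dimensions the 3-sheet stack actually trails the fin it replaces, while the 4th sheet flips the comparison, which is the tailwind described above (ignoring the added parasitic capacitance and the weak bottom sheet).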
> Intel claiming 18A is ready.

I think one of the most notable factors is SRAM Vmin. TSMC reported a 300mV SRAM Vmin improvement versus Intel's 100mV (each versus the company's prior node, at their chosen temperatures). SRAM Vmin is heavily influenced by device variation, since you need to set Vmin around what your bottom-percentile transistors look like. Given the narrow nanosheet widths of the HDC SRAM, any variation from LER and the like has an outsized impact. Combine this with Intel's statement that 18A-P gets variation low enough that 18A will start to make sense for mobile clients as well as the HPC chips Intel process nodes have always excelled at serving. It probably doesn't help that Intel 3 drastically lowered leakage versus Intel 4 (making it harder for 18A to shine), but it seems clear from the public numbers that the uplift from the GAA part of N2 is better than the GAA uplift on 18A. Ignoring the obvious factor of 18A having an earlier HVM start date, I suspect that Intel products (and especially CCG) needing support for high voltages, and the thicker Gox that Intel processes tend to have to deliver that higher native Vmax, might play a significant part in the GAA portion of 18A not offering as large a power reduction over Intel 3 as you would expect from a finFET-to-GAA transition. At least the model of "your gate control wasn't great to begin with due to the thick Gox, so each face of your all-around gate doesn't offer as much of a boon as you might expect" seems intuitive enough to me.
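The "Vmin is set by your bottom-percentile transistors" point can be illustrated with a toy Monte Carlo: the worst cell in a large array sits several sigma from nominal, so widening the local Vt distribution drags the usable Vmin up much faster than it moves the mean. The sigma values and the fixed-margin model here are invented for illustration, not device data:

```python
import random

def array_vmin(n_cells: int, vt_nominal_mv: float, vt_sigma_mv: float,
               margin_mv: float = 100.0, seed: int = 0) -> float:
    """Toy model: Vmin must cover the worst (highest-Vt) cell in the array
    plus a fixed read/write margin. Gaussian local Vt variation; all
    numbers are invented for illustration."""
    rng = random.Random(seed)
    worst_vt = max(rng.gauss(vt_nominal_mv, vt_sigma_mv) for _ in range(n_cells))
    return worst_vt + margin_mv

tight = array_vmin(100_000, vt_nominal_mv=300, vt_sigma_mv=20)
loose = array_vmin(100_000, vt_nominal_mv=300, vt_sigma_mv=40)
print(round(tight), round(loose))
```

For 100k cells the worst sample lands roughly 4+ sigma above nominal, so doubling sigma roughly doubles the tail's distance from the mean, which is why narrow-sheet LER variation shows up so directly in Vmin.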
Intel 18A | See Our Biggest Process Innovation
See how the Intel 18A semiconductor manufacturing process is the foundation of the systems foundry for the AI era. (www.intel.com)
> perhaps we should not judge an entire node based on an ISSCC paper. I am waiting to see what actually appears in products. that's just me LOL

More so that the story is far more complex than SRAM test chips, macro densities, and performance under specific conditions. As an example, Intel 7 has better perf/watt at high voltage and better HD and HP logic density, but worse UHP logic density, HD SRAM density, and low-power performance per watt.
> Very nice! What does "being ready" mean exactly?

I think it's fair to include Samsung since, like it or not, they were the first to bring a commercially available GAA process to market, and their 3"nm" now seems to have reasonably been rebranded to a 2"nm" to better match up with TSMC/Intel/IMEC's idea of 2"nm" being roughly 3"nm" density but with GAA. But in case they aren't referring to Samsung with their North American qualifier, it is also fair to say they are referencing N2. Their comment on "readiness" was in reference to being ready to tape out final customer chips, rather than the customer test chips that have been run over the past 2-3 years. Even though 18A is seemingly ahead of N2 from a manufacturability, DD, and ramp-schedule perspective by around 1 year, Intel's 18A external foundry ecosystem/readiness is probably around 1-2 quarters behind TSMC N2. I noted as much in a recent post, but I see this gap between HVM readiness and foundry ecosystem readiness as a major area where Intel needs to improve with 14A, ideally reaching parity with TSMC on 10A or 7A. So far so good with Intel doing foundry enabling work much earlier (they said it was 3-4 quarters ahead of where 18A was at a similar point in development, but I don't remember if that was at IFS-DC 2024 or an investor meeting in a similar time frame), but it remains to be seen if this brings Intel to parity or if they need another gen or two to build those relationships and capabilities.
Intel claims 18A is "The earliest available sub-2nm advanced node manufactured in North America, offering a resilient supply alternative for customers." Reading b/w the lines, this seems to suggest that TSMC N2 is also "ready".
> Went and found the ISSCC paper; RAM array raw cell size is indeed 0.021um2 for both Intel 18A and TSMC N2.

I'm mostly just shocked the peak macro densities are the same. Intel has trailed TSMC on bitcell and/or macro density on an iso-node basis for most Intel process technologies over the past two or so decades (just off the top of my head, it was about a 15% lag on the 7"nm" and 5"nm" class process nodes). The bitcell not really benefiting from PowerVia (due to power coming in from the edges of the array rather than from the rest of the PDN, if I have the terminology right), expected logic density being around N3/N3E, and 18A's minimum feature sizes (per Intel's prior papers) trailing N3/N2 by a lot all make the equal 18A and N2 macro densities more surprising.
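For scale, a 0.021 µm² bitcell translates into bit density as follows. The ~70% array efficiency used to go from raw bitcell density to macro density is an assumed, typical value (sense amps, wordline drivers, and so on eat macro area), not a figure from the paper:

```python
# Convert an SRAM bitcell area into raw and effective bit density.
# The 70% array efficiency is an assumed typical value, not from the paper.

CELL_AREA_UM2 = 0.021          # HD bitcell, per the ISSCC figure quoted above
UM2_PER_MM2 = 1_000_000.0

raw_bits_per_mm2 = UM2_PER_MM2 / CELL_AREA_UM2      # ~47.6 Mbit/mm^2
macro_bits_per_mm2 = raw_bits_per_mm2 * 0.70        # ~33.3 Mbit/mm^2 (assumed efficiency)

print(f"raw:   {raw_bits_per_mm2 / 1e6:.1f} Mbit/mm^2")
print(f"macro: {macro_bits_per_mm2 / 1e6:.1f} Mbit/mm^2")
```

Both figures use 10^6 bits per Mbit; papers sometimes quote 2^20, which shaves roughly 5% off the number.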
> First 18A product, internal or external, is Panther Lake, ramping in 2H25. If anything it has been brought in; original projections were late 1H26 back when Pat first started. Then Clearwater Forest ramping in late 2H25 for 1Q26 launch. Turns out if you don't need to chase large dies without defects, stuff can get out the door sooner

There isn't any indication that Panther Lake is smaller than prior Intel CPUs (Ice Lake was in the 120s mm^2, and if memory serves most lead "tick" CPUs were in the 110-150mm^2 range). If you want to compare to Broadwell-U or Cannon Lake-U, it is actually a lot bigger (if the current guesstimates of about 140mm^2 for the lead die are correct). IMO the biggest factor is actually letting the development teams do process development and getting out of their way. Who would have thought that keeping up with TSMC is much easier when your R&D budget isn't a fraction of TSMC's? The cultural reforms are another big factor: as easy examples, the change to more modular process development (see Intel's 18A direct-print EUV paper at SPIE or the PowerVia paper) or the move toward deep collaboration with wafer equipment vendors.
> Yeah, Intel 3 was meant to have better performance than TSMC N3 according to the white papers, but turned out to be worse than TSMC N4 when comparing actual products (Granite Rapids / Turin).

According to Scotten's estimates, it was supposed to be better than N3. Scotten is a smart cookie, but his projections aren't "white papers"; they are his estimations from broad-stroke public roadmaps and where he thinks existing processes stand. Across interviews from 2021-2023, Pat and Ann stated multiple times that Intel 4 would greatly narrow the gap to TSMC's best, Intel 3 would be at parity, 20A would be "parity plus" (whatever that meant), and 18A would have unquestioned performance-per-watt and process technology leadership by 2025. In 2024 Intel claimed that Intel 3 perf/watt was about equal to TSMC's best, with the density gap far smaller than the gap between Intel 7 and TSMC's best process in 2021. Looking at teardowns, I don't know how anyone could call Intel 3 worse than N4 in a general technological-complexity sense or specifically for HPC use cases. For a mobile AP, sure, it should be worse; but that isn't what you are talking about.
> Intel is currently shipping more Intel wafers at 7nm and above than Intel wafers < 7nm. This will be true throughout 2025.

DUH. The plurality, and probably also the majority, of Intel's unit shipments are monolithic 7nm parts. Wafer requirements are just lower for the 5 and 3nm processes than for Intel 7. And there is also the little detail that Intel decided to outsource most of the die area for their then-next-gen products back in the BK and BS days. That is lack of demand, not some inability to produce at high volume/yield. By your logic, N3 must be struggling real bad because Intel's N3 demand is lower than its Intel 7 demand and N3 capacity is far below current N5 and N7 capacity. Anyone with a brain can tell that logic doesn't work, because N3 is new and Intel has intentionally leaned on Intel 7 products to cover the lower-cost segments of the market where their cost-ineffective chip designs can't really play.
> Personally I would not raise the banner until 18A has big foundry customers. Making your own silicon work is very different from customer silicon work.

By this logic, Intel before 2024 was practically 40 years behind TSMC... Do technological innovations from Sony's image sensor group (such as the first stacked CMOS image sensor) not count as technological advancements because Sony's image sensor process technology is only used in Sony image sensors? Saying Intel hasn't achieved 18A until external customers launch 18A chips feels like goalpost-moving to me. Doubly so because the foundry enabling ecosystem for N2 would have started before Pat was even CEO of Intel.
> I forgot about the nanowire count.... excellent point.

There are two aspects. As IanD, Intel, and TSMC have mentioned before, the stdcell utilization impact depends on how dense your PDN is. If it is very sparse, the benefit is pretty anemic; if it is something like an Intel CPU, where the power delivery network is super dense, up to a 10% density/utilization boost has been demonstrated on real CPUs. What is less commonly talked about or understood is that BSPD can also reduce logic standard cell height if you keep minimum metal pitch the same. Given that current HD logic cells tend to be 4 signal tracks plus two 3x-wide power rails shared between cells (effectively two 1.5x tracks on the left and right of a cell), the maximum possible uplift is 1.75x. As an example, you could in theory shrink the BEOL for a 2-fin N3 4T NAND cell from a height of 162nm to 93nm by deleting the two 1.5x-MMP M0 power rails (of course you can't do that in practice, because a 2-fin NMOS and 2-fin PMOS plus the diffusion breaks won't fit into 93nm, but you get the point). You can also do the Intel thing and use the area freed by removing the power rails to relax minimum metal pitch, reducing RC delay and wafer cost. As another example, take that 162nm-tall 2-fin N3 cell again: if you redesigned the cell completely around a BSPDN, you could relax minimum metal pitch from 23nm to 40.5nm (a bit wider than TSMC N7's MMP). For extra details, see the attached thread.
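The track arithmetic in that example can be written out explicitly. This sketch just mechanizes the numbers quoted above (4 signal tracks, two shared 1.5x power rails, a 23nm N3-class MMP); all figures are the post's approximations, not official data:

```python
# Standard-cell height as (tracks * minimum metal pitch).
# Numbers are the approximations quoted in the post, not official data.

MMP_NM = 23.0          # assumed N3-class minimum metal pitch
SIGNAL_TRACKS = 4      # HD cell signal tracks
RAIL_TRACKS = 2 * 1.5  # two shared 1.5x-wide power rails

height_with_rails = (SIGNAL_TRACKS + RAIL_TRACKS) * MMP_NM   # 161 nm (post rounds to 162)
height_no_rails = SIGNAL_TRACKS * MMP_NM                     # 92 nm (post rounds to 93)
max_uplift = (SIGNAL_TRACKS + RAIL_TRACKS) / SIGNAL_TRACKS   # 1.75x

# Or keep the height and relax the pitch instead (the "Intel thing"):
relaxed_mmp = height_with_rails / SIGNAL_TRACKS              # 40.25 nm (post: 40.5, from 162/4)

print(height_with_rails, height_no_rails, max_uplift, relaxed_mmp)
```

The 1.75x ceiling falls straight out of the track count: 7 total tracks shrinking to 4 signal tracks, regardless of the absolute pitch.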
So between @nghanayem, @Scotten Jones, @IanCutress, and @IanD: what is the performance and density impact of BSPD modelled to be? I think I heard a density number like 7% discussed at IEDM.
> Clearwater Forest (a 288 e-core Xeon CPU product) should be all 18A. The challenge for CWF is that advanced packaging (Foveros Direct or Foveros Direct 3D) needs some further work, thus postponing to early 2026.

It's not all 18A; it's a mix of Intel 7/3/18A.
> It is unfair to say Intel 3 is worse than TSMC N4 because of the Granite Rapids / Turin comparison. Granite Rapids used the last generation of architecture, Redwood Cove. Even the newer Lion Cove architecture on TSMC N3 lost to AMD's Zen 5 on TSMC N4. I think it is more of an Intel design-team issue than an Intel process node issue.

Not really. Phoronix benchmarked both and found Turin to be 40% faster than Granite Rapids while consuming less power. That's way too much to be explained by the architecture alone. Zen 5 has at most a 15% IPC improvement over Zen 4, and Zen 4 is maybe 1% less IPC than Redwood Cove. The fact that it consumes less power is, for me, a sign that it has a node advantage.
> Yeah, Intel 3 was meant to have better performance than TSMC N3 according to the white papers, but turned out to be worse than TSMC N4 when comparing actual products (Granite Rapids / Turin).

It's not an iso-design comparison; Intel's P-cores are considerably worse than the Zen 5 core, so how can you say that from GNR vs Turin? The same way Arrow Lake turned out meh even considering the jump between Intel 7 and N3B (two node jumps).
> Not really. Phoronix benchmarked both and found Turin to be 40% faster than Granite Rapids, while consuming less power. That's way too much to be explained by the architecture alone. Zen 5 has max 15% IPC improvement over Zen 4 and Zen 4 is maybe 1% less IPC than Redwood Cove. The fact that it is consuming less power for me is a sign that it has a node advantage.

That's a 2S system; a 1S system is only 20% behind. Also, a 64-core system being only slightly behind a 128-core system says something; consider the workload geometry.
Also, Lion Cove did not lose to Zen 5; benchmarks show them trading blows on application performance.
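The back-and-forth above is just compounding percentages, so it can be sketched with a simple multiplicative model. The IPC figures are the rough claims quoted in this thread, not measured data, and the model ignores frequency, SoC, and memory effects:

```python
# Multiplicative perf model using the rough figures quoted in this thread.
# Architecture-only uplift expected for Turin (Zen 5) over Granite Rapids
# (Redwood Cove): (Zen5/Zen4 IPC) * (Zen4/RWC IPC).

zen5_over_zen4 = 1.15    # "max 15% IPC improvement" claim
zen4_over_rwc = 0.99     # "Zen 4 is maybe 1% less IPC than Redwood Cove"
arch_uplift = zen5_over_zen4 * zen4_over_rwc     # ~1.14x

measured_2s = 1.40   # Phoronix 2S gap quoted above
measured_1s = 1.20   # 1S gap quoted above

# Residual gap not explained by architecture (node, socket count, memory, ...):
residual_2s = measured_2s / arch_uplift   # ~1.23x unexplained
residual_1s = measured_1s / arch_uplift   # ~1.05x unexplained

print(round(arch_uplift, 3), round(residual_2s, 3), round(residual_1s, 3))
```

On the 2S number the unexplained residual is roughly 23%, but at 1S it shrinks to about 5%, which is the point being argued: the socket/core-count geometry, not just the node, carries much of the headline 40%.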
> My own coverage of Intel and Intel's financials. Literally in the call and every conversation I've had with MJ and other execs.

They are not counting their base die; otherwise it would be >50% ARL silicon on Intel, because that tile is bigger than all the TSMC pieces combined in terms of area.
@Daniel Nenni iirc they said 70% of the silicon of Panther Lake would be Intel. They might be including EMIB/packaging in that number.
View attachment 2820
from https://morethanmoore.substack.com/p/intel-2024-q4-financials
We're expecting Panther Lake to be commercial by end of year, CWF early in Q1. Ramp in this case really means 'achieved retail-class silicon', something like a B0 or B1 after the ES/QS cycles. Then it'll be put to the fabs, spend 3+ months before it's ready, and go to OEMs/retail.
> Are you referring to, for example, what Apple did with a silicon interposer to create the M1 Ultra CPU, with the UltraFusion Architecture?

That's definitely one example.
> It's worth noting the Xeon is 8 channels of memory while the Epyc is 12 channels.

No. Xeon 6980P has 12 channels, and with MRDIMMs it actually has more bandwidth than Turin.
> I don't think there is *quite* enough to declare an AMD node advantage here as the items above could account for the 20% difference in perf/watt.

But given those results you can hardly say Intel 3 is better than TSMC N4, let alone N3.
> No. Xeon 6980p has 12 channels and with MRDIMMs it actually has more bandwidth than Turin
> But given those results you can hardly say Intel 3 is better than TSMC N4, let alone N3

Crap. I looked at the wrong chart. Thanks.
> That's definitely one example.

What Apple did was really not a chiplet design at all. They took two big single-die SoCs and connected them with what I've heard is a rather expensive custom interposer in a big package, and stuffed a bunch of DRAM chips in there too.
Just pointing out that, while I agree with your general sentiment that those who can make large dies will be successful, chiplets can also push the high end further than a single large die, not just work around node capacity and small-die yield/cost issues.
(I still feel that if more TSMC N3 capacity were generally available we'd see more large dies on it by now. It's faster and cheaper per transistor than previous nodes, so why aren't more products already released on it three years into production?)
> No. Xeon 6980p has 12 channels and with MRDIMMs it actually has more bandwidth than Turin
> But given those results you can hardly say Intel 3 is better than TSMC N4, let alone N3

You can't say anything about Intel 3 vs N4 or N3 based on those results.
Why would it? The FEOL is very crowded in SRAM compared to logic (where the BEOL is often the more congested part). PowerVia and BPR take up space beside the transistors, and if you were already at the narrowest nanosheet sizes, you can't just narrow the sheet to reclaim the space of the nano-TSVs. A backside contact should in theory eliminate this concern while also being a higher-performance solution (less capacitance and a shorter, lower-resistance path), so I would assume a bitcell with a backside contact shouldn't have that regression. Either way, I thought that for memory arrays the power for a bitcell comes in via the bitline and wordline. Since the bit/wordlines run across the whole array grid and are a signal layer that would have been running on the front side, it only matters that the BSPDN supplies power to the bitcells at the array perimeter, yes? Beyond that, you are placing PowerVias that aren't actually removing power rails (since SRAM bitcells away from the array edges don't have power rails) and should by extension be worthless. I assume the killer application for SRAM scaling comes once backside metal layers evolve to include backside signaling; then you can move the wordlines to the backside and have backside contacts supplying power at the array edges instead of the more conservative BPR or PowerVia schemes. But I'm not an electrical engineer, so maybe my understanding of SRAM bitcells is incorrect.
I'm not saying it couldn't have been expected, just that it was odd to put this counter-spin on PowerVia at this time. I guess it means PowerVia will only be implemented in non-SRAM sections of the chip.
> I'm not saying it couldn't have been expected, just that it was odd to put this counter-spin on PowerVia at this time.

Fair... although the technologies speak for themselves and have already been demonstrated (behind closed doors and at conferences), so it isn't like Intel needs to hide that information for an industry white paper. While armchair experts might quibble about whether BSPDN is worth it, the actual logic players all know it is essential for continued scaling and don't need a sales pitch to continue backside metallization R&D. I would also guess most of the attendees knew that without having it spelled out for them and then reverse-engineering the logic behind the result like I had to. Combine that with the point of the conference being to teach and show results, and it feels like it would be against its spirit not to talk about all aspects.
> I guess it means PowerVia will only be implemented in non-SRAM sections of the chip.

Intel said they were implementing it on the periphery CMOS and the bitcells along the array edges.
> If PowerVia doesn't benefit SRAM, does it mean that A16 will not have an SRAM bitcell decrease as well?

It doesn't benefit the bitcells, but the SRAM macro still benefits. As for A16, it uses a backside epi S/D contact that is completely under the transistor rather than beside it (thus not constricting nanosheet width). Since, as I said, the SRAM bitcells don't have power rails, there should be no benefit; however, since the backside contact is fully below the transistor, you shouldn't see the area penalty for bitcells that do have a BSPDN. I assume you need to see backside signaling first before backside metal layers can reduce bitcell area.