The Memory Foundry Opportunity

nghanayem

Memory makers traditionally make their own logic, integrated onto their memory dies. YMTC, and soon after Kioxia, have somewhat broken this paradigm by manufacturing the periphery logic on a separate wafer and using hybrid bonding to connect the control logic to the memory array. My understanding is that this makes it easier to form the memory array without worrying about staying in a safe thermal envelope for the CMOS. At IEDM, TEL shared that over the next decade memory makers will move from high-k planar transistors to FinFETs and GAAFETs. I think I also saw interest in moving DRAM to a hybrid bonded periphery, both for manufacturability and to enable a CMOS-under-array architecture that greatly reduces die sizes.

This started spinning some gears in my head over the past few months. As step count grows, the number of fabs needs to increase just to keep up, because capacity decreases as fabs are converted to these smaller nodes (see the back-of-the-envelope sketch at the end of this post). This issue would presumably get even worse with the move to 3D DRAM, where the processing durations for etch/dep tools get comically long. Figuring out how to do more advanced CMOS is also a tricky enough proposition for the logic houses; doing that while also scaling bitcell density is a big ask.

For the above reasons the thought occurred to me: "hey, why not outsource the CMOS logic?" If it is no longer on the same DRAM/flash die, I don't see why TSMC (or in Samsung's case, Samsung Foundry) couldn't make a custom version of 12FFC or 14LPC with only 2 or 3 metal layers. Even if that is old by logic standards, it would smoke the current CMOS on DRAM dies, which is roughly comparable to 45nm. If I were a logic house I would be willing to give those wafers away at like 10-20% margins, because the volume of an SK hynix or Micron would make Apple or even an Intel look tiny by comparison.

The main downfall of this idea is of course cost (making your own will always be cheaper than buying from someone else), but I thought it was an interesting string to tug at, if nothing else. Who knows, maybe the R&D cost savings, faster process development, not having CMOS yield on your yield Pareto, and not needing as much fab space could make this "off the shelf" control logic idea more palatable. And for Samsung this seems like a no-brainer once they disaggregate their periphery logic, because why waste time and money reinventing the wheel when SF already did all the work for you?
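To put rough numbers on the capacity squeeze mentioned above, here is a back-of-the-envelope model. Every step count and throughput figure is an assumption I made up for illustration; none of them come from TEL or anyone else in this thread.

```python
# Back-of-envelope: wafer outs vs. process step count for a fixed tool set.
# Every number here is an illustrative assumption, not real fab data.

def monthly_wafer_outs(tools, steps, wafers_per_tool_hour=20, hours_per_month=720):
    """Crude model: a fab's tool-hours are spread evenly across all steps."""
    wafer_passes = tools * hours_per_month * wafers_per_tool_hour
    return wafer_passes / steps  # each finished wafer consumes `steps` passes

legacy = monthly_wafer_outs(tools=500, steps=800)      # assumed legacy node
advanced = monthly_wafer_outs(tools=500, steps=1200)   # assumed +50% steps
print(f"legacy node:   {legacy:,.0f} wafer outs/month")
print(f"advanced node: {advanced:,.0f} wafer outs/month")
print(f"same fab, capacity down {1 - advanced / legacy:.0%}")
```

With 50% more steps the same fab loses a third of its output, which is the "more fabs just to stand still" effect.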

This line of thinking also made another idea worm its way into my head. Currently the memory industry is dominated by IDMs, and when a logic company wants on-package memory it purchases said memory and has it integrated by its OSAT, or increasingly often by a foundry. I wonder if it would make sense to just sell memory array dies and have customers hybrid bond the memory directly to their own logic. The die area impact of the array control logic that would need to be integrated onto the customer's die should be minimal, especially if we are talking about some A14 xPU or ASIC. It might also open up a new foundry business opportunity for memory IDMs: instead of sending their array to TSMC to be bolted onto a customer's CMOS with SoIC, they could leverage their vast packaging experience to do it themselves and capture more of the total value of the final system than they normally can. The feasibility of this idea is a bit more of a question mark, though, since I don't know enough about memory chip design to say whether it would be possible to die-shrink the control logic and scatter it throughout an xPU.
 
Anything is possible, but there are some challenges: NAND logic is not normal logic. It is very specific: low speed, high voltage. And it needs to be optimized together with the array on each spin. HBM DRAM is already planning to do what you said (buy from a foundry), although there is some logic on each die in the HBM stack.

The challenge with any chiplet approach is whether you can really disaggregate it.

Personally I think we will see chiplet memory as you described (we looked at it 15 years ago), and then commodity DRAM for mass memory and CXL.

I think the challenge is that the return on such a monumental design and development change has to be high... but people are looking at it.
 
The separate wafer seems to present a cost barrier, although we may expect that to go down over time.
That's interesting; I would have guessed the issue preventing DRAM adoption was performance rather than cost. I would have figured that DDR5/HBM3 dies would have a higher ASP than a NAND die, and if the ASP is higher, presumably the separate wafer is easier to justify. I know you don't do NAND, Fred, but do you have any intuition as to why the math checks out for NAND but doesn't check out for DRAM yet? Is part of it that CMOS under array provides less value for a 2D memory architecture?
 
My guess is that a 3D NAND wafer is expensive enough that the wafer bonding cost is small by comparison, whereas against an individual 2D DRAM or NAND wafer the bonding cost is much more significant. Once it goes to 3D, DRAM may follow the same path as NAND.
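A toy version of that math, with invented dollar figures (a later post mentions $3k advanced-node DRAM wafers, so the 2D number is at least in that neighborhood; the rest are pure placeholders):

```python
# Toy model: what fraction of total cost the bonded CMOS periphery adds.
# All dollar figures are invented placeholders, not industry data.

def bonded_periphery_fraction(array_wafer, cmos_wafer, bond_step):
    total = array_wafer + cmos_wafer + bond_step
    return (cmos_wafer + bond_step) / total

cases = {
    "3D NAND array wafer": 6000,  # assumed: long etch/dep flows make it pricey
    "2D DRAM array wafer": 3000,  # assumed: near the $3k figure in a later post
}
for name, array_cost in cases.items():
    frac = bonded_periphery_fraction(array_cost, cmos_wafer=1500, bond_step=300)
    print(f"{name}: bonding + CMOS wafer = {frac:.0%} of total")
```

The more expensive the array wafer, the smaller the relative penalty of the extra wafer and bond step, which is one way the math could close for 3D NAND before it closes for today's DRAM.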
 
Memory foundry... singing my song. I've been advocating for this for years.

I don't think it literally requires random logic integrated on DRAM or NAND. As the messages above indicate, that is improbable: a nice research project, but not really viable for a foundry approach unless someone invents a quite different memory. Even beyond the process differences for leakage and the cells, DRAM has different economics: $3k wafers at advanced nodes.

No, what would be more interesting is a flexible process for the interfaces. Change the budget for ECC bits. Optimize the DQ PHY and link to work better in ways a customer can dream up. These should be practical, and they would not have to go through the terrible least-common-denominator grinder of JEDEC, which not only denies useful features but adds years to the injury.
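On the ECC-budget knob specifically: for a standard SECDED code the check-bit count grows only logarithmically with word width, so where you set the word size changes the overhead a lot. A quick illustration (plain Hamming-bound arithmetic, nothing vendor-specific):

```python
# SECDED check bits: smallest r with 2**r >= data_bits + r + 1,
# plus one extra parity bit for double-error detection.
def secded_check_bits(data_bits):
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1

for k in (64, 128, 256, 512):
    c = secded_check_bits(k)
    print(f"{k:3d} data bits -> {c:2d} check bits ({c / k:.1%} overhead)")
```

Going from the classic 72/64 DIMM arrangement (12.5% overhead) to wider internal words cuts the overhead to a few percent, which is exactly the kind of trade a flexible memory foundry could let a customer make.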

We can see a glimmer of this in the talk about replacing the base chip in an HBM stack, perhaps putting an Eliyan chip in there and having a premium HBM4 variant with lower energy per bit using a 5nm-equivalent base. But why is this only an option for HBM? Arguably we only have HBM in such high volume because the other, more commodity DRAM variants are a screaming pain, so backward looking that we pay 5x premiums for HBM, the only option the vendors see fit to allow.

If only one of the oligarchy would break free and experiment with commodity solutions to the strangled bandwidth at the chip boundary, we could have some real cost-effective progress. It does not need to be a dichotomy between a yield-challenged, overpriced 2.5D stack and the latest tweak to a 1990s-inspired pinout on a DIMM.
 
Eventually the 3D NAND die or wafer cost will go down, but so will the bonding cost, so this is just a preliminary concern. The thermal budget's impact on performance/processing might also sway things in favor of bonding, though perhaps current CuA can already tolerate it (for some purposes).
 
It is pointless to integrate logic on a NAND chip. NAND burns hundreds of pJ to deliver one bit, and it has very low throughput and long latency. You can deliver a bit across 100 m of glass fiber for 500x less energy per bit and 50x less latency.
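Taking the numbers in that post at face value, the arithmetic looks like this (300 pJ is my stand-in for "hundreds of pJ"; the fiber latency is just time-of-flight in glass):

```python
# Sanity-check arithmetic on the energy/latency comparison above.
# 300 pJ/bit stands in for "hundreds of pJ"; the 500x/50x ratios are from the post.

nand_energy_pj = 300
fiber_energy_pj = nand_energy_pj / 500         # claimed 500x less -> 0.6 pJ/bit

c = 3e8                                        # speed of light, m/s
fiber_latency_ns = 100 / (c / 1.5) * 1e9       # light in glass ~ c/1.5 -> 500 ns
nand_latency_us = fiber_latency_ns * 50 / 1e3  # claimed 50x more -> ~25 us

print(f"fiber: {fiber_energy_pj:.1f} pJ/bit, {fiber_latency_ns:.0f} ns over 100 m")
print(f"NAND:  {nand_energy_pj} pJ/bit, ~{nand_latency_us:.0f} us read latency")
```

The implied ~25 µs is roughly in the right ballpark for a NAND page read, so the 50x claim hangs together.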

When you work with silicon you need to build the intuition to bypass the illusions like "that tiny transistor cannot drive that enormous wire" (intuition: a beetle can carry a balloon). No logic will ever be "near" a Flash cell. Logic can be near SRAM. Or as Jim Gray famously explained it:

 
Right, not regular random logic. It would only be the peripheral control "logic" that gets bonded, I'd imagine: the CMOS of CuA or CBA.
 