That is a fair distinction. But I think my main point remains: Intel's historical approach to process design needs to change to meet the needs of an expanded clientele.
The wide-voltage-range V-scaling use case still needs to take high priority, as Intel products will be the foundry's largest customer for the foreseeable future. While unintuitive, I feel like splitting the company kind of makes this issue worse. An independent Intel foundry needs to hold onto every major CCG die for dear life. If you have both companies under one CEO, you can say, "Hey CCG, you've just got to accept that, at the very least, the initial process will be more heavily tuned for low-V operation. Foundry needs to develop this to more easily secure additional external customers, and it better allows DCAI to scale xPU core counts. You must change your IPs and hardening to compensate for this new reality." An independent foundry (be it Intel, TSMC, etc.) would be unable to dictate terms to a customer. So I wonder if a fully independent Intel foundry would be put between a rock and a hard place: focus optimization on lower-power operation, or keep offering the high-V and high-f support CCG wants. Do you protect your existing business, or make a play to expand to other customers (potentially at the expense of your bread and butter)? While it feels bad, I think it would be hard to justify doing anything drastic enough to risk CCG's business.
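To make the tension concrete, here's a toy sketch of why one device tuning struggles to serve both masters, using the classic alpha-power-law delay model plus an exponential subthreshold-leakage term. Every constant below (ALPHA, NKT_Q, the Vt values, the voltage points) is invented for illustration; this shows the shape of the tradeoff, not any real process:

```python
import math

ALPHA = 1.3   # velocity-saturation exponent (illustrative)
NKT_Q = 0.1   # n*kT/q in volts; sets the subthreshold leakage slope (illustrative)

def rel_fmax(v, vt):
    # Alpha-power law: drive current ~ (V-Vt)^alpha, switched charge ~ C*V,
    # so relative max frequency ~ (V-Vt)^alpha / V.
    return (v - vt) ** ALPHA / v if v > vt else 0.0

def rel_leak(v, vt):
    # Relative subthreshold leakage power ~ V * exp(-Vt / (n*kT/q)).
    return v * math.exp(-vt / NKT_Q)

for name, vt in (("low-Vt (low-V-tuned)", 0.25), ("high-Vt (high-V-tuned)", 0.40)):
    for v in (0.55, 0.75, 1.10):
        print(f"{name}: V={v:.2f}V  f~{rel_fmax(v, vt):.2f}  leak~{rel_leak(v, vt):.3f}")
```

The low-Vt flavor wins frequency at low voltage but pays a severalfold leakage penalty at every operating point, which is roughly the bill CCG would be asked to accept if the process is tuned foundry-first.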
I don't know if I explained this in a coherent manner, because the idea only recently dawned on me. The more I think about it, the more it seems like this might have even happened to GlobalFoundries. They had a dual SOI and bulk process roadmap, with the SOI SHP process leading, the bulk HP process trailing by a year, and the bulk LP process trailing even further behind. Having to maintain all of these extra development tracks couldn't have been cheap, even if we make the reasonable assumption that many of the innovations were reusable. Besides the existing Chartered customers, GF took a very long time to get customers beyond ST-M dual sourcing for some extra capacity, AMD, and later IBM. All of these guys were big SOI fans with a fondness for high-V and high-f capability. Based on GF having a bulk HP and bulk LP path, I can't imagine GF folks back in 2009 were thinking SOI was suitable for mobile or much of the industrial/auto/embedded markets. If you have the needs of AMD dictating what tradeoffs are made on your lead process offering, I guess it isn't hard to see why it would be difficult to swing QCOM from TSMC or MTK from UMC.
Would a bulk-LP-first (or bulk-LP-only) GF have had an easier time grabbing leading-edge mobile contracts? Probably.
Would AMD have found a way to renegotiate their wafer agreements if GF wasn't providing them a suitable HP process? Maybe.
If AMD had stuck with a bulk-LP-only GF, would that have hurt AMD's product "competitiveness" and lowered AMD's wafer demand? Definitely.
I'd be curious to see how things shake out for that alternate-timeline GF that focused on a more broadly desirable process roadmap at the expense of making something "worse" for its then-current customers. Are things better, worse, or more or less the same?
I'm talking less about building capacity here and more about how you start wafers in the fab, and in that instance I think there is a subtle difference. Intel has frequently had to write down inventory over the years, and I think it is safe to say that this was due to Intel's projections being overly optimistic. Intel's general philosophy seemed to be that they didn't want to leave a single potential sale on the table, and they could project demand over several quarters. I believe this approach allowed Intel's factories to start wafers earlier than they otherwise could have and gave them more flexibility in running their fabs. This approach is what I am calling the build-to-projection model.
Now Intel is going to have customers come to them with orders for a very specific product on a specific delivery date, as opposed to the more general orders placed by the Intel products group that extend farther into the future. This is what I am calling the build-to-order model. Dealing with these product-specific requirements will require Intel to run their fabs differently. Under the older build-to-projection model, Intel could load their factories quite heavily, knowing how long the production tail was going to be. Factory physics says the price for heavy factory loading is a reduction in the velocity of individual lots through the fab in return for higher output. Running the fab slowly wasn't a problem as long as the output was there. On the other hand, if you have very specific groups of lots that all need to move through the factory at a specific velocity, you don't have nearly as much flexibility in how you load your factory. That will be a paradigm shift for Intel.
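For anyone who hasn't seen a fab operating curve, here's a minimal sketch of that loading/velocity tradeoff using the standard Kingman (VUT) queueing approximation. The variability coefficients and the normalized raw process time are made-up illustrative values, not real fab numbers:

```python
# Kingman/VUT sketch of the fab operating curve: queue time blows up as
# utilization rises. ca2, ce2, and te are illustrative values only.
ca2, ce2 = 1.0, 1.0   # squared coefficients of variation (arrivals, service)
te = 1.0              # raw process time at the bottleneck, normalized

for u in (0.60, 0.75, 0.85, 0.95, 0.99):
    queue_time = ((ca2 + ce2) / 2) * (u / (1 - u)) * te
    x_factor = (queue_time + te) / te   # total cycle time / raw process time
    print(f"utilization {u:.0%}: x-factor ~{x_factor:.1f}x")
```

The x-factor (cycle time over raw process time) explodes as utilization approaches 100%, which is exactly how a heavily loaded fab can post great output numbers while individual lots crawl.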
Note that I'm not talking about specific commitments to run X number of wafers on the 18A process this year, but rather how you get specific lots through the factory on a specific timeline.
I see what you mean now. When Intel talked about cost savings identified by new accounting, they said some similar and some different things.
One similarity was that, yes, in the old days Intel would keep extra surge capacity so that money was never left on the table. Of course this practice very clearly stopped in the 14nm and 10nm eras, when Intel was running at near 100% utilization for years and, for bizarre reasons, chose to outsource PCH dies and chipsets to Samsung rather than doing any number of forehead-slappingly obvious things with the existing factory network (like moving logic to China and giving up on NAND when the Micron JV ended, converting NM to more advanced process technologies, or filling empty fab shells literally any time before 2020 instead of holding off so you don't have to start depreciating the buildings). Or, novel idea, build a new fab for the first time in a decade. Like, oh I don't know, one 450mm/EUV-capable D1X Mod 1 style module at Israel and Ireland, so that OR and AZ aren't the only sites with fabs capable of EUV or 450mm (funny to think about today, but 450mm was still on the table during the relevant time period).

Another similarity you mentioned was that when you run a low mix of products, velocity doesn't really matter as much as total output. As long as you have X wafers of Raptor Lake coming out of the fab per week, it doesn't really matter when any given wafer was started (obviously some exceptions apply), since any two lots of Raptor Lake are completely interchangeable with each other.
One difference was that the product division would rapidly oscillate their wafer-start asks and cause a lot of churn in the factories. There was also the comical volume of hot lots. I think they said it was 3x the normal volume and hurt total output by something like 10%. So while the main production line was slowed to a crawl, Intel was seemingly good at hand-holding a fairly large volume of lots quickly through the line to get all of those unplanned dash steppings and high-demand products out.
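As a rough illustration of why a large hot-lot population is so corrosive, here's a sketch using the textbook two-class non-preemptive M/M/1 priority-queue formulas. All rates are invented, and a real toolset is a re-entrant network rather than a single server; note too that in this work-conserving toy only the regular lots' cycle time suffers, while the ~10% output hit Intel described comes from overheads the model ignores (tools held idle waiting on hot lots, broken batching and setups):

```python
# Two-class non-preemptive M/M/1 priority queue: hot lots (class 1) jump
# ahead of regular lots (class 2) at a bottleneck tool. Illustrative rates.
mu = 1.0     # service rate, lots/hour
lam = 0.90   # total arrival rate -> 90% utilization

for hot_frac in (0.00, 0.05, 0.15, 0.30):
    rho1 = (lam * hot_frac) / mu                # utilization from hot lots
    w0 = lam / mu**2                            # mean residual work in service
    w_hot = w0 / (1 - rho1)                     # class-1 mean queue wait
    w_reg = w0 / ((1 - rho1) * (1 - lam / mu))  # class-2 mean queue wait
    print(f"hot fraction {hot_frac:.0%}: hot wait ~{w_hot:.1f} h, "
          f"regular wait ~{w_reg:.1f} h")
```

Hot lots sail through at a fraction of the normal wait while the regular lots' queue time grows with every point of capacity the hot lots consume.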
I think the bigger thing Intel needs in order to demonstrate world-class capability in fab operations is tool health and wafer quality. If a Meteor Lake lot gets scrapped, it's not the end of the world; there are a whole lot of identical Meteor Lake lots right behind it. You scrap one of a customer's 3 qualification lots, and you are in VERY DEEP doo-doo.

Another major difference is that at an IDM you want to save every wafer and every die. Any die that could have been saved but wasn't isn't just the cost of the wafer, but also lost revenue. Let's say you have a wafer abort inside an etch tool mid-etch. At an IDM, if it looks like the wafer could be saved, you would write a custom etch recipe to finish the etch. If it is saved, great, you saved the company a lot of money. Maybe when it gets to assembly/test it works just fine, maybe it performs a bit worse, or maybe it is totally unsellable. The cost-saving potential often warrants giving it a try.

At a foundry this doesn't fly. It comes down to whose risk this wafer is. For an IDM, product and fab play for the same team, so it is both of their risk. They also have internal wafer sort, so they can make sure that wafer is good before shipping it out. A fabless customer sends the wafer to an external OSAT, with the foundry and OSAT having no collaboration. The fabless firm might even skip having the wafer sorted for cost-saving reasons. If they take a suspect wafer, they are taking all of the risk that the wafer they paid the foundry for is a dud after they pay an OSAT to take a look, and maybe even the risk of sending defective material to their customers. For this reason foundry customers will never take suspect material. Because the wafer is the final product and the foundry can't sort-test wafers before handing them to the customer, foundries have to scrap anything that their own internal metrology cannot guarantee to be functionally identical to the customer's qualification samples. It is for this reason that IDMs have higher wafer yields (the number of wafers that survive to end of line) than foundries.
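To put rough numbers on the "every die wants to live" logic, here's a toy expected-value comparison. Every dollar figure and probability below is hypothetical, purely to show why the IDM and foundry calls come out differently:

```python
# Toy expected-value math for a mid-etch wafer abort. All numbers are
# hypothetical illustration, not real IDM or foundry economics.
wafer_revenue = 20_000   # $ if the rescued wafer yields normally
partial_value = 8_000    # $ if it yields degraded
rescue_cost = 500        # $ of engineering time for a custom recovery recipe
p_full, p_partial = 0.4, 0.3   # assumed rescue-outcome probabilities

# IDM: internal wafer sort catches a failed rescue before anything ships,
# so the downside of trying is capped at the rescue cost.
ev_rescue = p_full * wafer_revenue + p_partial * partial_value - rescue_cost
print(f"IDM: attempt-rescue EV ~${ev_rescue:,.0f} vs $0 recovered from scrap")

# Foundry: the customer may skip wafer sort entirely, so a "rescued" dud can
# reach their end customers. Without a metrology guarantee of equivalence,
# scrap is the only defensible call, whatever the raw expected value says.
```

The same wafer that is clearly worth an engineering rescue inside an IDM is an unacceptable liability the moment the wafer itself is the deliverable.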
In this new world, Intel's wafer manufacturing arm needs to make sure they have sufficient in-line metrology to catch any issues without the aid of end-of-line testing, even if Intel Foundry and Intel products are still okay maintaining their business-as-usual "every die wants to live" operations. In-line metrology and first-time-right quality, while always important, are even more critical when Intel will be dealing with a higher-mix, lower-volume environment where any tiny screw-up can wipe out most of a customer's in-progress material, or kill a PDK qualification lot, a testchip, or an NPI.
You make an excellent point. But if Intel is going to be a successful foundry, they are going to have to clearly communicate their expectations on when a process is ready for external foundry production as opposed to internal CPU products.
No one I have ever talked to who has actually worked in a fab for a number of years has given any indication that Intel's process health at the start of ramp is, as a rule of thumb, notably worse than TSMC's. That is at least consistent with multiple instances where TSMC clearly came out of the blocks worse off; examples off the top of my head include Intel 22nm vs TSMC 20/16nm, 45nm vs 40nm, and 65nm vs 65nm. With that said, I suspect this is why Pat frequently mentioned that Intel would presumably always be the lead product (unless a customer really wanted to be the lead) and that Intel products would de-risk the process for external foundry customers. Either way, it isn't exactly like foundry customers can't just look at where the process stands on variation, the various electrical parameters, performance as a % of the final performance target, and DD for their testchip of choice. They can see exactly where in the process development lifecycle they are.
I think the more important area for improvement is how development is done. At TSMC, they focus on getting performance very close to the final target and accept that yield will improve more slowly for the time being. Once they get their performance architecture in place, they can be confident that the changes needed from then on are more evolutionary than revolutionary, and that few (if any) changes will occur that require significant design rework. From there they focus all effort on variation reduction (which helps get those last few % of performance and improves yield) as well as process changes to enhance die yield. It is also just more efficient, because instead of wasting time improving yield only for your latest performance improvement to break it again, you can focus on improving the yield of a mostly final process flow. Fabless companies don't want to have to redo their homework or run new steppings because of some process change TSMC made; my understanding is that, outside of the IP designers doing their thing, actual chip design tends to wait until the process is more or less locked in. If we take a look at 18A and N2, 18A will be ready first, and yet the announcements of the 0.9 and 1.0 PDKs for those two processes came out at similar times, with non-Intel/non-Apple tapeouts happening in 2025. Granted, TSMC has been doing this for nearly 40 years and this is Intel's first rodeo, but it illustrates my point about something that needs to improve over time.

Another area I think needs improvement is being more restrictive early and loosening up later, rather than the opposite. As an example, with 10nm Intel said vias had minimum placement restrictions and, beyond that, a blank canvas in the service of designer hand-tuning. For Intel 4 they talked about how they dropped that idea in favor of a grid that vias must land on. Finite options rather than infinite layouts make dealing with LLEs and process modeling easier, as the sketch below illustrates. I also assume that fabless companies just have no interest in any sort of "hand tuning" of layouts. For contrast, look at TSMC N5: even areas where your S/D contacts were not needed had to have dummy metal placed. Then on the more mature N5P and later processes you started to see less and less dummy metal. TSMC gives you a small sandbox that they can validate as for-sure working. Then, as they increase process margin and reduce variation, they can expand the sandbox. You can still stick to the old design rules and be safe, and if you move your design to the new, looser design rules, you can do so with confidence that everything will work per TSMC's process simulations.
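Here's a toy illustration of the gridded-vs-free-form point. The 3x3 neighborhood window is an arbitrary choice for illustration, not any real foundry's LLE model:

```python
# With vias restricted to a grid, the space of distinct local contexts a
# process model must characterize is finite and enumerable; with free-form
# placement, neighbor offsets are continuous and the space is unbounded.
from itertools import product

NEIGHBOR_SITES = 8   # sites in a 3x3 window around a via, excluding the center

# On a grid, each neighboring site is either occupied or empty, so the model
# has exactly 2**NEIGHBOR_SITES local contexts to simulate and validate.
contexts = sum(1 for _ in product((0, 1), repeat=NEIGHBOR_SITES))
print(f"gridded vias: {contexts} distinct local contexts to model")
print("free-form vias: neighbor offsets are continuous -> unbounded contexts")
```

Bounded context spaces are what let a foundry sign off a sandbox as "for sure working" and then expand it as margin improves.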
Thank you for taking the time to clarify this for me. I second BlueOne's comment. Excellent post.
It is my pleasure. I love to learn and to share what I know.