Some questions about manufacturing in semiconductor fabs:
1. Do fabs try to balance production flows, e.g. there is no single capacity bottleneck, and for each tool, its average throughput matches the throughput of both upstream and downstream tools? Or is there a reason to intentionally plan capacity in an unbalanced manner? If so, what kind of tools get purchased with intentionally higher or lower capacity than a balanced flow would require?
I can't imagine intentionally making lithography tool capacity higher than other tools, since they seem to get the spotlight as the most expensive equipment in the fab, with ASML's EUV machines showing up in newspaper articles as costing hundreds of millions of dollars each. (Aside from EUV, are they the most expensive? I found this quote from Levinson 2011: "Lithography tools are often the most expensive in the wafer fab. Even when they are not, the fact that lithography is required for patterning many layers in IC manufacturing processes, while most other tools are used for only a few steps, means that a large number of lithography tools are needed for each wafer fab, resulting in high total costs for lithography equipment.")
2. How are tool upgrades typically prioritized? Is there a list of planned upgrades, and someone identifies the most bang for the buck? Does this tend to prioritize small low-risk upgrades over larger high-risk upgrades that might have better payoff? I can imagine a case of greedy algorithm failure (the best short-term solution is a suboptimal long-term solution) where a bunch of small upgrades are chosen that add a total of 10-15% more capacity, but it might be more profitable in the long run to wait and spend more money on constructing a new building.
This is a surprisingly complex subject.
I remember that back in the late nineties at the Advanced Semiconductor Manufacturing Conference there was a lot of talk about the Theory of Constraints, and everyone was reading "The Goal" by Eliyahu M. Goldratt. There was at least one company that designed a fab with a built-in constraint, with the idea that if they optimized the constraint they would optimize the fab. It has always surprised me how uninformed fab operations were by manufacturing science, even in the nineties. We had these incredibly expensive manufacturing plants and we ran them really badly from a manufacturing efficiency perspective.
In 1997 Don Martin of IBM published a paper, "How the Law of Unanticipated Consequences Can Nullify the Theory of Constraints: The Case for Balanced Capacity in a Semiconductor Manufacturing Line". The problem with having a fixed constraint is that while the constraint is optimized, every other tool in the fab is underutilized. Even EUV tools aren't more expensive than everything else in the fab combined. The best way to design a fab is to have balanced capacity, but even that isn't straightforward.
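To make the underutilization point concrete, here is a minimal sketch in Python. The tool groups and capacities are made-up illustrative numbers, not data from any real fab, and the model deliberately ignores reentrant flow and variability: it simply assumes the line runs at the rate of its slowest tool group.

```python
# Hypothetical tool groups and capacities in wafers per hour (illustrative only).
capacity_wph = {"litho": 100.0, "etch": 125.0, "deposition": 160.0, "implant": 200.0}


def line_utilization(capacity):
    """Return (line throughput, per-tool utilization) for a line paced by its constraint.

    Simplifying assumption: every wafer visits every tool group once and the
    line runs at the rate of the slowest tool group.
    """
    throughput = min(capacity.values())  # the constraint sets the pace
    utilization = {tool: throughput / cap for tool, cap in capacity.items()}
    return throughput, utilization


throughput, utilization = line_utilization(capacity_wph)
for tool, u in utilization.items():
    print(f"{tool:12s} utilization {u:.0%}")
```

With a deliberate litho constraint, everything downstream of it in this toy example sits partly idle, which is the cost the balanced-capacity argument is pointing at.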
Variability is the enemy of efficient manufacturing, and semiconductor fabs are a worst-case scenario when it comes to manufacturing principles.
If you look at an automotive plant, you put something in at the front end of the manufacturing line, it progresses through a series of steps, and it comes out the back end complete. The equipment on the line has uptime >99% and everything moves in a straight line, step to step.
In contrast, in a fab a lot of wafers starts into the fab, gets cleaned, has films grown or deposited, and then goes into photo to be patterned, then into etch, back to deposition, and back to the same photo tools used the first time. This is called a reentrant flow, and it causes collisions between new lots and existing lots. Then you add in the awful tool reliability of 85 to 95%, and the fab bottleneck moves around day to day and even hour to hour. There are a lot of other subtleties, too: scheduled downtime is better than unscheduled downtime, and tools that take a long time to repair relative to the time to process a wafer exponentially increase cycle time. Mixing batch and single-wafer tools also creates issues with how long you bank up lots to run a full batch. There is also a fundamental trade-off between utilization and cycle time (a queueing effect; Little's law then ties cycle time to work in process): as utilization approaches 100%, cycle time goes to infinity!
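The utilization versus cycle time trade-off can be illustrated with the simplest textbook queueing model. The sketch below assumes a single tool behaving as an M/M/1 queue (random arrivals, random service times), which is far cruder than a real fab; the function names and the one-hour process time are hypothetical. Little's law (average WIP = throughput x cycle time) is then used to convert cycle time into lots waiting.

```python
def mm1_cycle_time(process_time_h, utilization):
    """Mean time in system (queue + processing) for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return process_time_h / (1.0 - utilization)


def wip_from_littles_law(throughput_per_h, cycle_time_h):
    """Little's law: average WIP = throughput x cycle time."""
    return throughput_per_h * cycle_time_h


process_time_h = 1.0  # hypothetical: one hour of processing per lot
for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    ct = mm1_cycle_time(process_time_h, u)
    # Throughput equals the arrival rate, i.e. utilization / process time.
    wip = wip_from_littles_law(u / process_time_h, ct)
    print(f"utilization {u:.0%}: cycle time {ct:6.1f} h, average WIP {wip:6.1f} lots")
```

In this toy model, going from 90% to 99% utilization multiplies cycle time by ten, which is why running a tool flat out is so expensive in cycle time terms.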
I haven't actively worked on fab design and planning in many years, but when I was last involved the only way to accurately model a fab was with discrete event simulation. It could take hours or even days to run a single fab simulation. For one of the fabs I designed, I laid out all the tools in AutoCAD and ran a plug-in that overlaid all of the process flows, color coded, with the width of the lines representing how many times the wafer traveled that path. You could calculate the total travel distance for a particular mix of flows and optimize your layout.
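The travel-distance calculation can be sketched roughly as follows. This is not the AutoCAD plug-in, just an illustration of the idea: the coordinates, flows, and volumes are invented, and distance is assumed to be rectilinear (Manhattan) between tool bays.

```python
# Hypothetical tool-bay coordinates in meters.
tool_xy = {"clean": (0, 0), "deposition": (30, 0), "photo": (30, 40), "etch": (0, 40)}

# Hypothetical flows: (sequence of steps, wafer starts per week).
# The first flow is reentrant: it returns to deposition, photo, and etch.
flows = [
    (["clean", "deposition", "photo", "etch", "deposition", "photo", "etch"], 1000),
    (["clean", "deposition", "photo", "etch"], 500),
]


def manhattan(a, b):
    """Rectilinear distance between two points (assumes travel along aisles)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_travel(tool_xy, flows):
    """Sum of per-wafer path length times volume over all flows."""
    total = 0
    for steps, volume in flows:
        per_wafer = sum(
            manhattan(tool_xy[src], tool_xy[dst]) for src, dst in zip(steps, steps[1:])
        )
        total += per_wafer * volume
    return total


print(total_travel(tool_xy, flows), "wafer-meters per week")
```

Comparing this total across candidate layouts (different `tool_xy` values) for the same flow mix gives a simple figure of merit for layout optimization.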
Speaking of the nineties, it was around 1995 when Sematech introduced Overall Equipment Effectiveness (OEE) and did a study of fabs all over the industry. In a nutshell, you figure out how many good (shippable) wafers per hour a tool is producing and divide it by the tool's theoretical throughput if it were always up and producing good wafers. Sematech found that, industry wide, tools were only achieving about 30% of their theoretical capacity. This led to a big focus on OEE, and when 300mm tools were designed they were all built to support multiple input FOUPs of wafers to ensure the tool never ran out of wafers. The ironic part is that when 300mm fabs started coming on-line there was a lot of surprise that cycle times went up, something that should have been expected since there were now extra wafers waiting at every tool. Even today OEE is around 50% in logic fabs and in the 60-70% range in memory fabs.
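As a small worked example of the OEE calculation described above, here is a sketch in Python. The function name and the numbers are hypothetical; it only implements the "good wafers per hour divided by theoretical throughput" definition given here, not any formal standard.

```python
def oee(good_wafers, hours, theoretical_wph):
    """OEE as actual good wafers per hour divided by theoretical wafers per hour."""
    if hours <= 0 or theoretical_wph <= 0:
        raise ValueError("hours and theoretical_wph must be positive")
    return (good_wafers / hours) / theoretical_wph


# Hypothetical tool: 5,040 good wafers in a 168-hour week,
# with a theoretical throughput of 100 wafers per hour.
print(f"OEE: {oee(5040, 168, 100):.0%}")
```

Here the tool averages 30 good wafers per hour against a theoretical 100, so it lands at the roughly 30% level that the Sematech study reported.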
At the end of the day fabs should be optimized for good die out or cycle time depending on the business needs. Good die out optimization actually drives different utilization rates depending on process maturity and yield. At low yield you want short cycle time to speed up yield learning and as yield reaches mature levels you can run higher utilization and longer cycle times.
Memory fabs typically run a single node and have only a few flow variants but also get upgraded to new nodes every few years. Tool upgrades will be driven by the needs of the new node with tools that can't meet the process requirement replaced. I know of older fabs that have been through over ten node changes!
On the logic side, a company like Intel will build a fab and upgrade it to a new node on a 3 to 4 year interval.
Foundries are the most complex: they may do upgrades to new nodes, but they also typically have multiple flows running in a fab and sometimes multiple nodes.
TSMC is probably the most node-focused of the foundries: they typically build a fab for a node and leave it there for the life of the fab, but even then there are multiple flows in the fab. At any given node there are options around the number of metal layers and various modules like MIM caps, resistors, number of threshold voltages, etc. They will also have, for example, a 7nm fab that runs 7nm and then the 6nm second- or third-generation process.
Most tool upgrades happen either because more tools are needed for capacity or because better tools are needed for a new node; just putting in a new tool on an existing node is pretty rare.