FDSOI Cost Analysis – Part I

by khaki on 10-20-2015 at 7:00 am

One of the most frequently discussed concerns regarding FDSOI adoption is the higher starting wafer cost compared to bulk technology. The topic also came up after my earlier post, which illustrated an FDSOI roadmap that extends to more mature nodes. The original FinFET technology announcement cited a 10% higher cost as the reason for choosing bulk FinFET over planar FDSOI.

While this easy-to-remember number resonates well with people familiar with typical foundry wafer costs at 28nm and above, it is certainly not true for Intel’s 22nm technology. The price of 300mm SOI wafers is known; it is roughly $300 higher than that of bulk Si wafers. For this to impose a 10% higher wafer cost on Intel, their 22nm manufacturing cost would have to be only $3000 per wafer. While their wafer cost is not known to the public, a ballpark can be estimated fairly easily.

Consider an Ivy Bridge four-core i7 chip (160 mm² area) with a price tag of roughly $300 per chip. Out of a 300mm wafer, you get more than 350 dies. In other words, each wafer is priced at roughly $100k after test and packaging. With Intel’s typical gross margin of 60%, the cost of manufacturing the wafer would be about $40k. Of course, unlike companies in the fabless-foundry model, Intel is an IDM. To keep things simple, let’s split the cost equally between design and technology; i.e., a $20k manufacturing cost per wafer. So, unless somebody at SOITEC tried to milk them for $2000 per wafer, the additional cost of the starting SOI wafer would be only 1.5%. I invite interested readers to do a similar calculation across multiple nodes; that would yield a very insightful observation about Moore’s law, especially at 14nm.
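
The ballpark above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in the text: the function names are made up for illustration, and the die-count formula is a common first-order approximation (wafer area over die area, minus an edge-loss term) that is not part of the original estimate, which simply uses "more than 350 dies".

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order die count: area ratio minus an edge-loss term (an assumed approximation)."""
    radius = wafer_diameter_mm / 2
    area_ratio = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(area_ratio - edge_loss)


def manufacturing_cost_per_wafer(dies: int, price_per_die: float,
                                 gross_margin: float, manufacturing_share: float) -> float:
    """Wafer revenue -> total cost (via gross margin) -> manufacturing part of that cost."""
    revenue = dies * price_per_die
    total_cost = revenue * (1 - gross_margin)
    return total_cost * manufacturing_share


# Numbers taken from the text: 350 dies, $300 per chip, 60% margin, 50/50 design/technology split.
dies_estimate = gross_dies_per_wafer(300, 160)
mfg_cost = manufacturing_cost_per_wafer(350, 300.0, 0.60, 0.5)
soi_premium_usd = 300.0
premium_fraction = soi_premium_usd / mfg_cost

# Wafer cost at which a $300 premium would equal the claimed 10%.
implied_cost_for_10_percent = soi_premium_usd / 0.10

print(f"estimated gross dies: {dies_estimate}")
print(f"manufacturing cost per wafer: ${mfg_cost:,.0f}")
print(f"SOI premium as share of wafer cost: {premium_fraction:.1%}")
print(f"wafer cost implied by a 10% premium: ${implied_cost_for_10_percent:,.0f}")
```

Without the rounding to $100k and $20k used in the text, the premium comes out slightly under 1.5%, and the approximate die count lands comfortably above 350.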

Why Cost Analysis?

While I have seen people trying to come up with very detailed cost models, I see little use for such complicated models. A cost model is only useful as long as the technology does not exist. Once a particular technology hits the manufacturing ramp, the exact cost of running wafers is very well known to the manufacturer. The negotiation between the customer and the foundry then sets the price tag. In such negotiations, the customer plays all the cards they have (and so does the foundry), including their brand, volume, long-term relationship with the foundry, their contribution to the development of the technology, the availability of other options, etc. Arguments like “I have done my homework, your price should be $2643 and not $2850” are far weaker than “For this particular product I don’t need xVT device, let’s drop it” or “Can I get $150 discount if we make our block X open to all your customers?”

At the executive level you do not even need a detailed model. If a particular technology has 40 masks and is priced at $3000 per wafer, each mask accounts for roughly $75, so dropping 2 masks equals roughly $150 in cost savings; there is no need to spend time arguing whether the mask is KrF or ArF. In fact, I would prefer such a rough, yet clear, calculation over sophisticated models that are viewed as a black box. Remember, the cost per mask is not the cost of printing the mask; it is the aggregate of lithography, metrology, processing steps (sometimes multiple ion implantation steps), cleaning, inspection, etc., for a given mask. So, unless you know exactly how the wafers are processed in a given technology (which, by the way, is not known until the process is frozen), it is almost impossible to build an accurate model.

It’s All About Assumptions
The first FDSOI cost analysis I saw, back in 2008, actually showed FDSOI at a disadvantage even compared to PDSOI (which pays the same premium for the blank wafer). It took me a few hours to go through detailed Excel sheets to figure out the assumptions behind the calculation. The authors (two senior managers who were certainly familiar with both technologies) assumed that the only advantage of FDSOI is its lower threshold voltage variation; so, it is a good device for SRAM. For logic, however, “there is no eSiGe” so FDSOI performance is not competitive.

The net? They had actually come up with a hybrid technology where SRAM was on FDSOI and everything else was on PDSOI. The resulting higher cost was not because of higher substrate cost; it was because they were actually forming two different sets of transistors on the wafer. One might ask whether this is also the case with today’s FDSOI technology, which relies on an FDSOI-bulk hybrid implementation for some of the ESD devices.

The answer is no. First, there is a difference between making an extra set of transistors and making diodes, bipolar devices, or resistors in the substrate that use the same doping as used to form the wells. Second, the step height difference between FDSOI and bulk is only 30nm, compared to the 70nm topography between FDSOI and PDSOI. So, it is much easier to either live with that step height (it is used in passive devices that can be placed at a distance from transistors) or level it off with a shorter epitaxy process.

I brought up the above example to emphasize the danger of detailed engineering work once it is presented outside the cubicle, whether in the board room or at a “technical” conference, and whether it is as simple as reporting the transistor drive current or as sophisticated as a PPA analysis. The underlying assumptions are almost never disclosed or questioned. The audience is, however, left with an easy-to-remember take-away message that, moving forward, is viewed as an unquestionable fact. I am particularly wary of any number that is normalized or reported as a figure-of-merit (FOM) fabricated just to serve the presenter’s point.

Process Simplification
The additional substrate cost can be compensated by simplifying the process flow. In fact, this was the strategy STMicroelectronics followed in defining their 28nm FDSOI technology [1]. Given that their digital market is very cost-sensitive, they needed a technology that is competitive with their own 28LP bulk technology (an HKMG technology without strain elements). To do so, they opted for an implanted raised S/D structure as opposed to the in-situ doped dual epitaxy process that we had developed earlier [2]. Unlike in bulk technologies, there is no need for separate masks to form S/D extensions and deep S/D junctions (and there is no halo either). So, two masks are saved here. Further, instead of the 4 different Vt’s in typical 28nm technologies, they used only 2 Vt’s per transistor. These two are in fact generated by swapping the well polarity under the transistor (e.g., NFETs on n-well and p-well will be LVT and RVT, respectively).

This is an additional saving of 6 masks. A mask is needed to form the hybrid bulk regions, but the channel SiGe mask used in their bulk technology is no longer needed. The net result is a reduction of 8 masks. With my rule-of-thumb cost of $75 per mask, that is $600 of process savings, which more than offsets the roughly $300 SOI substrate premium: the net wafer cost is actually $300 (~10%) less than that of the bulk 28LP technology. As I emphasized above, there is no need to build a model, make a guess, etc. All that matters is the price Samsung quotes you for their 28LPP vs. 28FD offerings. Of course, reducing the number of available threshold voltages should not harm the flexibility of the SOC design in any way. The 2-Vt FDSOI technology should cover the same Vt range covered by the 4-Vt bulk technology. This is achieved by a combination of body biasing and the wider range of gate lengths that is possible in the FDSOI technology [1].
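
For readers who want to check the bookkeeping, here is a minimal Python sketch of the mask tally and the resulting wafer cost. The dictionary labels are my own shorthand for the items listed above, and the $3000 bulk wafer cost is the hypothetical 40-mask, $75-per-mask figure used earlier in this post, not a quoted price.

```python
# Mask changes relative to the bulk 28LP flow, as listed in the text
# (negative = mask removed, positive = mask added). Labels are illustrative.
mask_deltas = {
    "single S/D implant instead of extension + deep S/D": -2,
    "2 Vt flavors instead of 4": -6,
    "hybrid bulk region": +1,
    "channel SiGe no longer needed": -1,
}

COST_PER_MASK_USD = 75        # rule-of-thumb aggregate cost per mask level
SOI_PREMIUM_USD = 300         # approximate extra cost of a 300mm SOI wafer
BULK_WAFER_COST_USD = 3000    # hypothetical: 40 masks x $75

net_masks = sum(mask_deltas.values())
process_saving_usd = -net_masks * COST_PER_MASK_USD
net_saving_usd = process_saving_usd - SOI_PREMIUM_USD
net_saving_fraction = net_saving_usd / BULK_WAFER_COST_USD

print(f"net mask change: {net_masks}")
print(f"process saving: ${process_saving_usd}")
print(f"net saving after SOI premium: ${net_saving_usd} ({net_saving_fraction:.0%})")
```

Keeping the tally as a plain dictionary makes it easy to add or remove a line item (for example, if a given product keeps a third Vt flavor) and see how the net number moves.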

There is an additional and somewhat hidden advantage in simplifying the process, and I am not talking about yield, which depends on the maturity of the process and the stability of the line more than anything. By dropping 8 masks (out of a hypothetical 40-mask process), the cycle time is reduced by roughly 20%. With the same WIP, the throughput increases by roughly 20 to 25% (by Little’s law, 1/0.8 = 1.25).
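
The cycle-time argument can be sketched with Little's law (throughput = WIP / cycle time). The sketch below assumes, as the text implicitly does, that cycle time scales with the number of mask levels; that proportionality is a simplification, and the function name is purely illustrative.

```python
def throughput_gain(masks_removed: int, total_masks: int) -> float:
    """Fractional throughput increase at constant WIP, assuming cycle time ~ mask count."""
    cycle_time_ratio = 1 - masks_removed / total_masks   # new cycle time / old cycle time
    # Little's law: throughput = WIP / cycle_time, so at fixed WIP the
    # throughput ratio is the inverse of the cycle-time ratio.
    return 1 / cycle_time_ratio - 1


cycle_time_reduction = 8 / 40
gain = throughput_gain(8, 40)

print(f"cycle time reduction: {cycle_time_reduction:.0%}")
print(f"throughput gain at constant WIP: {gain:.0%}")
```

Under these assumptions a 20% shorter cycle time gives a throughput gain of about 25%, so the round 20% figure is, if anything, on the conservative side.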

This is as if some of the wafer processing at the foundry is pushed upstream to the SOI wafer vendor. While this has certain implications for the supply chain, the real advantage is in the development phase. Any fabless company that has paid the foundry extra to run their development lots on a hot track knows the importance of running the wafers faster. In fact, simplifying the process and pushing the wafers as fast as possible through the line (thanks to KC, BD, and the staff at the IBM Albany facility) was the trick behind the FDSOI success despite limited support.

Smaller Die Size

There are many ways to take advantage of a technology to reduce die size:

  • Density-friendly design rules: In one of my earlier posts, I explained how design rules can significantly affect density. While I do not have access to the 28nm FDSOI PDK and could not be specific even if I did, one can imagine some of the possibilities in the technology: bidirectional gate (poly) lines, no proximity effect when gate length rather than channel doping is used to modulate Vt, no well proximity effect in dense SRAM (0.12 μm²), which can be used to make it even smaller, and none of the limits on maximum gate length that exist in a replacement metal gate (RMG) process.
  • Smaller logic transistor width: A typical strategy in low-cost applications is to use libraries with smaller transistor width (a smaller number of metal tracks). As I briefly touched upon earlier, there is a lower limit to this in order to maintain efficient routing. A minimum of 6 tracks is common practice in more mature nodes for low-cost applications. The maximum attainable performance degrades somewhat when narrower transistors are used (not linearly, as both FEOL and BEOL capacitance components also shrink). However, with the higher drive current of FDSOI compared to bulk devices, especially at lower voltages, it is possible to maintain reasonably high performance.
  • Smaller SRAM: FDSOI devices offer record-low Vt mismatch due to the absence of random dopant fluctuation (RDF) [3]. As a result, dense SRAM cells can be used without degrading noise margins. Higher drive current is still needed to meet access time targets. The higher drive current compared to the base bulk technology, as argued above, can be used to satisfy this requirement. Note that even though FinFET technology offers higher drive current compared to planar bulk technology, SRAM devices are still relatively heavily doped and suffer from RDF. One may argue that the doping level is somewhat lower compared to bulk planar, but to meet the higher Vt requirements of SRAM, doping is still needed. In addition, width quantization in FinFET requires further Vt tuning that needs extra doping in the pass-gate NFETs.
  • Smaller analog transistors: Analog transistors typically need a longer gate length to achieve the desired transistor gain (gm/gds). Since the transistor gain is higher in FDSOI devices compared to bulk planar MOSFETs [3], a shorter gate length can be used to deliver a desired gain. With higher drive current per unit width, transistor width can be scaled down while maintaining the speed requirements. Of course, superior transistor matching is needed to support shrinking of both W and L. Local transistor matching is guaranteed by the absence of RDF, while global matching and drift can be addressed by proper body biasing in FDSOI.

I used the above examples for illustration purposes only and to provoke interested minds. The actual area scaling expected from the above examples and similar tricks depends on the application, design style, and actual PDK numbers. As an example, a Cortex-A53 implemented in FDSOI offered 30% area scaling (from 1.7 mm² to 1.3 mm²) at the same speed (>1.2GHz) compared to 28nm bulk technology [4]. This is equivalent to a reduction of close to 40% in cost per die when combined with the expected 10% reduction in the wafer cost (0.7 × 0.9 ≈ 0.63). Take into account the 25% reduction in active power at constant speed, and you get the equivalent of scaling the technology by one node (which, by the way, was never delivered by 20nm bulk technology).
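
The cost-per-die figure follows from multiplying the area ratio by the wafer-cost ratio. The sketch below assumes cost per die scales linearly with die area and wafer cost, ignoring yield and wafer-edge effects; the function name is illustrative. It evaluates the quoted 30% area scaling as well as the quoted die areas taken at face value, since 1.3/1.7 is closer to a 24% area reduction.

```python
def cost_per_die_reduction(area_ratio: float, wafer_cost_ratio: float) -> float:
    """Fractional cost-per-die reduction, assuming cost ~ die area x wafer cost."""
    return 1 - area_ratio * wafer_cost_ratio


WAFER_COST_RATIO = 0.90   # expected 10% lower FDSOI wafer cost, per the text

# Reading 1: the quoted 30% area scaling.
reduction_quoted = cost_per_die_reduction(0.70, WAFER_COST_RATIO)

# Reading 2: the quoted die areas, 1.7 mm^2 (bulk) -> 1.3 mm^2 (FDSOI).
reduction_from_areas = cost_per_die_reduction(1.3 / 1.7, WAFER_COST_RATIO)

print(f"using 30% area scaling: {reduction_quoted:.0%} lower cost per die")
print(f"using 1.7 -> 1.3 mm^2: {reduction_from_areas:.0%} lower cost per die")
```

The first reading gives about 37%, which rounds to the 40% quoted above; the second gives about 31%. Either way the saving is substantial, but the exact figure depends on which area number one takes as the reference.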

To be continued …


References:

  [1] N. Planes, et al., Symp. VLSI Tech., p. 133, 2012.
  [2] K. Cheng, et al., Symp. VLSI Tech., p. 212, 2009.
  [3] K. Cheng, et al., IEDM, p. 49, 2009.
  [4] R. Martino, The Shanghai FDSOI Forum, 2015.
