Design Technology Co-Optimization for TSMC’s N3HPC Process
by Tom Dillinger on 11-02-2021 at 8:00 am

TSMC recently held their 10th annual Open Innovation Platform (OIP) Ecosystem Forum.  An earlier article summarized the highlights of the keynote presentation from L.C. Lu, TSMC Fellow and Vice-President, Design and Technology Platform, entitled “TSMC and Its Ecosystem for Innovation” (link).

One of the topics that L.C. discussed was the initiatives that TSMC pursued for the N3 process node, specifically for the High-Performance Computing (HPC) platform.  This article provides more details about the design-technology co-optimization (DTCO) activities that resulted in performance gains for N3HPC, compared to the baseline N3 process.  These details were provided by Y.K. Cheng, Director, Design Solution Exploration and Technology Benchmarking, in his presentation entitled “N3 HPC Design and Technology Co-Optimization”. 

Background

Design technology co-optimization refers to a cooperative effort between process development engineering and circuit/IP design teams.  The technology team optimizes the device and lithography process “window”, typically using TCAD process simulation tools.  At advanced nodes, the allowed lithographic variability in line widths, spacings, uniformity, and density (and density gradient) is limited – technology optimization seeks to define the nominal fabrication parameters at which the highly-dimensional statistical window maintains high yield.  The circuit design team(s) evaluate the performance impact of different lithographic topologies, extracting and annotating parasitic R and C elements onto device-level netlist models.

A key element to DTCO is pursued by the library IP team.  The standard cell “image” defines the allocated (vertical) dimension for nFET/pFET device widths and the number of (horizontal) wiring tracks available for intra-cell connections.  The image also incorporates a local power distribution topology, with global power/ground grid connectivity requirements.

In addition to the library cell image, the increasing current density in the scaled metal wires at advanced nodes implies that DTCO includes process litho and circuit design strategies for contact/via connectivity.  As the design variability in contact/via sizes is extremely limited due to litho/etch uniformity constraints, the process and circuit design teams focus on optimization of multiple, parallel contacts/vias and the associated metal coverage.

And, a critically important aspect of DTCO is the design and fabrication of the SRAM bitcell.  Designers push for aggressive cell area lithography, combined with device sizing flexibility for sufficient read/write noise margins and performance (with a large number of dotted cells on the bitlines).  Process engineers seek to ensure a suitable litho/etch window, and concurrently must focus on statistical tolerances during fabrication to support “high-sigma” robustness.

Because TSMC develops the foundation IP it provides to customers internally, the DTCO feedback loop between the process and design teams is especially tight.

N3HPC DTCO

Y.K. began his presentation by highlighting the N3HPC DTCO results, using the power versus performance curves shown in the figure below.  (The reference design block used for these comparisons is from an Arm A78 core;  the curves span a range of supply voltages, at “typical” device characteristics.)

The collective set of optimizations provides an overall 12% performance boost over the baseline N3 offering.  Note that (for the same supply voltage) the power dissipation increases slightly.

Y.K. went into detail on some of the DTCO results that have been incorporated into N3HPC.  Note that each feature results in a relatively small performance gain – a set of (consistent) optimizations is needed to achieve the overall boost.

  • larger cell height

Wider nFET and pFET devices within a cell provide greater drive strength for the (high-fanout) capacitive loads commonly found in HPC architectures.

  • increase in contacted poly pitch (CPP)

A significant parasitic contribution in FinFET devices is the gate-to-source/drain capacitance (Cgd + Cgs) – increasing the CPP increases the cell area (and wire lengths), but reduces this capacitance.
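
To make that tradeoff concrete, here is a minimal back-of-the-envelope sketch with purely illustrative driver and capacitance values (not TSMC data), showing how trading a relaxed CPP for lower gate-to-S/D parasitic capacitance can reduce stage delay even though wires get slightly longer:

```python
# Back-of-the-envelope sketch with purely illustrative values (not TSMC data):
# a relaxed CPP trades slightly longer wires for lower gate-to-S/D capacitance.
R_DRIVE = 1.2e3      # assumed driver resistance, ohms
C_LOAD = 1.0e-15     # assumed fanout load, farads

def stage_delay(c_parasitic, c_wire):
    """Elmore-style estimate: 0.69 * R * C_total."""
    return 0.69 * R_DRIVE * (c_parasitic + c_wire + C_LOAD)

tight_cpp   = stage_delay(c_parasitic=0.60e-15, c_wire=0.20e-15)
relaxed_cpp = stage_delay(c_parasitic=0.48e-15, c_wire=0.22e-15)  # ~20% less Cgs+Cgd, ~10% more wire

print(f"tight-CPP delay:   {tight_cpp * 1e12:.2f} ps")
print(f"relaxed-CPP delay: {relaxed_cpp * 1e12:.2f} ps "
      f"({(1 - relaxed_cpp / tight_cpp) * 100:.1f}% faster)")
```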

  • increased flexibility in back-end-of-line (BEOL) metal pitch (wider wires), with corresponding larger vias, as illustrated below
  • high-efficiency metal-insulator-metal (MiM) decoupling capacitor topology

The MiM capacitor cross-section illustrated below depicts three metal “plates” (2 VDD + 1 VSS) for improved areal efficiency over 2-plate implementations.

Improved decoupling (and less parasitic Rin to the capacitor) results in less supply voltage “droop” at the switching activity typically found in HPC applications.
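
As a rough illustration of why both factors matter, the sketch below estimates droop with a simple charge-sharing plus IR model; the charge, capacitance, and resistance values are assumptions chosen only to show the trend, not measured MiM data:

```python
# First-order droop estimate with assumed values (illustrative only): more local
# decap and a lower series resistance to the capacitor both reduce the droop.
VDD = 0.75            # volts
Q_SWITCH = 0.5e-9     # charge drawn by a switching burst, coulombs (assumed)
I_PEAK = 2.0          # peak burst current, amps (assumed)

def droop(c_decap, r_series):
    dv_charge = Q_SWITCH / c_decap   # charge supplied locally by the decap
    dv_ir = I_PEAK * r_series        # resistive drop reaching the decap
    return dv_charge + dv_ir

two_plate = droop(c_decap=50e-9, r_series=0.010)
three_plate = droop(c_decap=100e-9, r_series=0.005)  # ~2x areal capacitance, lower Rin

print(f"2-plate MiM droop estimate: {two_plate * 1e3:.1f} mV ({two_plate / VDD:.1%} of VDD)")
print(f"3-plate MiM droop estimate: {three_plate * 1e3:.1f} mV ({three_plate / VDD:.1%} of VDD)")
```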

  • double-height cells

When developing the cell image, the library design team is faced with a tradeoff between cell height and circuit complexity.  As mentioned above, a taller cell height allows for more intra-cell wiring tracks to connect complex multi-stage and/or high fan-in logic functions.  (The most demanding cell layout is typically a scannable flip-flop.)  Yet, a larger cell height used universally throughout the library will be inefficient for many gates.

The DTCO activities for N3HPC led TSMC to adopt a dual-height library design approach.  (Although dual-height cells have been selectively employed in earlier technologies, N3HPC adopted more than 400 new cells.)  This necessitated extensive collaboration with EDA tool suppliers, to support image techfile definition, valid cell placement rules, and auto-place-and-route algorithms that would successfully integrate single- and double-height cells within the design block.  (More on EDA tool features added for N3HPC shortly.)

As part of the N3HPC library design, Y.K. also highlighted that the device sizings in multi-stage cells were re-designed for optimized PPA.

  • auto-routing features

Timing-driven routing algorithms have leveraged the reduced R*C/mm characteristics of upper metal layers by “promoting” the layer assignment of critical performance nets.  As mentioned above, the N3HPC DTCO efforts have enabled more potential BEOL metal wire lithography width/spacing patterns.
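
A simple distributed-RC estimate (with assumed per-mm parasitics, not foundry data) shows why promoting a timing-critical net to a wider upper-metal track is worth the router's effort:

```python
# Illustrative per-mm parasitics (not foundry data): promoting a timing-critical
# net to a wider upper-metal track cuts the distributed wire delay substantially.
def wire_delay_ps(r_per_mm, c_per_mm, length_mm):
    """Distributed-RC estimate: 0.38 * R_total * C_total."""
    return 0.38 * (r_per_mm * length_mm) * (c_per_mm * length_mm) * 1e12

LENGTH_MM = 0.5
default_route = wire_delay_ps(r_per_mm=800.0, c_per_mm=0.20e-12, length_mm=LENGTH_MM)   # narrow lower metal
promoted_route = wire_delay_ps(r_per_mm=150.0, c_per_mm=0.25e-12, length_mm=LENGTH_MM)  # wide upper metal (NDR)

print(f"default route:  {default_route:.1f} ps over {LENGTH_MM} mm")
print(f"promoted route: {promoted_route:.1f} ps over {LENGTH_MM} mm")
```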

As shown below, routing algorithms needed enhancements to select “non-default rules” (NDRs) for wire width/spacing.  (NDRs have been available for quite a while – typically, these performance-critical nets were routed first, or often, manually pre-routed.  The N3HPC DTCO features required extending NDR usage as a general auto-route capability.)  The figure also depicts how via pillar patterns need to be inserted to support increased signal current.

For lower metal layers where the lithography rules are strict and NDRs are not an option, routing algorithms needed to be enhanced to support parallel track routing (and related via insertion), as shown above.

EDA Support

To leverage many of these N3HPC DTCO features, additional EDA tool support was required.  The figure below lists the key tool enhancements added by the major EDA vendors.

Summary

TSMC has made a commitment to the high-performance computing platform, to provide significant performance enhancements as part of an HPC-specific process offering.  A set of DTCO projects was pursued for N3HPC, providing a cumulative 12% performance gain on a sample Arm core design block.  The optimizations spanned a range of design and process lithography window characteristics, from standard cell library design to BEOL interconnect options to MiM capacitor fabrication.  Corresponding EDA tool features – especially for auto-place-and-route – have been developed in collaboration with major EDA vendors.

For upcoming process node announcements – e.g., N2 – it will be interesting to see what additional DTCO-driven capabilities are pursued for the HPC offering.

-chipguy

Also read: Highlights of the TSMC Open Innovation Platform Ecosystem Forum


Highlights of the TSMC Open Innovation Platform Ecosystem Forum
by Tom Dillinger on 11-01-2021 at 8:00 am

TSMC recently held their 10th annual Open Innovation Platform (OIP) Ecosystem forum.  The talks included a technology and design enablement update from TSMC, as well as specific presentations from OIP partners on the results of recent collaborations with TSMC.  This article summarizes the highlights of the TSMC keynote from L.C. Lu, TSMC Fellow and Vice-President, Design and Technology Platform, entitled: “TSMC and Its Ecosystem for Innovation”.  Subsequent articles will delve more deeply into specific technical innovations presented at the forum.

TSMC OIP and Platform Background

Several years ago, TSMC defined four “platforms”, to provide specific process technology and IP development initiatives aligned with the unique requirements of the related applications.  These platforms are:

  • High-Performance Computing (HPC)
  • Mobile (including RF-based subsystems)
  • Automotive (with related AEC-Q100 qualification requirements)
  • IoT (very low power dissipation constraints)

L.C.’s keynote covered the recent advances in each of these areas.

OIP partners are associated with five different categories, as illustrated in the figure below.

EDA partners develop new tool features required to enable the silicon process and packaging technology advances.  IP partners design, fabricate, and qualify additional telemetry, interface, clocking, and memory IP blocks, to complement the “foundation IP” provided by TSMC’s internal design teams (e.g., cell libraries, general purpose I/Os, bitcells).  Cloud service providers offer secure computational resources for greater flexibility in managing the widely diverse workloads throughout product design, verification, implementation, release, and ongoing product engineering support.  Design center alliance (DCA) partners offer a variety of design services to assist TSMC customers, while value chain aggregation (VCA) partners offer support for test, qualification, and product management tasks.

The list of OIP partners evolves over time – here is a link to an OIP membership snapshot from 2019.  There have been quite a few recent acquisitions, which have trimmed the membership list.  (Although not an official OIP category, one TSMC forum slide mentioned a distinct set of “3D Fabric” packaging support partners – perhaps this will emerge in the future.)

As an indication of the increasing importance of the OIP partner collaboration, TSMC indicated, “We are proactively engaging with partners much earlier and deeper (my emphasis) than ever before to address mounting design challenges at advanced technology nodes.”       

Here are the highlights of L.C.’s presentation.

N3HPC

In previous technical conferences, TSMC indicated that there would be (concurrent) process development and foundation IP releases focused on the HPC platform for advanced nodes.

The figures below illustrate the PPA targets for the evolution of N7 to N5 to N3.  To that roadmap, TSMC presented several design technology co-optimization (DTCO) approaches that have been pursued for the N3HPC variant.  (As has been the norm, the implementation of an Arm core block is used as the reference for the PPA comparisons.)

Examples of the HPC initiatives include:

  • taller cells, “double-high” standard cells

N3HPC cells adopt a taller image, enabling greater drive strength.  Additionally, double-high cells were added to the library.  (Complex cells often have an inefficient layout, if confined to a single cell-height image – although double-high cells have been used selectively in previous technologies, N3HPC adopts a more diverse library.)

  • increasing the contacted poly pitch (CPP)

Although perhaps counterintuitive, increasing the cell area may offer a performance boost by reducing the Cgs and Cgd parasitics between gate and S/D nodes, with M0 on top of the FinFET.

  • an improved MiM decoupling capacitance layout template (lower parasitic R)
  • greater flexibility – and related EDA auto-routing tool features – to utilize varied (wider width/space) pitches on upper-level metal layers

Traditionally, any “non-default rules” (NDRs) for metal wires were pre-defined by the physical design engineer to the router (and often pre-routed manually);  the EDA collaboration with TSMC extends this support to decisions made automatically during APR.

Note in the graph above that the improved N3HPC performance is associated with a slight power dissipation increase (at the same VDD).

N5 Automotive Design Enablement Platform (ADEP)

The requirements for the Automotive platform include a more demanding operating temperature range, and strict reliability measures over an extended product lifetime, including:  device aging effects, thermal analysis including self-heating effects (SHE), and the impact of these effects on electromigration failure.  The figure below illustrates the roadmap for adding automotive platform support for the N5 node.

Cell-aware internal fault models are included, with additional test pattern considerations to reduce defect escapes (DPPM).

RF

RF CMOS has emerged as a key technology for mobile applications.  The figure below illustrates the process development roadmap for both the sub-6GHz and mmWave frequency applications.  Although N16FFC remains the workhorse for RF applications, the N6RF offering for sub-6GHz will enable significant DC power reduction for LNAs, VCOs, and power amplifiers.

As with the Automotive platform, device aging and enhanced thermal analysis accuracy are critical.

N12e sub-Vt operation

A major initiative announced by L.C. related to the IoT platform.  Specifically, TSMC is providing sub-Vt enablement, reducing the operating supply voltage below device Vt levels.

Background – Near-Vt and Sub-Vt operation

For very low power operation, where the operating frequency requirements are relaxed (e.g., Hz to kHz), technologists have been pursuing aggressive reductions in VDD – recall that active power dissipation is dependent upon (VDD**2).
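
A quick back-of-the-envelope calculation (illustrative numbers only, not TSMC data) shows how strongly that quadratic VDD dependence pays off when the frequency target is also relaxed:

```python
# Quick arithmetic on the quadratic VDD dependence (all values assumed):
def dynamic_power(alpha, c_switched, vdd, freq):
    return alpha * c_switched * vdd ** 2 * freq

nominal = dynamic_power(alpha=0.1, c_switched=1e-9, vdd=0.75, freq=100e6)
sub_vt = dynamic_power(alpha=0.1, c_switched=1e-9, vdd=0.30, freq=1e6)  # sub-Vt, relaxed frequency

print(f"nominal-VDD power: {nominal * 1e3:.2f} mW")
print(f"sub-Vt power:      {sub_vt * 1e6:.1f} uW  (~{nominal / sub_vt:.0f}x lower)")
```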

Reducing the supply to a “near-Vt” level drops the logic transition drive current significantly;  again, the performance targets for a typical IoT application are low.  Static CMOS logic gates function at near-Vt in a conventional manner, as the active devices (ultimately) operate in strong inversion.  The figure below illustrates the (logarithmic) device current as a function of input voltage – note that sub-Vt operation implies that active devices will be operating in the “weak inversion” region.
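
For reference, the textbook weak-inversion current expression below captures this exponential behavior; the prefactor, slope factor, and Vt used here are assumed values for illustration only:

```python
import math

# Textbook weak-inversion (subthreshold) current model with assumed parameters:
#   Id ~ I0 * exp((Vgs - Vt) / (n * kT/q))
KT_Q = 0.0258   # thermal voltage near room temperature, volts
N = 1.4         # assumed subthreshold slope factor
I0 = 1e-7       # assumed device-dependent prefactor, amps

def sub_vt_current(vgs, vt=0.35):
    return I0 * math.exp((vgs - vt) / (N * KT_Q))

for vgs in (0.20, 0.25, 0.30):
    print(f"Vgs = {vgs:.2f} V -> Id ~ {sub_vt_current(vgs):.2e} A")
# Roughly 80-90 mV of additional gate drive changes the current by a decade here,
# which is why Vt and supply variation hit sub-Vt circuits so hard.
```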

Static, complementary CMOS gates will still operate correctly at sub-Vt levels, but the exponential nature of weak inversion currents introduces several new design considerations:

  • beta ratio

Conventional CMOS circuits adopt a (beta) ratio of Wp/Wn to result in suitable input noise rejection and balanced RDLY/FDLY delays.  Commonly, this ratio is based on the strong inversion carrier mobility differences between nFET and pFET devices.  Sub-Vt circuit operation depends upon weak inversion currents, and likely requires a different approach to nFET and pFET device sizing selections.

  • sensitivity to process variation

The dependence of the circuit behavior on weak inversion currents implies a much greater impact of (local and global) device process variation.

  • high fan-in logic gates less desirable

Conventionally, a high ratio of Ion/Ioff is available to CMOS circuit designers, where Ioff is the leakage current through inactive logic branches.  In sub-Vt operation, Ion is drastically reduced;  thus, the robustness of the circuit operation to non-active leakage current paths is less.  High fan-in logic gates (with parallel leakage paths) are likely to be excluded.
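
The arithmetic below (with assumed on/off currents, purely illustrative) shows how quickly the margin between the single active path and the summed parallel leakage erodes as fan-in grows and Ion collapses at sub-Vt:

```python
# Assumed on/off currents, for illustration only: the single "on" pull-down of a
# high fan-in gate must overpower the summed leakage of all parallel "off" devices.
def on_off_margin(i_on, i_off_per_device, fan_in):
    return i_on / (i_off_per_device * (fan_in - 1))

for fan_in in (2, 4, 8):
    nominal = on_off_margin(i_on=20e-6, i_off_per_device=20e-12, fan_in=fan_in)  # strong inversion
    sub_vt = on_off_margin(i_on=20e-9, i_off_per_device=20e-12, fan_in=fan_in)   # weak-inversion Ion
    print(f"fan-in {fan_in}: Ion/sum(Ioff) ~ {nominal:,.0f}x nominal, ~{sub_vt:,.0f}x sub-Vt")
```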

  • sub-Vt SRAM design considerations

In a similar manner, the leakage paths present in an SRAM array are a concern, both for active R/W cell operation and inactive cell stability (noise margins).  In a typical 6T-SRAM bitcell, with multiple dotted cells on a bitline, leakage paths are present through the access transistors of inactive word line rows.

A read access (with pre-charged BL and BL_bar) depends on a large difference in current on the complementary bitlines through only the active word line row array locations.  In sub-Vt operation, this current difference is reduced (and also subject to process variations, as SRAMs are often characterized to a high-sigma tail of the statistical distribution curve).

As a result, the number of dotted cells on a bitline would be extremely limited.  The schematic on the left side of the figure below illustrates an example of a modified (larger) sub-Vt SRAM bitcell design, which isolates the read operation from the cell storage.
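
A rough sizing sketch (with assumed read and leakage currents) shows how dramatically the allowable number of dotted cells shrinks at sub-Vt:

```python
# Rough sizing sketch with assumed currents: the selected cell must discharge the
# bitline well above the aggregate leakage of all unselected rows on that bitline.
def max_dotted_cells(i_read, i_leak_per_cell, margin=10.0):
    """Cells per bitline keeping read current 'margin' x above total off-row leakage."""
    return int(i_read / (margin * i_leak_per_cell))

print("nominal VDD:", max_dotted_cells(i_read=30e-6, i_leak_per_cell=30e-12), "cells per bitline")
print("sub-Vt:     ", max_dotted_cells(i_read=30e-9, i_leak_per_cell=30e-12), "cells per bitline")
```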

  • “burst mode” operation for IoT

IoT applications may have very distinctive execution profiles.  There are likely long periods of inactivity, with infrequent “burst mode” operations requiring high performance for a short period of time.  In conventional CMOS applications, the burst mode duration is comparatively long, and a dynamic voltage-frequency scaling (DVFS) approach is typically employed by directing a DC-to-DC voltage regulator to adjust its output.  The time required for the regulator to adapt (and the related power dissipation associated with the limited regulator efficiency) are rather inconsequential for the extended duration of the typical computing application in burst mode.

Such is not the case for IoT burst computation, where power efficiency is paramount and the microseconds required for the regulator to switch are problematic.  The right hand side of the figure above depicts an alternative design approach for sub-Vt IoT CMOS, where multiple supplies are distributed and switched locally using parallel “sleep FETs” to specific blocks.  A higher VDD would be applied during burst mode, returning to the sub-Vt level during regular operation.

TSMC is targeting their initial sub-Vt support to the N12e process.  The figure below highlights some of the enablement activities pursued to provide this option for the IoT platform.

TSMC hinted that the N22ULL process variant will also receive sub-Vt enablement in the near future.

L.C. also provided an update on the TSMC 3D Fabric advanced packaging offerings – look for a subsequent article to review these technologies in more detail.

Summary

TSMC provided several insights at the recent OIP Ecosystem forum:

  • HPC-specific process development remains a priority (e.g., N3HPC).
  • The Automotive platform continues to evolve toward more advanced process nodes (e.g., N5A), with design flow enhancements focused on modeling, analysis, and product lifetime qualification at more stringent operating conditions.
  • Similarly, the focus on RF technology modeling, analysis, and qualification continues (e.g., N6RF).

and, perhaps the most disruptive update,

  • The IoT platform announced enablement for sub-Vt operation (e.g., N12e).

-chipguy

Also read: Design Technology Co-Optimization for TSMC’s N3HPC Process


TSMC Arizona Fab Cost Revisited
by Scotten Jones on 10-13-2021 at 8:00 am

Back in May of 2020 I published some comparisons of the cost to run a TSMC fab in Arizona versus their fabs in Taiwan. I found the fab operating cost, based on the country-to-country difference, to be only 3.4% higher in the US, and then an additional 3.8% higher because of the smaller fab scale. Since that time, I have continued to encounter reports that US fab costs are approximately 30% higher than in Asian countries. In the studies I have found, most of the cost difference is attributed to “incentives” without a clear explanation of what the incentives are. My calculation does not include incentives, but still the size of the difference led me to completely reexamine my assumptions and look into incentives, what they could be and how they would impact the costs I calculate.

Profit and Loss

At the highest level, companies are judged by their Profit and Loss (P&L), and I decided to go through a simple P&L line by line and look at every country-to-country difference that could impact the bottom-line profitability.

A P&L is summarized on an income statement; a simple income statement is:

  1. Revenue – the money received from selling the product
  2. Cost of Goods Sold (COGS) – the direct costs to produce the product being sold. This is what our Models calculate.
  3. Gross Margin = Revenue – COGS. For wafer sale prices we estimate gross margin and apply it to the wafer cost.
  4. Period expenses – Research and Development expenses (R&D), Selling, General and Administration expenses (SG&A) and other expenses.
  5. Operating Income = Gross Margin – Period Expenses
  6. Income Before Tax = Operating Income – Interest and Other
  7. Net Income = Income Before Tax – Tax (tax is based on Income Before Tax)

We can then go through this line by line to look at country by country differences. These line numbers will be referenced below in bold/italics.
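
To make the structure concrete, here is a toy income statement that mirrors lines 1 through 7 above; every figure is invented, purely to show where COGS, low-cost loans, and tax rates enter the calculation:

```python
# Toy income statement mirroring lines 1-7 above; every figure is invented,
# purely to show where COGS, low-cost loans, and tax rates enter the calculation.
revenue = 1000.0
cogs = 550.0                                        # line 2: what the wafer cost models estimate
gross_margin = revenue - cogs                       # line 3
period_expenses = 200.0                             # line 4: R&D + SG&A + other
operating_income = gross_margin - period_expenses   # line 5
interest_and_other = 10.0                           # line 6: where a low-cost loan would show up
income_before_tax = operating_income - interest_and_other
tax_rate = 0.20                                     # line 7: where a preferential tax rate would show up
net_income = income_before_tax * (1 - tax_rate)

print(f"gross margin: {gross_margin:.0f}, operating income: {operating_income:.0f}, "
      f"net income: {net_income:.0f}")
```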

For a cost evaluation, line 1 is irrelevant.

Line 2 (COGS) is a key differentiator.

Cost of Goods Sold

In our Models we break out wafer cost categories as follows:

  • Starting Wafer
  • Direct labor
  • Depreciation
  • Equipment Maintenance
  • Indirect Labor
  • Facilities
  • Consumables

Starting wafers – our belief is that starting wafers are globally sourced and the country where they are purchased does not impact the price. This has been confirmed in multiple expert interviews including by wafer suppliers.

Direct Labor (DL) – all our Models have DL rates by country and year for 24 countries. In 2021 the difference in labor rate from the least expensive to the most expensive country was 21x! For each wafer size and product type we have estimates of labor hours required and we calculate the direct labor cost. We believe this calculation accurately reflects cost differences between countries in all our Models. It should be noted here that leading edge 300mm wafer fabs are so highly automated that there are very few labor hours in the process and, even with a huge labor rate difference, the percentage impact on wafer cost is small.

Depreciation – this is the most complex category. The capital cost to build a wafer fab is depreciated over time, with the depreciation amount charged off to the P&L.

We break out the capital cost to build a facility into:

  1. Equipment – we believe equipment is globally sourced and the cost is basically the same in any country. We did get one input that US costs are slightly higher due to import costs, but we don’t believe this is significant.
  2. Equipment Installation – install costs in our Models are based on equipment type, with different costs assigned to inspection and metrology equipment, lithography equipment, and other equipment types (ALD, CVD, PVD, etc.). What we have found in our interviews is that the costs vary by country, with the variation being different for the different categories. For example, inspection and metrology equipment installation is heavily weighted toward electrical work that varies in cost between countries. Other equipment is more heavily weighted toward process hookups that are less country dependent. Lithography equipment is intermediate between the two.
  3. Automation – we believe automation is globally sourced and does not change in cost between countries although we are still checking on this assumption.
  4. Building – in the past we assumed that building costs were the same by country, believing the major components were globally sourced. In our expert interviews we found there is a significant difference in cost per country. Revisiting the fab construction costs in our databases also revealed differences after accounting for product types. Our latest Strategic Cost and Price Model fully accounts for these differences.
  5. Building Systems – as with the building, we assumed building systems were globally sourced and the cost didn’t vary by country, but this is only partially true. Our latest Strategic Cost and Price Model fully accounts for these differences.
  6. Capital Incentives – if a company receives government grants to help pay for the investments to build a wafer fab, they will impact the actual capital outlay for the company building the fab. In the past we have not accounted for this; we now allow capital incentives to be entered into the model.

Our models all calculate the capital investment by fab using a detailed bottoms-up calculation. The equipment, equipment installs, and automation are then depreciated over five years, the building systems over ten years, and the building over fifteen years. We use these default values because most companies use these lifetimes for reporting purposes. There are country-to-country differences in lifetimes for tax purposes, but taxes and reporting values are typically calculated separately. There are some companies that don’t use five years for equipment, but to enable consistent comparison between fabs we use five years as a default, although the ability to change the lifetimes is built into many of our Models.
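
As a minimal sketch of that straight-line treatment (with made-up capital figures, not data from the Models), the annual depreciation charge is simply:

```python
# Minimal sketch of the straight-line depreciation split described above,
# using made-up capital figures (USD millions), not data from the Models.
capital = {
    "equipment": 8000.0,          # 5-year lifetime
    "equipment_installs": 800.0,  # 5-year lifetime
    "automation": 400.0,          # 5-year lifetime
    "building_systems": 1200.0,   # 10-year lifetime
    "building": 1000.0,           # 15-year lifetime
}
lifetimes = {"equipment": 5, "equipment_installs": 5, "automation": 5,
             "building_systems": 10, "building": 15}

# A capital grant (item 6) would simply reduce the relevant capital entries first.
annual_depreciation = sum(cost / lifetimes[item] for item, cost in capital.items())
print(f"annual depreciation charge: ${annual_depreciation:,.0f}M")
```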

Equipment Maintenance – equipment maintenance costs include consumable items, repair parts and service contracts. The technicians and engineers that maintain equipment at a company are accounted for in the Indirect Labor Cost described below.

In our Strategic Cost and Price Model the country differences are accounted for as follows:

  1. Consumables – we continue to believe this is the same by country but there are company to company differences. For example, an etcher has quartz rings in the etch chamber that some companies source from the Original Equipment Manufacturers and other companies may source in the secondary market at lower cost.
  2. Repair Parts – repair parts are distinct from consumables in that they aren’t expected to normally wear out during operation. We believe these are globally sourced and don’t vary in cost by country.
  3. Service Contracts – we believe there is some difference in service contract costs due to labor rate differences.

Our latest Strategic Cost and Price Model fully accounts for these differences.

Indirect Labor (IDL) – IDL is made up of engineers, technicians, supervisors and managers; our Models have engineer salaries by country for twenty-four countries by year, and ratios are used to calculate the technician, supervisor, and manager salaries. Engineer salaries vary by 12x between the lowest-cost and highest-cost countries. For each process/fab being modeled we look at the IDL hours required for the process and break out the IDL hours between the four IDL categories. We believe all our Models correctly reflect country-to-country differences currently. As with DL costs, IDL costs have less impact on wafer cost than you might expect, but are more significant than DL costs.

Facilities – we break out facilities into Ultrapure Water, Water and Sewer, Electric, Natural Gas, Communications, Building Systems Maintenance, Facility Occupancy, and Insurance. The main costs are Electric, Natural Gas, Building Systems Maintenance, and Insurance. Our Models all account for Electric and natural gas rates by country for twenty-four countries. Electrical rates vary by 2.8x by country and natural gas by 7.6x by country and both are fully accounted for in the models. Facility system maintenance and facility occupancy also vary by country. Our latest Strategic Cost and Price Model fully accounts for these differences.

Consumables – all our Models calculate consumables in varying degrees of detail. We believe materials are sourced globally and do not vary in price by country. There are some country-to-country tariff differences, but the implementation of this is so complex and constantly changing that we do not model it. We do not believe the impact is significant.

Profit and Loss – Continued

Line 3 – Gross Margin

Gross Margin isn’t part of a COGS discussion but many of our customers buy wafers from foundries. Foundry wafer prices are Wafer Cost + Foundry Margin and we have put significant effort into providing Foundry Margin guidance in our models. Foundry Margins in our Models vary company to company and within a company by year and quarter, purchasing volume and process node. They are not country dependent.

Line 4 – Period Expenses

Not relevant to a wafer cost discussion

Line 5 – Operating Income

Not relevant to a wafer cost discussion

There are two other places in the P&L where we may see country-to-country impact.

Line 6 – Income Before Taxes

If a government offers a company a low-cost loan, this would reduce interest expenses in the interest line. In my opinion, low-cost loans are incentives.

Line 7 – Net Income

Tax – there are two pieces to the tax line: one is country-to-country tax rate differences and the other is preferential tax rates. In my opinion, tax rate differences are a structural difference, whereas a preferential tax rate is an incentive. For example, the corporate tax rate in the US is 25.8% and in Taiwan is 20%. These tax rates are normally applied to Income Before Taxes.

In summary, we see country-to-country operating cost differences, and the current release of our Strategic Cost and Price Model models these differences accurately and in detail.

There are also country-to-country tax rate differences that we don’t model because they are below the COGS line.

Finally, there are incentives; we see these as having three parts:

  1. Capital grants that would reduce capital cost and therefore depreciation in COGS.
  2. Low-rate loans that would impact interest expenses.
  3. Tax incentives – investment, R&D, and other tax reductions.

TSMC Arizona Fab

Having reviewed all the elements of wafer cost difference we can now investigate how TSMC’s cost in Arizona will match up to their cost in Taiwan.

TSMC currently produces 5nm wafers in Fab 18 – phases 1, 2, and 3 in Taiwan. We believe each phase is currently running 40,000 wafers per month (wpm), with plans to ramp to 80,000 wpm per phase over the next two years. In contrast, the Arizona fab is planned to produce 20,000 wpm (at least initially). This will lead to three differences in costs:

  1. Country-to-country operating cost difference – after accounting for all the operating cost differences, we now find a 7% increase in cost to operate in Arizona versus Taiwan. We find a higher difference than we did previously because we now include some factors we had previously missed. Having reviewed a P&L line by line and consulted with a wide range of experts, we do not believe there are any missing parts to this analysis. An interesting note here is that direct labor costs in the US are over 3x the rate in Taiwan, but they have only minimal impact because in Taiwan direct labor is only 0.1% of the wafer cost – even tripling or quadrupling the labor rate, it is still less than 1% of the wafer cost. Utility costs, on the other hand, are lower in the US.
  2. Fab size differences – accounting for a 20,000 wpm fab in the US versus 80,000 wpm in Taiwan, plus the efficiency of clustering multiple fabs together in Taiwan, adds 10% to the country-to-country difference found in item 1, for a total 17% difference. We want to highlight that the 10% additional cost is due to TSMC’s decision to build a small fab in the US. We expect the initial Arizona cleanroom to have room to ramp up to more than 20,000 wpm and the site to have room for additional cleanrooms. Over time, if TSMC ramps up and expands the site, the 10% difference can be reduced or eliminated.
  3. Incentives – to the best of my knowledge Taiwan does not offer direct capital grants or low-cost loans. In the past Taiwan offered tax rebates for capital investment in fabs, but my understanding is this program has ended. There are R&D tax rebates available, and Taiwan has a lower corporate tax rate than the US (although this isn’t an “incentive” in my view). To investigate the tax advantage for TSMC in Taiwan versus the US, I have compared TSMC’s effective tax rate over the last three years to Intel’s effective tax rate over the same period. Surprisingly, they aren’t that different. I know there is a lot of complex financial engineering in taxes, but it is the best comparison I can find. TSMC’s tax rate for 2018, 2019 and 2020 was 11.7%, 11.4% and 11.4%, respectively. Over the same period Intel’s tax rate was 9.7% (one-time benefits) in 2018, 12.5% in 2019, and 16.7% (NAND sale) in 2020. So over three years TSMC paid 11.5% and Intel paid 13.1% as a tax rate, which isn’t that different.

Conclusion

The bottom line to all this is that the cost for TSMC to make wafers in the US is only 7% higher than in Taiwan if they built the same size fab complex in the US as what they have in Taiwan. Because they are building a smaller fab complex, the cost will be 17% higher, but that is due to TSMC’s decision to build a smaller fab, at least initially.

I do want to point out that this doesn’t mean the US couldn’t be at a bigger cost disadvantage versus other countries. India has reportedly discussed providing 50% of the cost of a fab as part of an attempt to get Taiwanese companies to set up a fab in India. At least in the past, the national and regional governments in China have offered large incentives. Israel has also provided significant incentives to Intel in the past. But under current conditions a US fab is only 7% more expensive than a fab in Taiwan if all factors other than the location are the same.

Also Read:

Intel Accelerated

VLSI Technology Symposium – Imec Alternate 3D NAND Word Line Materials

VLSI Technology Symposium – Imec Forksheet


ASML is the key to Intel’s Resurrection Just like ASML helped TSMC beat Intel
by Robert Maire on 09-09-2021 at 6:00 am

-Intel’s access to high-NA EUV tools may be their elixir of life
-TSMC’s EUV adoption helped it vault faltering Intel & Samsung
-Maybe ASML should invest in Intel like Intel invested in ASML
-Shoe is on the other foot- But cooperation helps chip industry

Intel is dependent upon ASML for its entire future
If Intel has any hope of recapturing the lead in the Moore’s Law race from TSMC then it desperately needs ASML’s help. Right now TSMC is miles ahead of Intel in EUV tool count and experience which is the key to advanced technology nodes. If both TSMC and Intel buy tools and technology at an equal rate, TSMC will stay ahead. The only other way for Intel to catch TSMC is for TSMC to fall on its own sword, much as Intel did, but we don’t see that happening any time soon.

Introduction of high-NA EUV is the next inflection point for Intel
Just as EUV was an inflection point that vaulted TSMC.

Back when ASML was struggling with EUV, making slow progress on a questionable technology, they were looking for an early adopter to take the plunge and convince the industry that EUV was real.

At the time, Samsung, TSMC and Intel were not signed up to EUV and viewed it very suspiciously. Nobody was willing to be the first to commit to it.

TSMC had famously said they would never do EUV
Then Apple changed all that by telling TSMC it needed to do EUV, for better chip performance, and Apple would write a check for it.

ASML got into a room with TSMC management and cut a deal, and TSMC went from an EUV non-believer to a full-on convert virtually overnight. TSMC went from “never EUV” to its biggest customer and user (financed by Apple).
The rest is history.

TSMC’s earlier adoption of EUV helped it pull ahead of both Intel and Samsung over the past few years, aided by Intel’s production problems.

It’s likely that TSMC may have pulled ahead without EUV, but EUV really allowed TSMC to accelerate away from Intel and Samsung and create the huge Moore’s Law lead that exists today.

There is another similar inflection point coming up in the industry today: high-NA EUV, basically the second generation of EUV technology. Similar to the first round of EUV, there is hesitation in the industry as chip makers are unsure of the need for high-NA, its advantages, or even whether it will arrive in time to make a difference.

ASML needs another early adopter to push the industry along.
Indeed, the IMEC roadmap, which most in the industry seem to be following, does not call out the need for high-NA EUV.

Obviously there was some behind-the-scenes discussion between ASML & Intel, as Intel came out with full-throated support of high-NA EUV technology.
If ASML anoints Intel as the high-NA EUV champion in exchange for its commitment and Intel gets preferential access to tools over TSMC as its reward, that could be the difference to get Intel back in the Moore’s Law game ahead of TSMC.

Not a slam dunk
There is of course a lot of risk but then again Intel has to take the risk as it has little choice. Will high-NA work? Will it be demonstrably better than current EUV? Will it get here in time? Will it be enough of an advantage over TSMC?

If the answer to enough of these questions is yes then Intel could win big, if not Intel could remain in a trailing position and never catch TSMC.

Intel of course has to do a lot of other things right, such as new transistor design and vertically stacked transistors but little of that will matter if they can’t get back in the Moore’s Law game with leading edge litho.

Maybe Intel should go from “Investor” to “Investee”
Back in 2012 ASML was struggling with EUV and needed some financial help to complete the technology and show support from customers. Customers were also pushing hard for 450mm wafer tools and demanding DUV tools, so ASML had its hands full, much like Intel today. It needed help in the form of money.

Intel, Samsung & TSMC each invested substantial sums in ASML. Intel invested and owned 15% of ASML, TSMC 5% and Samsung 3%.

All three companies made a killing in ASML stock as they sold after ASML’s stock ran up on EUV. Intel made enough to buy all the EUV tools it needed. Intel’s profits on its ASML investment helped prop up its weak performance.

It was a great deal for ASML and Intel, TSMC & Samsung, a true win/win which helped the industry adoption of EUV.

It would seem that now the shoe is on the other foot. ASML is on fire and Intel is in need of help. ASML has a 50% higher market cap than Intel.

Intel has a lot to do, a lot to prove and a lot of money to spend to recapture the lead in semiconductors. In short, Intel needs help.

Maybe ASML should invest in Intel much as Intel invested in ASML when the chips were down.

If ASML were to invest a similar amount in Intel, it would be enough cash to pay for both planned foundries in Arizona and then some. With enough left over for Intel to buy some expensive high-NA tools.

If it worked, as Intel’s investment in ASML did, ASML might make a killing in Intel’s stock as they regain their Mojo. Not to mention that ASML would get a great customer for high-NA.

This would certainly be better than a US government bailout of Intel’s self-inflicted problems, since the funding would come from investors rather than tapping taxpayers. Intel would certainly rather take the “free money” from the government.

It’s a nice dream, but we doubt that Samsung & TSMC would be happy with ASML investing in Intel.

The better, and somewhat logical solution, would be for Apple to write a check to Intel to be the sponsor for Intel’s high-NA EUV plans and Foundry projects in Arizona in return for first and guaranteed capacity to fab Apple’s chips at those fabs.

It would be great for Apple to have a second source that is US based rather than their total current reliance on TSMC in Taiwan (a short boat ride from the Chinese motherland). It would guarantee supply and keep pricing honest.

Apple certainly has the cash to support Intel as well as the need for another foundry source for leading edge as Samsung is clearly a “Frenemy” and not a great second source to TSMC.

Semiconductors remain very dynamic, global & highly interconnected
The linkage between chip makers and tool makers is much more than a customer supplier relationship. The semiconductor industry is a highly complex and dynamic industry of relationships that is ever changing with Intel going from the leader and “inventor” of Moore’s Law to struggling and ASML going from a distant third against Nikon and Canon to a monopolistic technology leader & powerhouse.

The fact is that relationships in the industry are the key to survival and success, and navigating those relationships is key. The relationships are complex and multifaceted between chip makers & customers and tool makers, but the reality is that no one can do it on their own and everyone is interdependent for the industry’s success…

The Stocks
We still maintain that Intel has a very long road in front of it with no assurance of success and many challenges. We maintain that Intel will be a “work in progress” for a relatively long time, well beyond most everyone’s investment horizon.

ASML is in an enviable position given its technology dominance and demand for its product. This positive environment will not change any time soon. ASML’s stock is priced for perfection, but then again it’s in a perfect position, so it’s hard to argue.

The semiconductor “shortage” is clearly longer lasting than expected as paranoia in the industry runs deep and everyone continues to double and triple order and stock up on inventory in an industry used to kanban and just-in-time delivery.

The stocks have clearly slowed over the last few months as investors are rightfully wary of the end of the current “super duper cycle”. It remains difficult to put new money to work at current valuations.

Also Read:

KLA – Chip process control outgrowing fabrication tools as capacity needs grow

LAM – Surfing the Spending Tsunami in Semiconductors – Trailing Edge Terrific

ASML- A Semiconductor Market Leader-Strong Demand Across all Products/Markets


The Arm China Debacle and TSMC
by Daniel Nenni on 09-03-2021 at 6:00 am

Having spent 40 years in the semiconductor industry, many years working with Arm and even publishing the definitive history book “Mobile Unleashed: The Origin and Evolution of ARM Processors in Our Devices” plus having spent more than 20 years working with China based companies, I found the recent Arm China media circus quite entertaining.

While I have zero firsthand information on this situation I do have numerous contacts and have had discussions on the topic. I also have many years of experience with Arm management, enough to know that the Arm China situation as described in the media is complete nonsense.

Rather than rehash the whole fiasco, here are links to one of the inflammatory articles and a retraction, which is quite rare for today’s media. After publishing false information most sites just move on to the next topic leaving the fake news up in spite of the collateral damage. I would guess that Arm made some calls on this one, absolutely.

ARM China Seizes IP, Relaunches as an ‘Independent’ Company [Updated]

ARM Refutes Accusations of IP Theft by Its ARM China Subsidiary

This Arm China false narrative started as most do, with a misread publication and a provocative title with the sole purpose of feeding clicks to the advertising monster within. He didn’t even get the name of the original publication’s author right, and that still has not been corrected:

“As Devin Patel reports...” It’s Dylan Patel; he is a SemiWiki member, and he said nothing about “ARM China Seizing IP”.  And by the way, it’s Arm, not ARM. That name was changed some time ago.

The author of the unfortunate article is a prime example of the problem at hand. While not the worst by any means, he has zero semiconductor education or experience. He does not know the technology, the companies, or the people, yet flocks of sheep come to his site for the latest semiconductor news. Pretty much the same as getting accurate political information from Facebook.

One of the reasons we started SemiWiki ten plus years ago was that semiconductors did not get their fair share of media attention. TSMC was a prime example. Even though they were the catalyst for the fabless semiconductor revolution that we all know and love, very few people knew their name or what they accomplished.

Now the pendulum has completely swung in the other direction with false TSMC narratives running amok. This one is my favorite thus far:

Intel locks down all remaining TSMC 3nm production capacity, boxing out AMD and Apple

And yes that one reverberated throughout the faux semiconductor media even though it was laughably false.

Here are a couple more recent ones that went hand-in-hand:

Taiwan’s TSMC asking suppliers to reduce prices by 15%

TSMC to hike chip prices ‘by as much as 20%’

Imagine the financial windfall here…

The upside I guess is that TSM stock is at record levels as it should be.  There is an old saying, “There is no such thing as bad publicity” (which was mostly associated with circus owner and self-promoter extraordinaire Phineas T. Barnum). The exception of course being your own obituary as noted by famed Irish writer Brendan Behan.

With today’s cancel culture, bad press can be your own obituary which is something to carefully consider before publishing anything, anywhere, at any time. Of course, there is that insatiable click monster that needs to be fed so maybe not.


TSMC Wafer Wars! Intel versus Apple!
by Daniel Nenni on 08-18-2021 at 10:00 am

The big fake news last week came from a report out of China stating that TSMC won a big Intel order for 3nm wafers. We have been talking about this for some time on SemiWiki so this is nothing new. Unfortunately, the article mentioned wafer and delivery date estimates that are unconfirmed and, from what I know, completely out of line. From there the media created a frenzy pitting Intel against Apple and AMD in a war of wafers as a desperate attempt to get cheap clicks:

Intel locks down all remaining TSMC 3nm production capacity, boxing out AMD and Apple by John Loeffler Tech Radar

Intel Grabs Majority of TSMC’s 3nm Capacity by Hassan Mujtaba WCCftech

Intel Has Reportedly Cornered TSMC 3nm Chip Capacity by Paul Lilly HotHardWare

Apple secures majority of TSMC’s 3nm production capacity over Intel by Sean Gizmo China

And now we have wannabe influencers on Seeking Alpha and LinkedIn repeating this false narrative ad nauseam.

First let’s look at the TSMC/Apple backstory. Apple came to TSMC from Samsung at 20nm for the iPhone 6, which was the best phone of its time in my opinion. Apple first partnered with Samsung when founding the iPhone franchise but switched to TSMC after Samsung came out with their own line of smartphones that competed with their #1 foundry customer. A giant IP theft lawsuit followed, which cemented Apple’s relationship with TSMC because, as we all know, “TSMC is the trusted foundry and does not compete with customers”. As the story goes, Apple first approached Intel to make their SoCs but was rebuffed, a decision that Intel greatly regrets.

The TSMC/Apple relationship disrupted the semiconductor manufacturing business by introducing what I call process technology half steps. Instead of following Moore’s Law with a new process every two to three years, TSMC released a new process version every year, timed with the Apple iPhone fall launch. In order to do that, TSMC and Apple closely collaborate on a process technology optimized for the Apple SoCs, which is frozen at the end of each year for high-volume production in the second half.

The first half step was 20nm to 16nm. TSMC 20nm first introduced double patterning which was no small feat for chip designers. Next TSMC added FinFETs (another design challenge) to 20nm creating 16nm. TSMC uses the same fabs for the half steps which saves time and resources and promotes advanced yield learning for smoother process ramping. TSMC 16nm was further optimized for a 12nm version.

TSMC 10nm (N10) was the next process node which was followed by the N7 half step. Partial EUV was added to N7 (N7+) as another half step. N7+ was further optimized for N6.

TSMC N5 followed with more EUV and was further optimized for N4 which is what is in the iPhone products that will be launched next month (Apple’s version of N4).

TSMC N3 was officially launched at the TSMC Technology Symposium 2021 with even more EUV, and will be in volume production starting in 2H2022 (Apple). As compared to N5, N3 will provide:

  • +10-15% performance (iso-power)
  • -25-30% power (iso-performance)
  • +70% logic density
  • +20% SRAM density
  • +10% analog density

In the 10 years that TSMC and Apple have been working together, an iPhone launch has never been missed and Apple has always been first to the new process technology. This collaborative half-step process methodology is the reason TSMC and Apple have executed flawlessly. As a result, Apple is TSMC’s #1 customer and closest partner and I do not see that changing anytime soon, if ever, absolutely.

I first heard word of Intel having the TSMC N3 PDK in the first part of 2020 which was a bit of a surprise. Intel is a long time TSMC customer due to acquisitions but not for native Intel products. I confirmed it with multiple sources inside the ecosystem and started writing about it shortly thereafter.

What I was told later is that Bob Swan signed the N3 deal with TSMC due to the delays in Intel 10nm and 7nm to motivate Intel manufacturing to get those processes out as planned. TSMC then increased CAPEX to build the additional N3 capacity required to satisfy the Intel wafer agreement.

To be clear, wafer agreements are signed 2-3 years before the chip makes it into HVM and TSMC can build fabs faster than that, so there will be no N3 shortages for anyone who signed a wafer agreement (Apple, AMD, NVIDIA, QCOM, etc.). If they need more chips than what they signed up for, which happens, there may be shortages. This is how TSMC and the foundry business works. It’s all about the wafer agreements.

As an interesting side note, Pat Gelsinger and his new IDM 2.0 push has made the Intel/TSMC relationship all the more interesting. Pat insists that Intel will manufacture the majority of their products internally. I understand that 50.001% is technically a majority but that still seems low given the Intel TSMC N3 wafer agreement, the process delays Intel is currently experiencing, and the competitive pressures of AMD.

We covered the Intel Accelerated event last month and will be covering the Intel Architecture Day as well. Hopefully Intel’s new process and product initiatives are successful because competition is what keeps semiconductor technology moving forward and cost effective.


TSMC Explains the Fourth Era of Semiconductor – It’s All About Collaboration
by Mike Gianfagna on 08-13-2021 at 6:00 am

The 32nd VLSI Design/CAD Symposium just occurred in a virtual setting. The theme of the event this year was “ICs Powering Smart Life Innovation”. There were many excellent presentations across analog & RF, EDA & testing, digital & system, and emerging technology. There were also some excellent keynotes, and this is where I’d like to focus. TSMC’s Suk Lee presented a keynote entitled, “Moore’s Law and the Fourth Era of Semi”.  Anything that attempts to make sense out of the storied and turbulent history of the semiconductor industry catches my attention. As explained by TSMC, the fourth era of semiconductor is all about collaboration. Let’s take a closer look.

The keynote was presented by Suk Lee, vice president, Design Technology Platform at TSMC. I’ve known Suk for a long time, so this presentation was a must-see for me.  I sold to Suk when I was at Zycad and he was at LSI Logic. That was challenging as Suk is not easily impressed. I then worked with Suk at Cadence where we achieved some great results. His high bar for technical excellence was alive and well and it helped us. Since Cadence, I’ve had a few gigs in companies that were part of the Design Technology Platform Suk oversees at TSMC. Again, meeting his high bar for technical excellence made us all better. Let’s look at the history of the semiconductor industry, according to TSMC and Suk Lee.

The First Era of Semiconductor – IDM

To begin with, the transistor was invented at Bell Labs, followed by the first integrated circuit at Texas Instruments. Things got really interesting when the first monolithic integrated circuit was developed at Fairchild. The photo below, courtesy of the Computer History Museum, shows some of the early pioneers involved in this work. You will recognize their names. Note everyone is wearing a jacket and tie. This might be something to think about as you plan your return to the office.

Fairchild pioneers

And so, the first era of semiconductor was born – the Era of the integrated device manufacturer, or IDM. These were monolithic companies that did it all – chip design, design infrastructure, chip implementation and process technology. I started my career in the design infrastructure area at an IDM called RCA. Suk points out that integration and invention went hand-in-hand at IDMs. The opportunity to create something completely new was quite exciting. I know that from first-hand experience. Custom chips were the domain of IDMs. They had all the infrastructure, technology and staff to get it done. And so custom chips were limited to IDMs or companies with enough money to fund the massive development at an IDM. That all changed when we got to the second era of semiconductor.

The Second Era of Semiconductor – ASIC

Companies like LSI Logic and VLSI Technology were the pioneers for this phase. Now, design infrastructure, chip implementation and process technology were provided to all by the ASIC vendor. The semiconductor industry began to disaggregate during this time. Armed with design constraints, a much broader community of engineers could design and build a custom chip. The technology became democratized, and the world was never the same.

The Third Era of Semiconductor – Foundry

The third era is essentially a maturation of the second era. All of the steps in IC design and fabrication are quite challenging. Assembling an ecosystem where each company focuses on their core competence is a great way to manage complexity. This is what happened in the third era. Chip design and implementation were addressed by fabless semiconductor companies, design infrastructure was delivered by EDA companies and process technology was developed and delivered by foundries. TSMC was a key pioneer for this phase.

The Fourth Era of Semiconductor – Open Innovation Platform

Watch carefully, we’re about to come full circle. As the semiconductor industry continued to mature, process complexity and design complexity began to explode. Esoteric and subtle interactions between process technology, EDA, IP and design methodology became quite challenging to coordinate with a disaggregated supply chain. TSMC was the pioneer for this era as well.

The company realized that a substantial amount of coordination and communication was needed between the various parts of the disaggregated ecosystem. A way to bring the various pieces closer together to foster better collaboration was needed. And so, TSMC developed the Open Innovation Platform®, or OIP. They began this work early, when 65 nm was cutting edge. Today, OIP is a robust and vibrant ecosystem.

The infrastructure provided by TSMC paved the way for improved collaboration and coordination, creating a virtual IDM among its members. This provides TSMC’s customers the best of both the monolithic and disaggregated models. It changed the trajectory of the semiconductor industry and provided TSMC with a substantial competitive edge.

There are many benefits of the model. The ability to perform design technology co-optimization (DTCO) is one that is quite useful. The figure below illustrates the breadth of TSMC’s OIP. Advanced semiconductor technology requires a village, a big village.  To help decode some of the acronyms, DCA stands for design center alliance and VCA stands for value chain aggregator.

TSMC OIP®

We’ve now reached the end of the semiconductor history lesson, for now. Getting to this point has been quite challenging and exciting. Suk Lee did a great job explaining the history. TSMC made it happen and we’re better as a result. I look forward to the next phase of semiconductor growth and where it may take us. For now, remember that the fourth era of semiconductor is all about collaboration.


TSMC Design Considerations for Gate-All-Around (GAA) Technology

TSMC Design Considerations for Gate-All-Around (GAA) Technology
by Tom Dillinger on 07-12-2021 at 6:00 am

mobility differences 3

The annual VLSI Symposium provides unique insights into R&D innovations in both circuits and technology.  Indeed, the papers presented are divided into two main tracks – Circuits and Technology.  In addition, the symposium offers workshops, forums, and short courses, providing a breadth of additional information.

At this year’s symposium, a compelling short course was:  “Advanced Process and Device Technology Toward 2nm-CMOS and Emerging Memory”.  A previous SemiWiki article from Scotten Jones gave an excellent summary of the highlights of (part of) this extensive short course. (link)

Due to space limitations, Scotten wasn’t able to delve too deeply into the upcoming introduction of Gate-All-Around (GAA) technology.  This article provides a bit more info, focusing on material presented in the short course by Jin Cai from TSMC’s R&D group, entitled:  “CMOS Device Technology for the Next Decade”.

FinFET to GAA Transition

Successive generations of FinFET process technology development have resulted in tighter fin pitch and taller fins, with increasingly vertical fin sidewall profiles.  Significant improvements in drive current per unit area have been realized.  The electrostatic control of the gate input over the three surfaces of the vertical fin has also reduced subthreshold leakage currents.

Yet, Jin highlighted that, “Free carrier mobility in the vertical fin is adversely impacted for very small fin thickness.  TSMC has introduced SiGe (for pFETs) at the N5 node, to improve mobility.  Strain engineering continues to be a crucial aspect of FinFET fabrication, as well.”  (nFET:  tensile strain; pFET:  compressive strain)

The figure below illustrates the trends in short-channel effect and carrier mobility versus fin width.

Jin continued, “An optimal process target is ~40-50nm fin height, ~6nm fin thickness, and ~15nm gate length, or 2.5X the fin thickness.”

The next step in device scaling is the horizontal gate-all-around, or “nanosheet” (NS), configuration.  A superlattice of alternating Si and SiGe layers is fabricated on the wafer substrate.  A unique set of etch/dep steps is used to remove the SiGe material at the NS layer edges and deposit a spacer oxide in the recessed area, leaving the Si layer sidewalls exposed.  Source/drain epitaxy is then grown out from the Si sidewalls, providing both the S/D doping and structural support for the Si nanosheets.  The SiGe layers in the nanosheet stack are then selectively removed, exposing the Si channels.  Subsequent atomic layer deposition (ALD) steps introduce the gate oxide stack, potentially with multiple workfunctions for device Vt offerings.  Another ALD step provides the gate material, fully encapsulating the nanosheet stack.

Jin focused on the carrier mobility characteristics of the nanosheet-based GAA device, as representative of performance.  (More on GAA parasitic capacitance and resistance shortly.)  The figure below provides an illustration of the crystalline orientation for GAA devices, to optimize the lateral mobility in the horizontal nanosheet layer channels.

Jin highlighted a key issue facing the development of NS process technology – the (unoptimized) hole mobility is significantly less than the nFET electron mobility, as illustrated below.

Digression:  Carrier Mobility and Circuit Beta Ratio

When CMOS technology was first introduced, there was a considerable disparity between nFET electron and pFET hole mobility in strong inversion.  A general circuit design target is to provide “balanced” rise and fall delay (RDLY and FDLY) and signal transition values, especially critical for any circuit in a clock distribution network.  As a result, logic circuits adopted device sizing guidelines, where Wp/Wn was inversely proportional to the hole-to-electron mobility ratio – i.e., Wp/Wn ~ mu_electron/mu_hole.  For example, a device sizing “beta ratio” of ~2.5 was commonly used.

(Wp and Wn are “effective” design values – for logic circuit branches with multiple series devices, to maintain the same effective drive strength, wider devices are required.)
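
As a rough illustration of this sizing guideline, here is a minimal Python sketch; the mobility ratio and the NAND2 example below are illustrative assumptions, not foundry data.

```python
# Illustrative sketch of the beta-ratio sizing guideline; the mobility
# ratio and cell examples are assumptions, not foundry data.

def beta_ratio(mu_electron, mu_hole):
    """Wp/Wn target for balanced rise/fall delays: Wp/Wn ~ mu_e/mu_h."""
    return mu_electron / mu_hole

def size_inverter(wn, beta):
    """Return (Wn, Wp) for an inverter at the given beta ratio."""
    return wn, wn * beta

def size_nand2(wn_unit, beta):
    """NAND2: two series nFETs, so each nFET is doubled to keep the same
    effective pulldown strength; the parallel pFETs keep the inverter Wp."""
    return 2 * wn_unit, wn_unit * beta

beta = beta_ratio(mu_electron=2.5, mu_hole=1.0)   # legacy ~2.5:1 example
print("Inverter (Wn, Wp):", size_inverter(1.0, beta))   # (1.0, 2.5)
print("NAND2    (Wn, Wp):", size_nand2(1.0, beta))      # (2.0, 2.5)
```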

With process technology scaling employing thinner channels below the oxide surface, and with extensive channel strain engineering, the ratio between electron and hole mobility was reduced, approaching unity.  Indeed, as illustrated below, the introduction of FinFET devices with quantized width values depended upon the reduction in carrier mobility difference.  (Imagine trying to design logic circuits with a non-integral beta ratio in the 2+2 fin standard cell image shown below.)

Nanosheet Circuit Design

The figure above depicts a standard cell library image, for both current FinFET and upcoming nanosheet technologies.  Unlike the quantized width of each fin (Wfin ~ 2*Hfin + Tfin), the nanosheet device width is a continuous design parameter, and (fortuitously) can more readily accommodate a unique beta ratio.

Note that there will be limits on the maximum nanosheet device width.  The process steps for selectively removing the interleaved SiGe superlattice layers and the deposition of the oxide and gate materials need to result in highly uniform surfaces and dimensions, which will be more difficult for wider nanosheet stacks.

Speaking of nanosheet stacks, it should also be noted that the layout device width is multiplied by the number of nanosheet layers.  Jin presented the results of an insightful analysis evaluating a potential range of layers, as shown below.

A larger number of layers increases the drive current, but the (distributed) contact resistance through the S/D regions to the lower layers offsets some of this gain.  The majority of the published research on nanosheet technology has focused on ~3-4 layers, for optimal efficiency.
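
To make the width arithmetic concrete, the sketch below contrasts the quantized FinFET effective width (Wfin ~ 2*Hfin + Tfin per fin, using the ~45nm/~6nm ballpark quoted earlier) with a stacked-nanosheet effective width treated as N*2*(W + H); the nanosheet width, thickness, and layer counts are illustrative assumptions only.

```python
# Quantized FinFET width vs. continuous nanosheet width.
# Fin dimensions follow the ~45nm height / ~6nm thickness ballpark quoted
# earlier; the nanosheet width and thickness values are illustrative only.

def finfet_weff(num_fins, h_fin=45.0, t_fin=6.0):
    """Per-device effective width: Wfin ~ 2*Hfin + Tfin, times the fin count."""
    return num_fins * (2 * h_fin + t_fin)

def nanosheet_weff(sheet_width, num_layers, t_sheet=6.0):
    """Stacked-nanosheet effective width: N * 2*(W + H)."""
    return num_layers * 2 * (sheet_width + t_sheet)

for fins in (1, 2, 3):
    print(f"{fins}-fin device: Weff = {finfet_weff(fins):6.1f} nm")

# The sheet width is a continuous knob, so Weff can land anywhere between
# the fin-quantized steps; stacking more layers multiplies the result.
for layers in (2, 3, 4):
    print(f"25 nm sheets, {layers} layers: Weff = {nanosheet_weff(25.0, layers):6.1f} nm")
```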

Parenthetically, there has also been published research investigating nanosheet fabrication process techniques that would locally remove one (or more) nanosheet layers for a specific set of devices, before ALD of the surrounding oxide and gate.  In other words, some devices could incorporate fewer than 3 layers.  Consider the circuit applications where a weak device strength is optimum, such as a leakage node “keeper” or a pullup device in a 6-transistor SRAM bitcell.  Yet, the resulting uneven surface topography adds to the process complexity – the upcoming introduction of GAA technology may not offer a variable number of nanosheet layers.  The same surface topography issue would apply to a GAA process that attempted to build nFETs from superlattice Si layers and pFETs from superlattice SiGe layers, assuming the ability to selectively etch Si from SiGe for pFETs.

The net for designers is that GAA technology will offer (some) variability in device sizing, compared to the quantized nature of FinFETs.  Leakage currents will be reduced, due to the GAA electrostatics surrounding the nanosheet channel (more on that shortly).

Analog circuits may be more readily optimized, rather than strictly relying upon a ratio of the number of fins.  SRAM cell designs are no longer limited to the PD:PU:PG = 2:1:1 or 1:1:1 FinFET sizing restrictions.

Currently, FinFET standard cell libraries offer cells in integral 1X, 2X, 4X drive strength options, often with 3 or 4 device Vt variants.  With greater sizing freedom (and potentially fewer device Vt alternatives) in a GAA technology, library designers have a different set of variables from which to select.  It will be interesting to see how cell library designers utilize this device flexibility.

Ongoing Nanosheet Fabrication R&D

Jin described three areas of active process R&D aimed at further optimizing nanosheet characteristics.

  • increased SiGe stoichiometry for pFETs

The lower hole mobility in nanosheet Si layers is a concern.  Research is ongoing to increase the SiGe composition in pFET nanosheet layers (without adopting a SiGe superlattice stack, due to the topography difficulties mentioned above).  One approach would be to “trim” the pFET Si nanosheet thickness after superlattice etch, and deposit a SiGe “cladding” layer, prior to oxide and gate deposition.  The difficulty would be maintaining a uniform nanosheet thickness after the trim and SiGe cladding deposition steps.

  • optimization of parasitic Cgs/Cgd capacitances

FinFETs have a (relatively) high parasitic capacitance between gate and source/drain nodes, due in part to the gate vertical sidewall-to-S/D node capacitance contribution between fins.  The horizontal nanosheet utilizes a different gate-to-S/D oxide orientation, arising from the inner spacer deposited in the SiGe superlattice layers prior to S/D epitaxy and SiGe etch.  Jin highlighted that the nanosheet and recessed oxide dimensions need to be optimized not only for the drive current, but also the parasitic Cgs/Cgd capacitances, as illustrated below.

  • bottom nanosheet “mesa” leakage

The GAA topology improves upon the (3-sided) FinFET electrostatics, reducing subthreshold device leakage current.  However, there is a parasitic leakage path for the very bottom (or “mesa”) nanosheet layer.  After the superlattice etching, oxide dep, and gate dep steps, the gate-to-substrate electrostatics offers a (non-GAA) channel current path.

As illustrated above, Jin highlighted R&D efforts to reduce this leakage current contribution, through either:

  • additional impurity introduction below the nanosheet stack
  • partial dielectric isolation between the substrate and S/D nodes
  • full dielectric isolation between the substrate, S/D nodes, and bottom layer nanosheet gate

Summary

Jin’s presentation offered great insights into the relative characteristics of FinFET and GAA devices, as process nodes evolve to the horizontal nanosheet topology.  Designers will benefit from reduced leakage currents and device sizing flexibility, although disparities between nanosheet channel electron and hole mobility will require renewed consideration of circuit beta ratios.  Ongoing process R&D efforts are seeking to reduce this carrier mobility difference, and to optimize parasitic Rs, Rd, Cgs, and Cgd elements.

Jin presented a rough timeline, shown below, for the introduction of GAA nanosheet technology, before new device configurations (e.g., 3D silicon fabrication) and non-silicon materials (e.g., 2D semiconductors) emerge.

As Scotten also suggested in his article, if you have the opportunity, I would encourage you to register and view this enlightening VLSI Symposium short course.

-chipguy


Apple’s Orphan Silicon

Apple’s Orphan Silicon
by Paul Boldt on 07-11-2021 at 6:00 am

T2 die anno lr

Apple’s recent Spring Loaded Event brought us M1-based iMacs.  After the MacBook Air and 13” MacBook Pro in the fall, iMacs are the third Mac to jettison Intel processors.  With this transition Apple’s T2 chip enters End of Life status, so to speak.  The T2 is a bit of an enigma and now it does not have much time left.

We know it performs a wide range of tasks in Macs, including security, encryption, video processing, storage control and housekeeping.  This 2019 AppleInsider article tested encode times for Macs having the same processor, where one had a T2, and one did not.  The Mac with the T2 executed the encode in 1/2 the time.

Despite all this functionality, we know surprisingly little about the T2.  There simply is not much information floating around.  Wikipedia does not even report a die size or process node.  Did Apple design a whole new chip? How much is borrowed from the A-series family?  How much is new design?  How much is Apple investing to achieve the desired functionality for Macs?  It is time to look at a T2 and find out what Apple created for their Intel-based Mac co-processor.

Package & hints of memory

The T2 under study came from a 2019 13” MacBook Air logic board.  The T2’s package has a decent footprint compared to the other ICs around it.  For comparison, the larger of the shiny dies to the right of the T2, between four mounts, is the Intel i5.  One can envision the T2 being a similar size, based on the package.  There is a “1847” date code on the package indicating late 2018 assembly.

2019 13” MacBook Air logic board

A teardown of the late 2018 MacBook Air simply listed the T2’s part number.  However, a teardown of a 2019 15” MacBook Pro indicated the T2 was “layered on a 1 GB Micron D9VLN LPDDR4 memory”.  Our package also included the “D9VLN” marking.  A decoder at Micron points to a 1GB LPDDR4.  The T2 and memory would likely be in a Package-On-Package (PoP) arrangement.

A second die was in fact found in the beaker after de-packaging.  The markings visible at top metal are Micron’s.  The inclusion of in-package DRAM is interesting, not to mention costly, for a companion IC or co-processor.  It is however not too surprising considering the T2 is derived from the A-series that has long had in-package DRAM.

Top metal die markings of second die in T2 package.

Die photo & “PUs”

It is time for the main event.  SEM analysis of several line pitches and the 6T SRAM cell size confirmed the T2 is fabbed in a TSMC 16nm process.  This is the same node as the A10, so the latter will serve as a reference A-series processor against which the T2 can be compared.

T2 die photo with CPU and GPU annotations.

A10 die photo with original annotations.  Source: Chipworks

Visually, the CPU is the first thing that jumps out at you.  It is the same design and layout as the A10.  Assuming the T2 was designed after the A10, the CPU was dropped in as a hard macro.  Remember, it is a 4-core CPU!  There are two performance (large Hurricane) cores and two efficiency (small Zephyr) cores.  That is quite a bit of power considering there is already an Intel i5 for the main system processor.

One can only imagine the conversation within Apple.  “Do you have any CPU’s ready to go?”  “Yup … there is a 4-core 17.4 mm2 design that is only a few months old on the shelf over there.” “Great, I’ll take one of those.” Well, maybe it was a bit more technical.

The GPU does not follow suit.  The A10’s 6-core GPU is organized as 3 blocks for the cores and a block of logic.  The T2’s GPU appears to be along the lower edge of the die.  Again, the cores are organized as 3 blocks.  We did not discern symmetry within these blocks, suggesting 3 cores.  The GPU logic is likely in a block just above the cores, where the hashed lines encircle a potential area for it.  Even if all 3 blocks within this area were GPU logic, it would be smaller than that identified for the A10.  There is more analysis needed here to confirm the GPU configuration, but there are suggestions that both the GPU’s logic and cores are smaller than those on the A10.

Additional block-level analysis is ongoing.  We see blocks that were used as-is, when compared to the A10, ones that received a new layout, and straight-up new design.

Early numbers

The T2 measures 9.6 mm by 10.8 mm, yielding a die size of 104 mm2.  It is not a small die!  The T2 is a serious processor.  This is roughly 83% of the A10’s 125 mm2.

As expected, the CPU has an area of approximately 17.4 mm2 on both dies.  This yields a higher percentage of the total die dedicated to the CPU in the T2.  The T2’s GPU is considerably smaller than the A10’s.  Each core comes in at 1.2 mm2 v. 5.3 mm2 for the cores of the A10.  Functionally, this makes sense, as the T2’s GPU should not be tasked nearly as much as the A10’s because it is not the primary GPU.  Again, there is already either Intel embedded graphics or a dedicated GPU on the logic board.
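
For readers who want to check the arithmetic, here is a quick Python sketch using only the die dimensions and block areas quoted above:

```python
# Quick check of the die-area arithmetic quoted above.
t2_die  = 9.6 * 10.8          # ~103.7 mm2, quoted as ~104 mm2
a10_die = 125.0               # mm2
cpu     = 17.4                # mm2, same hard macro on both dies

print(f"T2 die area:            {t2_die:5.1f} mm2")
print(f"T2 / A10 die ratio:     {t2_die / a10_die:5.1%}")
print(f"CPU share of T2 die:    {cpu / t2_die:5.1%}")
print(f"CPU share of A10 die:   {cpu / a10_die:5.1%}")
print(f"GPU core, T2 vs. A10:   {1.2:4.1f} mm2 vs. {5.3:4.1f} mm2")
```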

Pulling threads together

There is plenty more to extract from the reverse engineering, but this snippet provides a flavor of Apple’s thinking.  As a starting point, Apple looks to user functionality.  An ongoing question at Apple seems to be “What do we want the user to experience from an Apple product?”  Then they build it.  The T2 became Apple’s interpretation of this for Intel-based Macs, but remember prior to the T1 the Intel processors were flying solo, and Macs still worked.

The T2 leveraged design from the A-series processors, as shown in the CPU.  Its 4-core CPU is large, to say the least, and it is hard not to think it is overpowered for the T2.  That said, Apple would look at the cost of re-design v. the silicon cost associated with dropping in something that might be larger than is truly needed.  The latter was probably more enticing, as the wafer starts for the T2 would be nowhere near those of the A10, or any A-series processor for that matter.  Besides, the extra horsepower will provide a better experience.

The T2 also consolidated stand-alone ICs within a Mac.  The storage or SSD controller is one example of this.  Apple bought Israel-based Anobit in 2011.  The 2016 13” MacBook Pro (with Function Keys) included an Apple stand-alone storage controller (see slide 11).  The controller became a block on the T2.  Today, it would be a block on the M1.

Conclusion

We will continue to dig into the T2, focusing on the known block functionalities, its comparison with the A10, and looking forward.  Yes, the T2 and the A10 are both old designs, but the comparison liberates information about Apple’s use of semiconductor design and the effort Apple invests to provide their desired user experience.

*This article is jointly authored by Lev Klibanov. Dr. Klibanov is an independent consultant in semiconductor process and related fields. Dr. Klibanov has focused on and has considerable experience in advanced CMOS logic, non-volatile memory, CMOS image sensors, advanced packaging, and MEMS technologies.  He has spent 20+ years working in reverse engineering, metrology, and fabrication.


VLSI Symposium – TSMC and Imec on Advanced Process and Devices Technology Toward 2nm

VLSI Symposium – TSMC and Imec on Advanced Process and Devices Technology Toward 2nm
by Scotten Jones on 07-02-2021 at 6:00 am

Figure 1

At the 2021 Symposium on VLSI Technology and Circuits in June, a short course was held on “Advanced Process and Devices Technology Toward 2nm-CMOS and Emerging Memory”. In this article I will review the first two presentations covering leading edge logic devices. The two presentations are complementary and provide an excellent overview of the likely evolution of logic technology.

CMOS Device Technology for the Next Decade, Jin Cai, TSMC

Gate length (Lg) scaling of planar MOSFETs is limited to approximately 25nm because the single-surface gate has poor control of sub-surface leakage.

Adding more gates, such as in a FinFET where the channel is constrained between three gates, yields the ability to scale Lg to approximately 2.5 times the thickness of the channel. FinFETs have evolved from Intel’s initial 22nm process with highly sloped fin walls to today’s more vertical walls and TSMC’s high mobility channel FinFET implemented for their 5nm process.

Taller fins increase the effective channel width (Weff), Weff = 2Fh + Fth, where Fh is the fin height and Fth is the fin thickness. Increasing Weff increases drive current for heavily loaded circuits, but excessively tall fins waste active power. Straight and thin fins are good for short channel effects, but how thin the fin can be made is limited by reduced mobility and increased threshold voltage variability. Implementing a high mobility channel (author's note: SiGe for the pFET fin) in their 5nm technology gave TSMC an ~18% improvement in drive current.

As devices scale down, parasitic resistance and capacitance become a problem. Contacted Poly Pitch (CPP) determines standard cell width (see figure 1) and is made up of Lg, Contact Width (Wc) and Spacer Thickness (Tsp): CPP = Lg + Wc + 2Tsp. Reducing Wc increases parasitic resistance unless process improvements are made to improve the contacts, and reducing Tsp increases parasitic capacitance unless lower dielectric constant spacers are used.

Figure 1. Standard Cell Size.
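
A small sketch of the CPP bookkeeping described above follows; the dimension values are illustrative assumptions, not foundry numbers.

```python
# Contacted Poly Pitch bookkeeping: CPP = Lg + Wc + 2*Tsp.
# All dimensions below are illustrative assumptions, not foundry numbers.

def cpp(lg, wc, tsp):
    """Contacted poly pitch from gate length, contact width, spacer thickness."""
    return lg + wc + 2 * tsp

def min_lg_finfet(t_channel):
    """Rule of thumb from the presentation: Lg ~ 2.5x the channel thickness."""
    return 2.5 * t_channel

lg  = min_lg_finfet(6.0)       # ~15 nm gate for a ~6 nm fin
wc  = 14.0                     # contact width (assumed)
tsp = 8.0                      # spacer thickness (assumed)
print(f"Lg = {lg:.1f} nm, CPP = {cpp(lg, wc, tsp):.1f} nm")

# Shrinking CPP means shrinking Wc (raising contact resistance unless the
# contact module improves) or Tsp (raising parasitic capacitance unless a
# lower-k spacer is used) -- the tradeoff described in the text.
for new_tsp in (8.0, 7.0, 6.0):
    print(f"Tsp = {new_tsp:.1f} nm -> CPP = {cpp(lg, wc, new_tsp):.1f} nm")
```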

As the height of a standard cell is reduced, the number of fins per device has to be reduced (fin depopulation), see figure 2.

Figure 2. Fin Depopulation.

Fin depopulation reduces cell size, increasing logic density, and provides higher speed and lower power, but it does reduce drive current.

Transitioning from FinFETs to stacked Horizontal Nanosheets (HNS) enables increased flexibility by varying the sheet width (see figure 3) and the ability to increase Weff by stacking more sheets.

 Figure 3. Flexible Sheet Width.

Adding sheets adds to Weff: Weff = N*2(W+H), where N is the number of sheets, W is the sheet width and H is the sheet height (thickness). Ultimately the number of sheets is limited by the performance of the bottom sheet. Reducing the spacing between sheets reduces parasitic resistance and capacitance, but the spacing must be big enough to fit the gate metals and dielectric into the gap. There is a bottom parasitic mesa device under an HNS stack that can be controlled by implants or a dielectric layer.

In FinFETs, nFET electron mobility is higher than pFET hole mobility. In HNS the mobility is even more unbalanced, with higher electron mobility and lower hole mobility. Hole mobility can be improved by cladding the channel with SiGe or using a Strain Relaxed Buffer, but both techniques add process complexity.

Imec has introduced a concept called a Forksheet (FS), where a dielectric wall is placed between the nFET and pFET, reducing the n-p spacing and resulting in a more compact standard cell, see figure 4.

Figure 4. Forksheet.

Beyond an HNS with FS, there is the Complementary FET (CFET), which stacks the nFET and pFET, eliminating the need for horizontal n-p spacing.

Figure 5. CFET.

CFET options include monolithic integration, where both nFET and pFET devices are fabricated on the same wafer, and sequential integration, where the nFET and pFET are fabricated on separate wafers that are then bonded together. Both options have multiple challenges that are still being worked on.

Beyond CFET, the presenter touched on 3D integration with transistors integrated into the Back End Of Line (BEOL) interconnect. These options require low temperature transistors with polysilicon channels or oxide semiconductors, presenting a variety of performance and integration challenges.

In the Front End Of Line (FEOL), options beyond CFETs are being explored, such as high mobility materials, Tunnel FETs (TFET), Negative Capacitance FETs (NCFET), Cryogenic CMOS and low dimensional materials.

Low dimensional materials may take the form of nanotubes or 2D materials. These materials offer even shorter Lg and lower power than HNS but are still in the early research phase. Low dimensional materials also fit into the HNS/CFET approach, with the option to stack up many layers.

Nanosheet Device Architecture to Enable CMOS Scaling in 3nm and beyond: Nanosheet, Forksheet and CFET, Naoto Horiguchi, Imec.

This section of the course expanded on the HNS/FS/CFET options discussed in the previous section.

As FinFETs are being scaled to the limits, fins are getting taller, thinner and closer. Fin depopulation is reducing drive current and increasing variability, see figure 6.

Figure 6. FinFET scaling.

The state-of-the-art today is a 6-track cell with 2 fins per device. Moving to single fins and narrower n-p spacing will require new device architectures to drive performance, see figure 7.

Figure 7. 6-Track Cell

To continue CMOS scaling, we need to transition from FinFETs to HNS, then to HNS with FS, and then to CFETs, see figure 8.

Figure 8. Nanosheet Architectures for CMOS Scaling.

Transitioning from FinFETs to HNS offers several advantages: greater Weff, improved short channel control (which means shorter Lg) and better design flexibility due to the ability to vary the sheet width, see figure 9.

Figure 9. FinFET to HNS.

The presenter went on to go into detail on HNS processing and some of the challenges and possible solutions. An HNS process is very similar to FinFET processing except for four main modules, see figure 10.

Figure 10. HNS Process Flow.

Although an HNS flow is similar to a FinFET flow, the key modules that are different are difficult. The release etch and achieving multiple threshold voltages are particularly difficult. There was a lot of good information on the specifics of the process module changes required for HNS that is beyond the scope of a review article like this. One thing that wasn’t explicitly discussed is that in order to scale an HNS process to a 5-track cell, Buried Power Rails (BPR) are required, and that is another difficult process module that is still being developed.
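
As a rough illustration of why track count matters, here is a sketch of the standard cell height bookkeeping (cell height = track count x signal-metal pitch); the metal pitch is an illustrative assumption, not a foundry number.

```python
# Track-height bookkeeping for standard cells: cell height is the number of
# routing tracks times the signal-metal pitch.  The pitch value below is an
# illustrative assumption, not a foundry number.

def cell_height(tracks, metal_pitch_nm):
    """Standard cell height in nm for a given track count and metal pitch."""
    return tracks * metal_pitch_nm

M2_PITCH = 30.0   # nm, assumed

for tracks in (7.5, 6.0, 5.0):
    print(f"{tracks:>4}-track cell: height = {cell_height(tracks, M2_PITCH):5.1f} nm")

# Moving from 6 tracks to 5 tracks shrinks the cell, but with conventional
# in-cell power rails there is little room left for signal routing --
# hence the interest in Buried Power Rails, which move VDD/VSS below the
# device layer and free up cell-edge tracks.
```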

As seen in the previous presentation, further scaling of HNS can be achieved with FS. Figure 11 presents a more detailed view of how a dielectric wall shrinks an HNS cell.

Figure 11. Horizontal Nanosheet/Forksheet Structure Comparison.

The FS process requires the insertion of a dielectric wall to decrease the n-p spacing, figure 12 illustrates the process flow.

Figure 12. Forksheet Process Flow.

Beyond FS, CFET offers zero horizontal n-p spacing by stacking devices. Figure 13 illustrates the CFET concept.

Figure 13. CFET Concept.

CFETs are particularly interesting for SRAM scaling. SRAM scaling has slowed and is not keeping up with logic scaling. CFETs offer the potential to return SRAM scaling to the historical trend, see figure 14.

Figure 14. SRAM Scaling with CFET.

As previously mentioned, there are two approaches to CFET fabrication, monolithic and sequential. Figure 15 contrasts the two approaches with pluses and minuses for each.

Figure 15. CFET Fabrication Options.

Conclusion

This review presented some of the key points of the two presentations on leading edge logic devices. This is just an overview of the excellent and more detailed information presented in the course. The course also covered interconnect, contacts, and metrology for logic, as well as emerging memory, 3D memory and DRAM. I highly recommend the short courses.

Also Read:

Is IBM’s 2nm Announcement Actually a 2nm Node?

Ireland – A Model for the US on Technology

How to Spend $100 Billion Dollars in Three Years