A Compelling Application for AI in Semiconductor Manufacturing
by Tom Dillinger on 07-06-2020 at 6:00 am


There have been a multitude of announcements recently relative to the incorporation of machine learning (ML) methods into EDA tool algorithms, mostly in the physical implementation flows.  For example, deterministic ML-based decision algorithms applied to cell placement and signal interconnect routing promise to expedite and optimize physical design results, without the iterative cell-swap placement and rip-up-and-reroute algorithms.  These quality-of-results and runtime improvements are noteworthy, to be sure.

Yet, there is one facet of the semiconductor industry that is (or soon will be) critically-dependent upon AI support – the metrology of semiconductor process characterization, both during initial process development/bring-up, and in-line inspection driving continuous process improvement.  (Webster’s defines metrology as “the application of measuring instruments and testing procedures to provide accurate and reliable measurements”.)  Every aspect of semiconductor processing, from lithographic design rule specifications to ongoing yield analysis, is fundamentally dependent upon accurate and reliable data for critical dimension (CD) lithographic patterning and material composition.

At the recent VLSI 2020 Symposium, Yi-hung Lin, Manager of the Advanced Metrology Engineering Group at TSMC, gave a compelling presentation on the current status of semiconductor metrology techniques, and the opportunities for AI methods to provide the necessary breakthroughs to support future process node development.  This article briefly summarizes the highlights of his talk. [1]

The figure below introduced Yi-hung’s talk, illustrating the sequence where metrology techniques are used.  There is an initial analysis of fabrication materials specifications and lithography targets during development.  Once the process transitions to manufacturing, in-line (non-destructive) inspection is implemented to ensure that variations are within the process window for high yield.  Over time, the breadth of different designs, and specifically, the introduction of the process on multiple fab lines requires focus on dimensional matching, wafer-to-wafer, lot-to-lot, and fab line-to-fab line.

The “pre-learning” opportunities suggest that initial process bring-up metrology data could be used as the training set for AI model development, subsequently applied in production.  Ideally, the models would be used to accelerate the time to reach high-volume manufacturing.  These AI opportunities are described in more detail below.

Optical Critical Dimension (OCD) Spectroscopy
I know some members of the SemiWiki audience fondly (or, perhaps not so fondly) recall the many hours spent in the clean room looking through a Zeiss microscope at wafers, to evaluate developed photoresist layers, layer-to-layer alignment verniers, and material etch results.  At the wavelength of the microscope light source, these multiple-micrometer features were visually distinguishable – those days are long, long gone.

Yi-hung highlighted that OCD spectroscopy is still a key source of process metrology data.  It is fast, inexpensive, and non-destructive – yet, the utilization of OCD has changed in deep sub-micron nodes.  The figure below illustrates the application of optical light sources in surface metrology.

The incident (visible, or increasingly, X-ray) wavelength is provided to a 3D simulation model of the surface, which solves electromagnetic equations to predict the scattering.  These predicted results are compared to the measured spectrum, and the model is adjusted – a metrology “solution” is achieved when the measured and EM simulation results converge.

OCD illumination is most applicable when an appropriate (1D or 2D) “optical grating-like” pattern is used for reflective diffraction of the incident light.  However, the challenge is that current surface topographies are definitely three-dimensional, and the material measures of interest do not resemble a planar grating.  X-ray scatterometry provides improved analysis accuracy for these 3D topographies, but is an extremely slow method of data gathering.

Yi-hung used the term ML-OCD to describe how an AI model derived from other metrology techniques could provide an effective alternative to the converged EM simulation approach.  As illustrated below, the ML-OCD spectral data would serve as the input training dataset for model development, with the output target being the measurements from (destructive) transmission electron microscopy (TEM), discussed next.
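As a rough illustration of the ML-OCD idea – not TSMC’s actual flow – the sketch below trains a simple regression model that maps measured OCD spectra to TEM-derived critical dimensions.  The data arrays, array sizes, and model choice are all placeholder assumptions.

```python
# Hypothetical ML-OCD sketch: learn a mapping from OCD spectra (model input)
# to TEM-measured critical dimensions (model output target). All data here
# is synthetic placeholder data standing in for real measurements.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_sites, n_wavelengths = 500, 128
spectra = rng.normal(size=(n_sites, n_wavelengths))      # stand-in for measured OCD spectra
tem_cd = 12.0 + 0.1 * spectra[:, :8].sum(axis=1)         # stand-in for TEM-measured CDs (nm)

X_train, X_test, y_train, y_test = train_test_split(spectra, tem_cd, test_size=0.2, random_state=0)

# Fit a simple nonlinear regressor; the actual model used in production is not disclosed.
model = GradientBoostingRegressor().fit(X_train, y_train)
print("CD mean absolute error (nm):", mean_absolute_error(y_test, model.predict(X_test)))
```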

ML for Transmission Electron Microscopy (TEM)
TEM utilizes a focused electron beam that is directed through a very thin sample – e.g., 100nm or thinner.  The resulting (black-and-white) image provides high-magnification detail of the material cross-section, due to the much smaller electron wavelength (roughly 1000X shorter than that of an optical photon).

There are two areas that Yi-hung highlighted where ML techniques would be ideal for TEM images.  The first would utilize familiar image processing and classification techniques to automatically extract CD features, especially useful for “blurred” TEM images.  The second would be to serve as the training set output for ML-OCD, as mentioned above.  Yi-hung noted that one issue with the use of TEM data for ML-OCD modeling is that a large amount of TEM sample data would be required as the model output target.  (The fine resolution of the TEM image compared to the field of the incident OCD exposure exacerbates the issue.)

ML for Scanning Electron Microscopy (SEM)
The familiar SEM images measure the intensity of secondary electrons (emitted from the outer atomic electron shell) that are produced from collisions with an incident primary electron – the greater the number of SE’s generated in a local area, the brighter the SEM image.  SEMs are utilized at deep submicron nodes for (top view) line/space images, and in particular, for showing areas where lithographic and material patterning process defects are present.

ML methods could be applied to SEM images for defect identification and classification, and to assist with root cause determination by correlating the defects to specific process steps.
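To make the defect-classification idea concrete, here is a minimal sketch – purely illustrative, with synthetic stand-in data – of a classifier that assigns SEM image patches to defect categories.

```python
# Hypothetical sketch: classify SEM image patches into defect categories
# (e.g., bridge, open, particle, none). Images and labels are placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_patches, h, w = 400, 64, 64
patches = rng.random((n_patches, h, w))        # stand-in for grayscale SEM patches
labels = rng.integers(0, 4, size=n_patches)    # stand-in defect class labels (4 classes)

# Flatten each patch, reduce dimensionality with PCA, then classify.
clf = make_pipeline(PCA(n_components=32), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, patches.reshape(n_patches, -1), labels, cv=5)
print("cross-validated accuracy:", scores.mean())
```

A production flow would use labeled images from actual inspection, and likely a convolutional network rather than this simple pipeline, but the training-data structure is the same.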

Another scanning electron technique uses a variable range of higher-energy primary electrons, which penetrate to different depths below the surface and thus provide secondary electrons from deeper within the material.  However, an extremely large primary energy will result in the generation of both secondary electrons and X-ray photons, as illustrated below.  (Yi-hung noted that this will limit the image usability for the electron detectors used in SEM equipment, and thus limit the material depth that could be explored – either more SE sensitivity or SE plus X-ray detector resolution will be required.)   The opportunities for a (generative) machine learning network to assist with “deep SEM” image classification are great.

Summary
Yi-hung concluded his presentation with the following breakdown of metrology requirements:

  • (high-throughput) dimensional measurement:
      • OCD, X-ray spectroscopy  (poor on 3D topography)
  • (high-accuracy, destructive) reference measurement:  TEM
  • Inspection (defect identification and yield prediction):  SEM
  • In-line monitoring (high-throughput, non-destructive):
      • hybrid of OCD + X-ray, with ML-OCD in the future?

In all these cases, there are great opportunities to apply machine learning methods to the fundamental metrology requirements of advanced process development and high-volume manufacturing.   Yi-hung repeated the cautionary tone that semiconductor engineering metrology currently does not have the volume of training data associated with other ML applications.  Nevertheless, he encouraged data science engineers potentially interested in these applications to contact him.   🙂

Yi-hung also added that there is a whole other metrology field to explore for potential AI applications – namely, the application of the sensor data captured by individual pieces of semiconductor processing equipment, as it relates to overall manufacturing yield and throughput.  A mighty challenge, indeed.

-chipguy

 

References

[1]  Yi-hung Lin, “Metrology with Angstrom Accuracy Required by Logic IC Manufacturing – Challenges From R&D to High Volume Manufacturing and Solutions in the AI Era”, VLSI 2020 Symposium, Workshop WS2.3.

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

 


Optimizing Chiplet-to-Chiplet Communications
by Tom Dillinger on 06-29-2020 at 6:00 am


Summary
The growing significance of ultra-short reach (USR) interfaces on 2.5D packaging technology has led to a variety of electrical definitions and circuit implementations.  TSMC recently presented the approach adopted by their IP development team, for a parallel-bus, clock-forwarded USR interface to optimize power/performance/area – i.e., “LIPINCON”.

Introduction
The recent advances in heterogeneous, multi-die 2.5D packaging technology have resulted in a new class of interfaces – i.e., ultra-short reach (USR) – whose electrical characteristics differ greatly from traditional printed circuit board traces.  Whereas the serial communications lane of SerDes IP is required for long, lossy connections, the short-reach interfaces support a parallel bus architecture.

The SerDes signal requires (50 ohm) termination to minimize reflections and reduce far-end crosstalk, adding to the power dissipation.  The electrically-short interfaces within the 2.5D package do not require termination.  Rather than “recovering” the clock embedded within the serial data stream, with the associated clock-data recovery (CDR) circuit area and power, these parallel interfaces can use a simpler “clock-forwarded” circuit design – a transmitted clock signal is provided with a group of N data signals.

Another advantage of this interface is that the circuit design requirements for electrostatic discharge protection (ESD) between die are much reduced.  Internal package connections will have lower ESD voltage stress constraints, saving considerable I/O circuit area (and significantly reducing I/O parasitics).

The unique interface design requirements between die in a 2.5D package have led to the use of the term “chiplet”, as the full-chip design overhead of SerDes links is not required.  Yet, to date, there have been quite varied circuit and physical implementation approaches used for these USR interfaces.

TSMC’s LIPINCON interface definition
At an invited talk for the recent VLSI 2020 Symposium, TSMC presented their proposal for a parallel-bus, clock-forwarded architecture – “LIPINCON” – which is short for “low-voltage, in-package interconnect”. [1]  This article briefly reviews the highlights of that presentation.

The key parameters of the short-reach interface design are:

  • Data rate per pin:  dependent upon trace length/insertion loss, power dissipation, required circuit timing margins
  • Bus width:  with modularity to define sub-channels
  • Energy efficiency:  measured in pJ/bit, including not only the I/O driver/receiver circuits, but any additional data pre-fetch/queuing and/or encoding/decoding logic
  • “Beachfront” (linear) and area efficiencies:  measures of the aggregate data bandwidth per unit of die edge length and per unit of die area on the chiplets – i.e., Tbps/mm and Tbps/mm**2;  dependent upon the signal bump pitch, and the number and pitch of the metal redistribution layers on the 2.5D substrate, which defines the number of bump rows for which signal traces can be routed – see the figures below and the simple sketch after this list
  • Latency:  another performance metric; the time between the initiation of data transmit and receive, measured in “unit intervals” of the transmit cycle
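
The simple sketch below shows how the beachfront and energy-efficiency metrics combine the parameters above; every number in it is a made-up assumption for illustration, not a LIPINCON specification.

```python
# Illustrative back-of-the-envelope calculation of the interface metrics
# described above; all values are assumptions, not LIPINCON specifications.
def beachfront_bw_tbps_per_mm(data_rate_gbps, bump_pitch_um, signal_rows):
    """Aggregate bandwidth per mm of die edge (Tbps/mm)."""
    pins_per_mm = (1000.0 / bump_pitch_um) * signal_rows   # routable signal bumps per mm of edge
    return pins_per_mm * data_rate_gbps / 1000.0           # Gbps -> Tbps

def energy_per_bit_pj(total_power_mw, aggregate_bw_gbps):
    """Energy efficiency in pJ/bit: interface power divided by bit rate."""
    return total_power_mw / aggregate_bw_gbps               # mW / Gbps is numerically pJ/bit

# Assumed example: 8 Gbps/pin, 40 um bump pitch, 4 routable bump rows,
# 256 data pins dissipating 500 mW in total.
print("beachfront bandwidth:", beachfront_bw_tbps_per_mm(8, 40, 4), "Tbps/mm")
print("energy efficiency:", round(energy_per_bit_pj(500, 8 * 256), 3), "pJ/bit")
```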

Architects are seeking to maximize the aggregate data bandwidth (bus width * data rate), while achieving very low dissipated energy per bit.  These key design measures apply whether the chiplet interface is between multiple processors (or SoCs), processor-to-memory, or processor-to-I/O controller functionality.

The physical signal implementation will differ, depending on the packaging technology.  The signal redistribution layers (RDL) for a 2.5D package with silicon interposer will leverage the finer metal pitch available (e.g., TSMC’s CoWoS).  For a multi-die package utilizing the reconstituted wafer substrate to embed the die, the RDL layers are much thicker, with a wider pitch (e.g., TSMC’s InFO).  The figures below illustrate the typical signal trace shielding (and lack of shielding) associated with CoWoS and InFO designs, and the corresponding insertion loss and far-end crosstalk.

 

The key characteristics of the TSMC LIPINCON IP definition are illustrated schematically in the figure below.

  • A low signal swing interface of 0.3V is adopted (also saves power).
  • The data receiver uses a simple differential circuit, with a reference input to set the switching threshold (e.g., 150mV).
  • A clock/strobe signal is forwarded with (a sub-channel of) data signals;  the receiver utilizes a simple delay-locked loop (DLL) to “lock” to this clock.

Briefly, a DLL is a unique circuit – it consists of an (even-numbered) chain of identical delay cells.  The figure below illustrates an example of the delay chain. [2]   The switching delay of each stage is dynamically adjusted by modulating the voltage inputs to the series nFET and pFET devices in the input inverter of each stage – i.e., a “current-starved” inverter.  (Other delay chain implementations dynamically modify the identical capacitive load at each stage output, rather than adjusting the internal transistor drive strength of each stage.)

The “loop” in the DLL is formed by a phase detector (XOR-type logic with low-pass filter), which compares the input clock to the final output of the chain.  The leading or lagging nature of the input clock relative to the chain output adjusts the inverter control voltages – thus, the overall delay of the chain is “locked” to the input clock.  The (equal) delays of the stages in the DLL chain provide outputs that correspond to specific phases of the input clock signal.  The parallel data is captured in receiver flops using an appropriate phase output, a means of compensating for any data-to-clock skew across the interface.
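A toy behavioral model of this locking behavior – not a circuit-accurate design, and with arbitrary loop gain and delay numbers – is sketched below: the per-stage delay is nudged until the total chain delay equals one period of the forwarded clock, at which point each tap provides an evenly spaced phase.

```python
# Toy behavioral model of the DLL described above: a chain of N identical
# delay stages whose per-stage delay is servoed until the total chain delay
# equals one period of the forwarded clock, so tap k sits at phase k/N.
def lock_dll(clock_period_ps, n_stages=8, stage_delay_ps=50.0, gain=0.05, iterations=2000):
    for _ in range(iterations):
        chain_delay = n_stages * stage_delay_ps
        phase_error = clock_period_ps - chain_delay        # crude phase-detector output
        stage_delay_ps += gain * phase_error / n_stages    # nudge the stage control "voltage"
    taps = [k * stage_delay_ps for k in range(1, n_stages + 1)]
    return stage_delay_ps, taps

stage, taps = lock_dll(clock_period_ps=500.0)
print("per-stage delay (ps):", round(stage, 2))            # converges to 500/8 = 62.5 ps
print("tap delays (ps):", [round(t, 1) for t in taps])     # evenly spaced phases of the clock
```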

The TSMC IP team developed an innovative approach for the specific case of a SoC-to-memory interface.  The memory chiplet may not necessarily embed a DLL to capture signal inputs.  For a very wide interface – e.g., 512 addresses, 256 data bits, divided into sub-channels – the overhead of the DLL circuitry in the cost-sensitive memory chiplet would be high.  As illustrated in the figure below, the DLL phase output which serves as the input strobe for a memory write cycle is present in the SoC instead.  (The memory read path is also shown in the figure, illustrating how the data strobe from the memory is connected to the read_DLL circuit input.)

For the parallel LIPINCON interface, simultaneous switch noise (SSN) related to signal crosstalk is a concern.  For the shielded (CoWoS) and unshielded (InFO) RDL signal connections illustrated above, TSMC presented results illustrating very manageable crosstalk for this low-swing signaling.

To be sure, designers would have the option of developing a logical interface between chiplets that used data encoding to minimize signal transition activity in successive cycles.  The simplest method would be to add data bus inversion (DBI) coding – the data in the next cycle could be compared to the current data, and transmitted using true or inverted values to minimize the switching activity.  An additional DBI signal between chiplets carries this decision for the receiver to decode the values.
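A minimal sketch of that DBI decision – assuming an 8-bit sub-bus, purely for illustration – is shown below.

```python
# Minimal data bus inversion (DBI) sketch: transmit the next word either true
# or inverted, whichever toggles fewer bus wires relative to the current word.
def dbi_encode(prev_bus_word, next_word, width=8):
    mask = (1 << width) - 1
    toggles_true = bin((prev_bus_word ^ next_word) & mask).count("1")
    toggles_inverted = bin((prev_bus_word ^ (~next_word & mask)) & mask).count("1")
    if toggles_inverted < toggles_true:
        return (~next_word & mask), 1          # transmit inverted data, DBI flag = 1
    return next_word & mask, 0                 # transmit true data, DBI flag = 0

def dbi_decode(bus_word, dbi_flag, width=8):
    mask = (1 << width) - 1
    return (~bus_word & mask) if dbi_flag else bus_word

prev = 0b00000000
sent, flag = dbi_encode(prev, 0b11111100)      # 6 toggles if sent true, only 2 if inverted
assert dbi_decode(sent, flag) == 0b11111100
print(f"bus word {sent:08b}, DBI flag {flag}")
```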

The development of heterogeneous 2.5D packaging relies upon the integration of known good die/chiplets (KGD).  Nevertheless, the post-assembly yield of the final package can be enhanced by the addition of redundant lanes which can be selected after package test (ideally, built-in self-test).  The TSMC presentation included examples of redundant lane topologies which could be incorporated into the chiplet designs.  The figure below illustrates a couple of architectures for inserting redundant through-silicon-vias (TSVs) into the interconnections.  This would be a package yield versus circuit overhead tradeoff when architecting the interface between chiplets.

In a SerDes-based design, thorough circuit and PCB interconnect extraction plus simulation is used to analyze the signal losses.  The variations in signal jitter and magnitude are analyzed against the receiver sense amp voltage differential.  Hardware lab-based probing is also undertaken to ensure a suitable “eye opening” for data capture at the receiver.  TSMC highlighted that this type of interface validation is not feasible with the 2.5D package technology.  As illustrated below, a novel method was developed by their IP team to introduce variation into the LIPINCON transmit driver and receive capture circuitry to create an equivalent eye diagram for hardware validation.

The TSMC presentation mentioned that some of their customers have developed their own IP implementations for USR interface design.  One example showed a very low swing (0.2V) electrical definition that is “ground referenced” (e.g., signal swings above and below ground).  Yet, for fabless customers seeking to leverage advanced packaging, without the design resources to “roll their own” chiplet interface circuitry, the TSMC LIPINCON IP definition is an extremely attractive alternative.  And, frankly, given the momentum that TSMC is able to provide, this definition will likely help accelerate a “standard” electrical definition among developers seeking to capture IP and chiplet design market opportunities.

For more information on TSMC’s LIPINCON definition, please follow this link.

-chipguy

 

References

[1]  Hsieh, Kenny C.H., “Chiplet-to-Chiplet Communication Circuits for 2.5D/3D Integration Technologies”,  VLSI 2020 Symposium, Paper SC2.6 (invited short course).

[2]  Jovanovic, G., et al., “Delay Locked Loop with Linear Delay Element”, International Conference on Telecommunication, 2005, https://ieeexplore.ieee.org/document/1572136

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

 


Multi-Vt Device Offerings for Advanced Process Nodes
by Tom Dillinger on 06-26-2020 at 6:00 am


Summary
As a result of extensive focus on the development of workfunction metal (WFM) deposition, lithography, and removal, both FinFET and gate-all-around (GAA) devices will offer a wide range of Vt levels for advanced process nodes below 7nm.

Introduction
Cell library and IP designers rely on the availability of nFET and pFET devices with a range of threshold voltages (Vt).  Optimization algorithms used in physical synthesis flows evaluate the power, performance, and area (PPA) of both cell “drive strength” (e.g., 1X, 2X, 4X-sized devices) and cell “Vt levels” (e.g., HVT, SVT, LVT) when selecting a specific instance to address timing, noise, and power constraints.  For example, a typical power optimization decision is to replace a cell instance with a higher Vt variant to reduce leakage power, if the timing path analysis margins allow (after detailed physical implementation).  The additional design constraints for multi-Vt cell library use are easily managed:  (1) the device Vt active area must meet (minimum) lithography area requirements, and (2) the percentage of low Vt cells used should be small, to keep leakage currents in check.
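As a toy illustration of the Vt-swap decision described above – with made-up leakage and delay-penalty numbers, not any library’s actual values – the sketch below moves each cell to the highest-Vt variant that its timing slack can still absorb.

```python
# Toy sketch of multi-Vt leakage optimization: swap a cell to a higher-Vt
# (lower-leakage) variant only while its timing path retains positive slack.
# Delay-penalty values are illustrative, not from any real library.
VT_ORDER = ["LVT", "SVT", "HVT"]                          # increasing Vt, decreasing leakage
DELAY_PENALTY_PS = {"LVT": 0.0, "SVT": 5.0, "HVT": 12.0}  # added delay vs. LVT

def swap_for_leakage(cells):
    """cells: list of dicts with 'vt' and 'slack_ps' (path slack at the current Vt)."""
    for cell in cells:
        idx = VT_ORDER.index(cell["vt"])
        while idx + 1 < len(VT_ORDER):
            extra_delay = DELAY_PENALTY_PS[VT_ORDER[idx + 1]] - DELAY_PENALTY_PS[VT_ORDER[idx]]
            if cell["slack_ps"] - extra_delay < 0:
                break                                     # next swap would violate timing
            idx += 1
            cell["slack_ps"] -= extra_delay
            cell["vt"] = VT_ORDER[idx]
    return cells

cells = [{"vt": "LVT", "slack_ps": 20.0}, {"vt": "LVT", "slack_ps": 3.0}]
print(swap_for_leakage(cells))    # first cell reaches HVT, second stays LVT
```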

A common representation to illustrate the device Vt offerings in a particular process is to provide an I_on versus I_off characterization curve, as shown in the figure below.

Although it doesn’t reflect the process interconnect scaling options, this curve is also commonly used as a means of comparing different processes, as depicted in the figure.  A horizontal line shows the unloaded, I_on based performance gains achievable.  The vertical line illustrates the iso-performance leakage I_off power reduction between processes, for a reference-sized device in each.  Note that these lines are typically drawn without aligning to specific (nominal) Vt devices in the two process nodes.

The I_on versus I_off curve does not really represent the statistical variation in the process device Vt values.  A common model for representing this data is the Pelgrom equation. [1]  The standard deviation of (measured) device Vt data is plotted against (1 / sqrt(Weff * Lgate)):

(sigma_Vt)**2  =  (A**2) / (2 * Weff * Lgate)

       where A is a “fitting” constant for the process

Essentially, as the channel area of the device increases, sigma-Vt decreases in proportion to 1/sqrt(Weff * Lgate).  (Consider N devices in parallel with independent Vt variation – the Vt mean of the total will be the mean of the Vt distribution, while the effective standard deviation is reduced.)  The Pelgrom plot for the technology is an indication of the achievable statistical process control – more on Vt variation shortly.
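A small numerical illustration of this relationship – using an assumed Pelgrom coefficient, not a published process value – is shown below.

```python
# Numerical illustration of the Pelgrom relation above:
# sigma_Vt = A / sqrt(2 * Weff * Lgate), with A an assumed coefficient.
from math import sqrt

def sigma_vt_mv(weff_um, lgate_um, a_mv_um=1.5):
    return a_mv_um / sqrt(2 * weff_um * lgate_um)

# Quadrupling the effective channel area halves sigma_Vt.
for weff_um in (0.05, 0.10, 0.20):
    print(f"Weff = {weff_um:.2f} um, Lgate = 0.02 um -> sigma_Vt = {sigma_vt_mv(weff_um, 0.02):.1f} mV")
```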

For planar CMOS technologies, Vt variants from the baseline device were fabricated using a (low impurity dose) implant into the channel region.  A rather straightforward Vt implant mask lithography step was used to open areas in the mask photoresist for the implant.  For an implant equivalent to the background substrate/well impurity type, the device Vt would be increased.  The introduction of an implant step modifying the background concentration would increase the Vt variation, as well.

With the introduction of FinFET channel devices, the precision and control of implant-based Vt adjusts became extremely difficult.  The alternative pursued for these advanced (high-K gate oxide, metal gate) process nodes is to utilize various gate materials, each with a different metal-to-oxide workfunction contact potential.

Vt offerings for advanced nodes
As device scaling continues, workfunction metal (WFM) engineering for Vt variants is faced with multiple challenges.  A presentation at the recent VLSI 2020 Symposium by TSMC elaborated upon these challenges, and highlighted a significant process enhancement to extend multi-Vt options for nodes below 7nm. [2]

The two principal factors that exacerbate the fabrication of device Vt’s at these nodes are shown in the figures below, from the TSMC presentation.

  • The scaling of the device gate length (shown in cross-section in the figure) requires that the WFM deposition into the trench be conformal in thickness, and be thoroughly removed from unwanted areas.
  • Overall process scaling requires aggressive reduction in the nFET to pFET active area spacing.  Lithographic misalignment and/or non-optimum WFM patterning may result in poor device characteristics – the figure above illustrates incomplete WFM coverage of the (fin and/or GAA) device.

Parenthetically, another concern with the transition to GAA device fabrication is the requirement to provide a conformal WFM layer on all sides of each (horizontal) nanosheet, without “closing off” the gap between sheets.

The TSMC presentation emphasized the diverse requirements of HPC, AI, 5G comm., and mobile markets, which have different top priorities among the PPA tradeoffs.  As a result, despite the scaling challenges listed above, the demand for multi-Vt cell libraries and PPA optimization approaches remains strong.  TSMC presented extremely compelling results of their WFM fabrication engineering focus.  The figure below illustrates that TSMC has demonstrated a range of Vt offerings for sub-7nm nodes that is wider than at the 7nm node.  TSMC announced an overall target Vt range exceeding 250mV.  (Wow.)

In addition to the multi-Vt data, TSMC provided corresponding analysis results for the Vt variation (Pelgrom plot) and the time-dependent device breakdown (TDDB) reliability data – see the figures below.

The sigma-Vt Pelgrom coefficient is improved with the new WFM processing, approaching the 7nm node results.  The TDDB lifetime is also improved over the original WFM steps.

The markets driving the relentless progression to advanced process nodes have disparate performance, power, and area goals.  The utilization of multi-Vt device and cell library options has become an integral design implementation approach.  The innovative process development work at TSMC continues this design enablement feature, even extending this capability beyond what was offered at the 7nm node – that’s pretty amazing.

For more information on TSMC’s advanced process nodes, please follow this link.

-chipguy

References
[1]  M. J. M. Pelgrom, C. J. Duinmaijer, and A. P. G. Welbers, “Matching properties of MOS transistors”, IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1433–1440, Oct. 1989.

[2]  Chang, Vincent S., et al., “Enabling Multiple-Vt Device Scaling for CMOS Technology beyond 7nm Node”, VLSI Symposium 2020, Paper TC1.1.

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

 


Effect of Design on Transistor Density
by Scotten Jones on 05-26-2020 at 10:00 am


I have written a lot of articles looking at leading edge processes and comparing process density.  One comment I often get is that the process density numbers I present do not correlate with the actual transistor density on released products.  A lot of people want to draw conclusions about Intel’s processes versus TSMC’s processes based on Apple cell phone application processors versus Intel microprocessors – this is not a valid comparison!  In this article I will review the metrics I use for transistor density, why I use them, and why comparing transistor density on product designs is not valid.

The first comment I want to make is that I am not a circuit designer and therefore I am not familiar with all of the aspects of the decisions that go into creating a design that may impact the transistor density of the final product, but I do have an understanding of the difference in density that can occur across a given process.

Logic designs are made up of standard cells, and the size of the standard cells is driven by 4 parameters: metal two pitch (M2P), track height (TH), contacted poly pitch (CPP), and single diffusion break (SDB) versus double diffusion break (DDB).

Cell Height
The height of a standard cell is the metal two pitch (M2P) multiplied by the number of tracks (Track Height or TH).  In recent years, in order to continue to shrink standard cells, the TH has been reduced while simultaneously reducing M2P as part of something called design technology co-optimization (DTCO).  One key aspect of reducing TH is that the number of fins per transistor must be reduced at low track heights due to space constraints; this is called fin depopulation.  If you reduce the number of fins per transistor you get less drive current from each transistor unless you do something else to compensate for it, such as increasing fin height – hence the need for co-optimization.

Cell Width
The width of a standard cell depends on contacted poly pitch (CPP), whether the process supports single diffusion break (SDB) or double diffusion break (DDB), and the type of cell.  For example, a NAND gate is 3 CPPs in width with a SDB and 4 CPPs in width with a DDB.  On the other hand, a scanned flip flop (SFF) cell might be something like 19 CPPs wide with a SDB and 20 CPPs wide with a DDB (this can vary with SFF designs).  As you can see, the choice of SDB versus DDB has a greater effect on NAND cell size than on SFF cell size.

Cell Options
When discussing process density, I always compare the minimum cell size, but processes offer multiple options.  For example, TSMC’s 7nm 7FF process offers both a minimum-size 6-track cell with 2 fins per transistor and a 9-track cell with 3 fins per transistor.  The 9-track cell offers 1.5x the drive current of the 6-track cell but is also 1.5x the size.  This illustrates one of the problems when comparing two product designs to each other as a way of characterizing transistor density: a high performance design would have more 9-track cells and therefore lower transistor density than a design targeted at minimum size or lower power with 6-track cells on the same process.  Even the preponderance of NAND cells versus SFF cells would affect the transistor density.
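To make the cell-size arithmetic concrete, the sketch below computes a 60% NAND / 40% SFF weighted MTx/mm2 figure (the metric used in Figure 1 below) from assumed M2P, CPP, and transistor-count values, not the exact numbers behind the figure.

```python
# Illustrative density calculation: cell area = (M2P * tracks) x (CPPs * CPP),
# and the headline figure is a 60% NAND / 40% scanned flip-flop (SFF) weighted
# average. All dimensions and transistor counts below are assumptions.
def cell_density_mtx_per_mm2(m2p_nm, tracks, cpp_nm, cpps_wide, transistors):
    area_um2 = (m2p_nm * tracks / 1000.0) * (cpp_nm * cpps_wide / 1000.0)
    return transistors / area_um2              # Tx/um2 is numerically MTx/mm2

m2p_nm, tracks, cpp_nm = 40, 6, 57             # assumed 6-track cell parameters
nand = cell_density_mtx_per_mm2(m2p_nm, tracks, cpp_nm, cpps_wide=3, transistors=4)    # 3-CPP NAND, SDB
sff = cell_density_mtx_per_mm2(m2p_nm, tracks, cpp_nm, cpps_wide=19, transistors=24)   # 19-CPP SFF, SDB
print("weighted density:", round(0.6 * nand + 0.4 * sff, 1), "MTx/mm2")
```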

Figure 1 summarizes the density difference between 6-track and 9-track cells on the TSMC 7FF process.  Please note the MTx/mm2 parameter is millions of transistors per square millimeter, based on a mix of 60% NAND cells and 40% SFF cells.

Figure 1. TSMC 7FF Density Analysis

 An interesting observation from figure 1 is that a minimum area SFF cell has over 2x the transistor density of a high-performance NAND cell on the same process.  There are also many other types of standard cells with varying transistor densities.

Memory Array
Most system on a chip (SOC) circuits contain significant SRAM memory arrays; in fact, it is not unusual for over half the die area to be SRAM array.

The 7FF process offers a high density 6-transistor (6T) SRAM cell that is 0.0270 square microns in area, which works out to 222 MTx/mm2.  In theory, a lot of memory array area on a design could result in higher transistor density; however, as with a lot of things related to comparing process density, it isn’t that simple.

While doing a project for a customer I analyzed 3 TSMC SRAM test chips and embedded SRAM arrays in 4 Intel chips and 1 AMD chip.  The SRAM arrays were on average 2.93x the size you would expect based on the SRAM cell size for the process and the bit capacity of the array.  This is presumably due to interconnect and circuitry to access the memory.  If we base transistor density for SRAM on the SRAM cells in the array, the density drops to 75.84 MTx/mm2, although there are certainly some transistors in the access circuitry that this isn’t counting.
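The arithmetic behind those two SRAM numbers is simply the following.

```python
# The arithmetic behind the SRAM density figures quoted above.
bitcell_area_um2 = 0.0270              # TSMC 7FF high-density 6T SRAM bit cell
raw_density = 6 / bitcell_area_um2     # transistors per um2, numerically MTx/mm2
array_overhead = 2.93                  # observed (array area) / (cells x cell area)
effective_density = raw_density / array_overhead

print(f"bit-cell-limited density: {raw_density:.0f} MTx/mm2")        # ~222
print(f"effective array density:  {effective_density:.2f} MTx/mm2")  # ~75.84
```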

Other Circuits
Certain SOC designs may also include analog, I/O and other elements that have significantly lower transistor density than minimum cells.

Conclusion
The bottom line to all this is that if you could implement the same design – say, an ARM core with the same amount of SRAM – in different processes, you could use actual designs to compare process density; since that isn’t available, some type of representative metric that can be consistently applied is needed.  When I compare processes, I compare transistor density for a minimum size logic cell with a 60% NAND cell/40% SFF cell ratio.  This is not a perfect metric but it compares processes under the same conditions.  I also want to mention that for processes that are in production my calculations are based on dimensions measured on the product, typically by TechInsights, and are not based on information from the individual companies I am covering.  I do use information from the company announcements when estimating future process density.

Also Read:

Cost Analysis of the Proposed TSMC US Fab

Can TSMC Maintain Their Process Technology Lead

SPIE 2020 – ASML EUV and Inspection Update


Cost Analysis of the Proposed TSMC US Fab
by Scotten Jones on 05-19-2020 at 10:00 am


On May 15th TSMC “announced its intention to build and operate an advanced semiconductor fab in the United States with the mutual understanding and commitment to support from the U.S. federal government and the State of Arizona.”

The fab will run TSMC’s 5nm technology and have a capacity of 20,000 wafers per month (wpm).  Construction is planned to start in 2021 and production is targeted for 2024.  Total spending on the project, including capital expenditure, will be $12 billion between 2021 and 2029.

This announcement is undoubtedly the result of intense pressure on TSMC by the US government, and it is also coming out today that TSMC will stop taking orders from Huawei, also under pressure from the US.

What does this fab announcement mean?

This announcement is, in my opinion, soft: “intention to build”, “construction planned to start”, “production targeted”.  The project is based on a “mutual understanding and commitment to support from the U.S. federal government and the State of Arizona”.  What happens if Donald Trump is voted out in November or simply changes his mind?  I could easily see this project never materializing due to changes in the US political situation or a lack of follow-through from TSMC, who is likely not excited about it to begin with.

My company IC Knowledge LLC is the world leader in cost and price modeling of semiconductors and MEMS.  I thought it would be interesting to use our Strategic Cost and Price Model to make some calculations around this fab.

TSMC operates four major 300mm manufacturing sites in Taiwan and one in China.  The four sites in Taiwan are all GigaFab sites; Fab 12, Fab 14, Fab 15 and Fab 18 are each made up of 6 or 7 wafer fabs sharing central facility plants.  This GigaFab approach is believed to reduce construction costs by about 25% versus building a single stand-alone fab.  The China location is smaller, with 2 fabs at one site, but it was equipped with used equipment transferred from fabs in Taiwan because the fab is trailing edge.  If TSMC really builds a single US fab running 20,000 wpm, the resulting cost to produce a wafer will be roughly 1.3% higher than for a GigaFab location due to higher construction costs.  I believe it is unlikely the site will be equipped with used equipment transferred from Taiwan.  The cost to build and equip the fab for 20,000 wpm should be approximately $5.4 billion.

Locating a fab in the US versus Taiwan will result in the fab incurring US labor and utility costs; this will add approximately 3.4% to the wafer manufacturing cost.

The capacity of the fab is also smaller than a “typical” fab at advanced nodes; the three 5nm fabs TSMC is operating or planning for Taiwan are all 30,000 wpm.  A 20,000 wpm fab will have an approximately 3.8% increase in costs versus a 30,000 wpm fab under the same conditions.

In total, wafers produced at the TSMC Arizona fab will be approximately 7% more expensive to manufacture than a wafer made in Fab 18 in Taiwan.  This does not account for the impact of taxes, which are likely to be higher in the US than in Taiwan.

In the announcement TSMC has said the total spending on the project between 2021 and 2029 would be $12 billion.  That leaves money for a future expansion or conversion to 3nm – almost enough, as one possible example, to add a second 20,000 wpm fab running 3nm.

In summary the “announced” fab would likely be TSMC’s highest cost production site. It will be interesting to see if the fab materializes.

Also Read:

Can TSMC Maintain Their Process Technology Lead

SPIE 2020 – ASML EUV and Inspection Update

SPIE 2020 – Applied Materials Material-Enabled Patterning


TSMC’s Advanced IC Packaging Solutions
by Herb Reiter on 05-01-2020 at 10:00 am


TSMC as Pure Play Wafer Foundry
TSMC started its wafer foundry business more than 30 years ago.  Visionary management and creative engineering teams developed leading-edge process technologies and built the company’s reputation as a trusted source for high-volume production.  TSMC also recognized very early the importance of building an ecosystem – to complement the company’s own strengths.  Their Open Innovation Platform (OIP) attracted many EDA and IP partners to contribute to TSMC’s success, all following Moore’s Law – down to 3 nm at this time – to serve very high-volume applications.

Markets need Advanced IC Packaging technologies
For many other applications Moore’s Law is no longer cost-effective, especially not for the integration of heterogeneous functions.  “More than Moore” technologies, like multi-chip modules (MCMs) and System in Package (SiP), have become alternatives for integrating large amounts of logic and memory, analog, MEMS, etc. into (sub)system solutions.  However, these methodologies were and still are very customer specific and incur significant development time and cost.

In response to market needs for new multi-die IC packaging solutions, TSMC has developed, in cooperation with OIP partners, advanced IC packaging technologies to offer economical solutions for More than Moore integration.

TSMC as supplier of Advanced IC Packaging solutions
In 2012 TSMC introduced, together with Xilinx, by far the largest FPGA available at that time, composed of four identical 28 nm FPGA slices mounted side-by-side on a silicon interposer.  They also developed through-silicon-vias (TSVs), micro-bumps and re-distribution-layers (RDLs) to interconnect these building blocks.  Based on its construction, TSMC named this IC packaging solution Chip-on-Wafer-on-Substrate (CoWoS).  This building blocks-based and EDA-supported packaging technology has become the de-facto industry standard for high-performance and high-power designs.  Interposers, up to three stepper fields large, allow combining multiple die, die-stacks and passives, side by side, interconnected with sub-micron RDLs.  Most common applications today are combinations of a CPU/GPU/TPU with one or more high bandwidth memories (HBMs).

In 2017 TSMC announced the Integrated FanOut technology (InFO).  Instead of the silicon interposer used in CoWoS, it uses a polyimide film, reducing unit cost and package height, both important success criteria for mobile applications.  TSMC has already shipped tens of millions of InFO designs for use in smartphones.

In 2019 TSMC introduced the System on Integrated Chip (SoIC) technology. Using front-end (wafer-fab) equipment, TSMC can align very accurately, then compression-bond designs with many narrowly pitched copper pads, to further minimize form-factor, interconnect capacitance and power.

Figure 1 shows that CoWoS technology is targeting Cloud, AI, Networking, Datacenters and other high-performance and high-power computing applications.

InFO serves some of these and a broad range of other, typically more cost-sensitive and lower power markets.

SoIC technology offers multi-die building blocks for integration in CoWoS and/or InFO designs. – see Figure 2.

SoIC technology benefits
TSMC’s latest innovation, the SoIC technology, is a very powerful way to stack multiple dice into a “3D building block” (a.k.a. “3D-Chiplet”).  Today SoICs enable about 10,000 interconnects per mm2 between vertically stacked dice.  Development efforts towards 1 million interconnects per mm2 are ongoing.  3D-IC enthusiasts, including myself, have been looking for an IC packaging methodology that enables such fine-grain interconnects, further reduces form-factor, eliminates bandwidth limitations, simplifies heat management in die stacks and makes integrating large, highly parallel systems into an IC package practical.  As its name – System on IC – suggests, this technology meets these challenging requirements.  The impressive capabilities of SoIC and SoIC+ are further explained here.  TSMC’s EDA partners are working on complementing this technology with user-friendly design methodologies.  I expect IP partners to soon offer SoIC-ready chiplets and simulation models for user-friendly integration into CoWoS and InFO designs.

Personal comment: More than 20 years ago, in my alliance management role at Synopsys, I had the opportunity to contribute to Dr. Cliff Hou’s pioneering development work on TSMC’s initial process design kits (PDKs) and reference design flows, to facilitate the transition from the traditional IDM to the much more economical fabless IC vendor business model.

With the above described packaging technologies, TSMC is pioneering another change to the semiconductor business. CoWoS, InFO and especially SoIC enable semiconductor and system vendors to migrate from today’s lower complexity (and lower value) component-level ICs, to very high complexity and high value system-level solutions in IC packages. Last, but not least, these three advanced IC packaging solutions are accelerating an important industry trend: A big portion of the IC and system value creation is shifting from the die to the package.


Can TSMC Maintain Their Process Technology Lead
by Scotten Jones on 04-29-2020 at 10:00 am


Recently Seeking Alpha published an article “Taiwan Semiconductor Manufacturing Company Losing Its Process Leadership To Intel” and Dan Nenni (SemiWiki founder) asked me to take a look at the article and do my own analysis. This is a subject I have followed and published on for many years.

Before I dig into specific process density comparisons between companies, I wanted to clear up some misunderstandings about Gate All Around (GAA) and Complementary FET (CFET) in the Seeking Alpha article.

Gate All Around (GAA)
Just as the industry switched from planar transistors to FinFETs, it has been known for some time that a transition from FinFETs to something else will eventually be required to enable continued shrinks.  A FinFET has a gate on three sides providing improved electrostatic control of the device’s channel compared to a planar transistor that has a gate on only one side.  Improved electrostatic control provides lower channel leakage and enables shorter gate lengths.  FinFETs also provide a 3D transistor structure with more effective channel width per unit area than planar transistors, therefore providing better drive current per unit area.

It is well established that a type of GAA device – the horizontal nanosheet (HNS) – is the next step after FinFETs.  If the nanosheets are very narrow you get nanowires and significantly improved electrostatics.  The approximate limit of gate length for a FinFET is 16nm and for a horizontal nanowire (HNW) is 13nm – see figure 1.  Shorter gate lengths are a component of shrinking Contacted Poly Pitch (CPP) and driving greater density.

Figure 1. Contacted Poly Pitch CPP Scaling Challenges.

Please note that in Figure 1, the 3.5nm TSMC HNW is just an example of how dimensions might stack up; we know they are doing FinFETs at 3nm.

The problem with a HNW is that the effective channel width is lower than it is for a FinFET in the same area.  The development of HNS overcame this problem and can offer up to 1.26x the drive current of FinFETs in the same area, although they sacrifice some electrostatic control to do it – see figure 2.

Figure 2. Logic Gate All Around (GAA).

Another advantage of HNS is that the process is essentially a FinFET process with a few changes.  This is not meant to understate the difficulty of the transition – the HNS-specific steps are critical steps, and the geometry of a HNS will make creating multiple threshold voltages difficult – but it is a logical evolution of FinFET technology.  Designers are used to FinFETs with 4 and 5 threshold voltages available to maximize the power-performance trade-off; going back to one or two threshold voltages would be a problem.  This is still an area of intense HNS development and needs to be solved for wide adoption.

At the “3nm” node Samsung has announced a GAA HNS they call a Multibridge, TSMC on the other hand is continuing with FinFETs. Both technologies are viable options at 3nm and the real question should be who delivers the better process.

Complementary FETs (CFET)
In the Seeking Alpha article there is a comment about a CFET offering 6x the density of a 3-fin FinFET cell; that isn’t how it works, and in fact the comparison doesn’t even make sense.

Logic designs are made up of standard cells; the height of a standard cell is given by metal 2 pitch (M2P) multiplied by the number of tracks.  A recent trend is Design Technology Co-Optimization (DTCO), where, in order to maximize shrinks, the number of tracks has been reduced at the same time as M2P.  In a 7.5-track cell it is typical to have 3 fins per transistor, but as the industry has transitioned to the 6-track cells available at 7nm from TSMC and 5nm from Samsung, the fins per transistor are reduced to 2 due to spacing constraints.  In order to maintain drive current the fins are typically made taller and optimized in other ways.  As the industry moves to 5-track cells, the fins per transistor will be further reduced to 1.

Figure 3. Standard Cell layouts

CFETs are currently being developed as a possible path to continue to scale beyond HNS.  In a CFET an nFET and pFET are stacked on top of each other as HNS of different conductivity types.  In theory CFETs can scale over time by simply stacking more and more layers and may even allow lithography requirements to be relaxed, but there is a long list of technical challenges to overcome to realize even a 2-deck CFET.  Also, due to interconnect requirements, going from a HNS to a 2-deck CFET is approximately a 1.4x to 1.6x density increase, not the 2x that might be expected.  For the same process node, a 2-deck CFET would likely offer a less than 2x density advantage over an optimized FinFET, not 6x as claimed in the Seeking Alpha article.

2019 Status
In 2019 the leading logic processes in production were Intel’s 10nm process, Samsung’s 7nm process and TSMC’s 7nm optical process (7FF).  Figure 4 compares the three processes.

Figure 4. 2019 Processes.

In figure 4, M2P is the metal 2 pitch as previously described, tracks are the number of tracks, and cell height is M2P x tracks.  CPP is the contacted poly pitch, and SDB/DDB indicates whether the process has a single diffusion break or double diffusion break.  The width of a standard cell is some number of CPPs depending on the cell type, and DDB adds additional space versus a SDB at the cell edge.  The transistor density is a weighted average of transistor density based on a mix of NAND cells and Scanned Flip Flop cells in a 60%/40% weighting.  In my opinion this is the best metric for comparing process density; it isn’t perfect, but it takes designs out of the equation.  A lot of people look at an Intel microprocessor designed for maximum performance and compare the transistor density to something like an Apple cell phone processor with a completely different design goal, and that simply doesn’t provide a process-to-process comparison under the same conditions.

It should be noted here that Samsung has a 6nm process and TSMC has a 7FFP process that both increase the transistor density to around 120 MTx/mm2; in the interest of clarity I am focusing on the major nodes.

2020 Status
At the end of 2019, Samsung and TSMC both began risk production of 5nm processes and both processes are in production in 2020.

5nm is where TSMC really stakes out a density lead: TSMC’s 5nm process has a reported 1.84x density improvement versus 7nm, whereas Samsung’s 5nm process offers only a 1.33x density improvement.  Figure 5 compares Intel’s 10nm process to Samsung and TSMC’s 5nm processes, since 10nm is still Intel’s densest process in 2020.

Figure 5. 2020 Processes.

The values for Samsung in figure 5 are all numbers that Samsung has confirmed. The TSMC M2P is an incredible 28nm, a number we have heard rumored in the industry. The rest of the numbers are our estimates to hit the density improvement TSMC has disclosed.

Clearly TSMC has the process density lead at the end of 2020.

2021/2022
Now the situation gets fuzzier: Intel’s 7nm process is due to start ramping in 2021 with a 2.0x shrink, and Samsung and TSMC are both due to begin 3nm risk starts in 2021.  Assuming Intel hits their date, they may briefly have a production density advantage, but Intel’s 14nm and 10nm processes have both been several years late.  With COVID-19 impacting the semiconductor industry in general and the US in particular, a 2021 production date for Intel may be even less likely.

Figure 6 compares 2021/2022 processes assuming that, within plus or minus a quarter or two, all three processes will be available; I believe this is a fair assumption.  Intel has said their density will be 2.0x that of 10nm.  TSMC, on their 2020-Q1 conference call, said 3nm will be 70% denser than 5nm, so presumably 1.7x.  Samsung has said 3nm reduces the die size by 35% relative to 5nm, which equates to approximately a 1.54x density increase (1/0.65 ≈ 1.54).

In order to make Intel’s numbers work I am assuming an aggressive 26nm M2P with 6 tracks, an aggressive 47nm CPP for a FinFET and SDB.

For Samsung they have disclosed to SemiWiki a 32nm M2P for 4nm and I am assuming they maintain that for 3nm with a 6-track cell. For CPP with the change to a GAA HNS, they can achieve 40nm and SDB.

In the case of TSMC, they are shrinking 1.7x off of a 5nm process that is a 1.84x shrink from 7nm, and they are bumping against some physical limits.  With them staying with a FinFET, I don’t expect the CPP to be below 45nm for performance reasons, and even with SDB they will have to have a very aggressive cell height reduction.  By implementing a buried power rail (BPR) they can get to a 5-track cell; BPR is a new and difficult technology, and an M2P of 22nm is then required.  Frankly, such a small M2P raises issues with lithography and line resistance, and BPR is also aggressive, so I think this process will be incredibly challenging – but TSMC has an excellent track record of execution.

Figure 6 summarizes the 2021/2022 process picture.

Figure 6. 2021/2022 Processes.

Some key observations from figure 6.

  1. The individual numbers in figure 6 are our estimates and may need to be revised as we get more information, but the overall process densities match what the companies have said and should be correct.
  2. In spite of being the first to move to HNS, Samsung’s 3nm is the least dense of the three processes. The early move to HNS may make it easier for Samsung to shrink in the future, but at their 3nm node it isn’t providing the density advantage that you might expect from HNS.
  3. Yes Intel is doing a 2.0x shrink and TSMC only a 1.7x shrink, but TSMC is doing a 1.84x shrink from 7nm to 5nm and then a 1.7x shrink from 5nm to 3nm in roughly the same time frame that Intel is doing a 2.0x shrink from 10nm to 7nm. A 1.7x shrink on top of a 1.84x shrink is a huge accomplishment, not a disappointment (the short calculation after this list shows the compounding).
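
The compounding behind observation 3 is simple arithmetic, shown below with the shrink factors quoted in this article.

```python
# Compounding the quoted shrink factors over roughly the same time window.
tsmc_cumulative = 1.84 * 1.7       # 7nm -> 5nm -> 3nm
intel_cumulative = 2.0             # 10nm -> 7nm
print(f"TSMC cumulative density gain:  {tsmc_cumulative:.2f}x")   # ~3.13x
print(f"Intel cumulative density gain: {intel_cumulative:.2f}x")
```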

What’s Next
Beyond 2021/2022 I expect Intel and TSMC to both adopt HNS and Samsung to produce a second generation HNS. This will likely be followed by CFETs around 2024/2025 from all three companies. All of these confirmed numbers and projections come from the IC Knowledge – Strategic Cost and Price Model. The Strategic Cost and Price Model is not only a company specific roadmap of logic and memory technologies into the mid to late 2020s, it is also a cost and price model that produces detailed cost projections as well as material and equipment requirements.

Interested readers can see more detail on the Strategic Cost and Price Model here.

Conclusion
TSMC took the process density lead this year with their 5nm process.  Depending on the exact timing of Intel’s 7nm process versus TSMC’s 3nm, Intel may briefly regain a process density lead, but TSMC will quickly pass them with their 3nm process at over 300 million transistors per square millimeter!

Also Read:

SPIE 2020 – ASML EUV and Inspection Update

SPIE 2020 – Applied Materials Material-Enabled Patterning

LithoVision – Economics in the 3D Era


Tracing Technology’s Evolution with Patents
by Arabinda Das on 04-23-2020 at 10:00 am

Figure 1

We live in an age of abundant information.  There is a tremendous exchange of ideas crisscrossing the world, enabling new, innovative types of products to pop up daily.  Therefore, in this era there is a greater need to understand competitive intelligence.  Companies today are interested in what competitors are brewing in their R&D labs and in predicting what novel application is coming up in the market, so as to determine the best possible plan of action to counterattack.  Moreover, new players with radically innovative ideas are rapidly emerging, as can be partly deduced from the massive shift in patent filing over the past years.  For example, in 2000, the three countries which filed the most patents were the US, Japan and Germany.  But since 2019, China has become the largest patent filing country with the World Intellectual Property Organization (WIPO), surpassing the USA, Japan, and Germany.  South Korea has also emerged as a top-five patent producer [1].  Companies around the world are looking for a synthesis of information from this data deluge.  They are relying on industry experts to provide the technological know-how, but also on patent engineers or analysts to perform the analysis of intellectual property (IP) of a particular company and/or a whole industry.  Their aim is to understand the activities of the main players as well as the fields in which they dominate.  Creating such a detailed patent landscape is time-consuming and complex; however, the end result can provide deep insights into the technology and the market.

I have come across several thorough patent landscapes that have predicted emerging technologies quite accurately.  However, I have found mixed results for semiconductor road maps, especially those related to advanced logic devices.  Specifically, some of the major technological breakthrough concepts in advanced logic devices were not predicted in time by market analysts or industry experts.  The most striking example is the introduction of the finFET device (a tri-gate where the gate wraps around the silicon fin for better control of the channel) by Intel in 2012 for its i5-3550 processor, which arrived as a complete surprise to the industry.

The story gets even more interesting after the introduction of finFET devices.  Very quickly there were multiple reports that beyond the 10 nm node finFET devices were not going to be extendable.  Solutions were proposed in public forums like IEEE papers and the IEDM and VLSI conferences.  Needless to say, prior to the publication of every proposed solution in the public literature, multiple related patents were filed by all major device manufacturers.  All the patents and non-patent literature could be grouped into two categories: new materials or new device architectures.  They discussed either new materials with existing technologies or suggested radical solutions where new device architectures were fabricated with new materials.  For example, some of the serious propositions with prototype data were the following device structures: the ultra-thin-body (UTB) field-effect-transistor (FET) based on silicon-on-insulator (SOI), gate-all-around (GAA) involving nano-wires/nano-sheets stacked horizontally or vertically, the tunneling FET (TFET), and the stacked FET.  Meanwhile the materials propositions mainly focused on silicon-germanium (SiGe) replacing the silicon (Si) channel for PMOS, or using III-V compounds.  However, today we are at the 7 nm node, slowly transitioning to the 5 nm node, and still moving forward with the original finFET configuration.

I wondered why these predictions were inaccurate and came to the following conclusions.  Firstly, all these suggested devices, in spite of their strengths, had some serious concerns too.  The ultra-thin-body (UTB) architecture offered the possibility of back biasing and also had low power consumption, but the initial wafer cost was high at the time.  UTB is not used now, but SOI-based technology is currently widely prevalent in the market, despite not being used in high-speed processors.  Similarly, the GAA concepts provided better electrostatic control of the channel but required two materials which could be deposited one on top of the other, each having a very different etch selectivity for the same etching chemistry.  The onus on deposition and etching was high, which made the overall process flow very expensive.  Vertical GAA FET devices, which required a major integration change because the wire-shaped channel regions are perpendicular to the substrate (implying that source and drain regions are not on the same plane), were especially hindered by these requirements.  This implied additional process steps involving deposition and etching, which would make the manufacturing of advanced logic devices even more expensive.  Regarding TFET, there was the promise of a sub-threshold slope below the ~60mV/dec thermal limit (e.g., 55mV/dec), which could open new applications for low power computing.  However, the band-to-band tunneling based TFET devices unfortunately lacked a robust drive current.  Next, let us consider stacked FET devices.  This idea had been floating around for a long time in technical forums.  In this concept, transistors are stacked one on top of another.  Either the transistors are made in separate wafers and bonded, or they are fabricated directly on the lower layer of transistors.  This requires good bonding techniques or proper control of the thermal budget for the top devices.  Additionally, controlling the implant process could be difficult on the stacked layer.  Back in 2012, the solutions were not ready.  What about SiGe replacing Si?  Most of the patents filed and literature submitted highlighted two possible scenarios, both of which involved integration methods after fin formation.  One requires growing SiGe on the side walls, while the other recesses the fins between the isolation structures and grows SiGe on top of the fin (see figure 1).  Both methods required at least additional mask sets and numerous process steps, which suggested that the end result would be expensive.

If you look at the track record of semiconductor manufacturers, it becomes evident why none of these concepts ever made it into the mainstream. Continuous miniaturization, or scaling, of devices has kept the transistor count trend in line with Moore’s law even today [2]. Scaling is essentially the shrinkage of all dimensions of the metal-oxide-semiconductor field-effect transistor (MOSFET). Every time semiconductor manufacturers faced process challenges or design difficulties due to scaling, they analyzed the smallest change to the integration scheme that would let them keep using the existing tool set and process flows at the new technology node. They also had to consider whether any newly introduced process could be extended to future nodes. The strategy is that whenever a new process-integration step is introduced at a technology node, the majority of the other process steps are kept unaltered. The direct result of this strategy is that with each new generation the process flow becomes more stable and reliable.

This strategy of minimum change at every new generation is well exemplified by Intel’s processors. Intel’s 22 nm node carried the 5th generation of strained-silicon engineering, with raised source-drains of embedded graded SiGe for the PMOS channel and embedded Si for NMOS. Similarly, for channel and gate engineering, high-k with replacement metal gate was introduced at the 45 nm node, further improved at 32 nm, and finally carried into the 22 nm finFET structure. Intel has maintained the same finFET architecture up to 10 nm, yet device performance has improved and the number of transistors per unit area has increased. TSMC’s record is equally impressive: it introduced finFET devices at the 16 nm node in the iPhone 7 processor in 2016, and has since produced three new generations of finFET devices. According to the press release, TSMC will also continue to use finFET devices at 5 nm [3].

Needless to say, the devil is in the details; detailed structural analyses are needed to understand the process evolution. Even though the finFET configuration has remained the workhorse since 2012, the evolution of the integration process flow and the design layout is impressive. Broadly speaking, the most changes and new process steps in advanced logic nodes take place near the gate structure, especially in the lowest interconnect levels closest to the gate. A glimpse of this process sophistication can be had from an older Intel presentation, together with Dick James’ commentary on Intel’s 10 nm process, which includes cross-sections and detailed explanations of the changes in contact formation [4]. That article highlights how, by changing the layout and the integration scheme, the standard cell could be shrunk and the number of transistors per unit area increased. A detailed survey of finFET process technology from 14 nm to 10 nm is collected in a presentation from Siliconics [5]. The presentation is full of cross-sections and detailed explanations, and is quite a treasure trove; it elaborates on some of the major innovations introduced in finFET devices. For example, it discusses fin geometry and pitches, the work-function metal layers of NMOS and PMOS transistors, solid-source-diffusion punch-through stop and its role, the introduction of novel materials in the lower interconnect structure, the structure of dummy gates at the fin ends, post-patterning fin removal, super vias that connect metal 1 directly to the gate without an intermediate metal 0 layer, multi-stage contacts to the source-drain regions, the introduction of quadruple patterning in the front end, and air gaps in the back end of line. Figure 2, taken from this presentation, shows a variety of contacts, only one of the novelties in finFET devices. And of course, each of these process steps is backed by a family of patents. This illustrates the point that massive innovation has been implemented on the same finFET device configuration.

Predicting near-future semiconductor device technologies would therefore mean looking for patents that make incremental changes yet affect the cell area or the layout of the interconnect structure closest to the gate. Such patents enable further miniaturization without much disruption, maintaining the integration flow and thus keeping manufacturing costs low. Modern tools will accelerate the use of patents to predict near-future device technologies more effectively. Related ideas are already being tried with the help of deep learning, as in the case of Google, which has announced that it is experimenting with artificial intelligence to make more efficient chips; it is not looking for radical changes in device structures but rather optimizing what is already available [6]. Semiconductor technology has never stopped innovating and will not stop surprising us, and a thorough understanding of current process steps and their corresponding patents could be key to predicting what is still to come.
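To make the patent-screening idea concrete, here is a purely illustrative sketch of a lightweight keyword screen one might run over a patent corpus to flag filings that describe incremental, gate-proximate integration changes. The keyword lists, record format, and scoring threshold are all assumptions for illustration and do not describe any real classification system used in the industry.

# Illustrative only: a naive keyword screen over patent abstracts to flag
# filings that describe incremental, gate-proximate integration changes.
# The corpus schema, keywords, and threshold are assumptions.

INCREMENTAL_HINTS = [
    "contact", "via", "middle-of-line", "metal 0", "metal 1",
    "self-aligned", "cell height", "standard cell", "fin pitch",
]
DISRUPTIVE_HINTS = [
    "wafer bonding", "vertical channel", "tunneling", "III-V",
    "new substrate", "monolithic 3D",
]

def score_abstract(abstract: str) -> int:
    """Positive score suggests an incremental, layout/interconnect-focused patent."""
    text = abstract.lower()
    plus = sum(text.count(k) for k in INCREMENTAL_HINTS)
    minus = sum(text.count(k.lower()) for k in DISRUPTIVE_HINTS)
    return plus - minus

def shortlist(patents: list, threshold: int = 2) -> list:
    """patents: [{'id': ..., 'abstract': ...}, ...] (hypothetical schema)."""
    return [p for p in patents if score_abstract(p["abstract"]) >= threshold]

if __name__ == "__main__":
    demo = [
        {"id": "demo-patent-A", "abstract": "A self-aligned contact and via scheme "
                                            "reducing standard cell height at metal 0."},
        {"id": "demo-patent-B", "abstract": "A vertical channel device formed by wafer "
                                            "bonding on a new substrate."},
    ]
    print([p["id"] for p in shortlist(demo)])   # only the incremental example survives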

The ideas expressed in this article are solely the opinion of the author and do not represent the author’s employer or any other organization with which the author may be affiliated.

References

1/ https://twitter.com/WIPO/status/1247498105135566848

2/ https://www.semiconductor-digest.com/2020/03/10/transistor-count-trends-continue-to-track-with-moores-law/

3/ https://www.tsmc.com/english/dedicatedFoundry/technology/5nm.htm

4/ https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2017/09/10-nm-icf-fact-sheet.pdf

https://sst.semiconductor-digest.com/chipworks_real_chips_blog/2017/04/10/intel-unveils-more-10nm-details/

5/ https://nccavs-usergroups.avs.org/wp-content/uploads/JTG2018/JTG718-4-James-Siliconics.pdf

6/ https://www.zdnet.com/article/google-experiments-with-ai-to-design-its-in-house-computer-chips/


TSMC COVID-19 and Double Digit Growth in 2020

TSMC COVID-19 and Double Digit Growth in 2020
by Daniel Nenni on 04-17-2020 at 10:00 am

Mark Liu CC Wei TSMC


TSMC has had an incredible run since its founding in 1987, a run that spans most of my 36-year semiconductor career. Even in these troubled times TSMC is a shining bellwether, with double-digit growth expectations while the semiconductor industry overall will be flat or slightly down. Let’s take a close look at the TSMC Q1 2020 conference call and see what else we can learn.

“On March 18, we found one employee who tested positive for COVID-19 and immediately began receiving appropriate care. Today, this employee has recovered, is out of the hospital and is staying at home for additional quarantine. We were able to suitably trace all the other individuals who were in contact. The neighboring employees have all tested negative, while all other employees who were in contact has entered and completed the 14-day self-quarantine and now back to work. As a result of the strict preventive measures taken by TSMC, we have not seen any disruption of our fab operations so far.”

This does not surprise me at all. Taiwan learned a very important lesson during the SARS outbreak in 2002. I remember traveling during this time and going through extra medical checks at the TPE airport. Taiwan installed medical imaging equipment that took our temperatures after we got off the planes. It is easy to remember since I had to remove my hat and got to see how big my brain is. It really is big, hat size XL.

One thing you can say about TSMC is that they have built their business on experience and humility, absolutely.

Dr. C.C. Wei:

“Looking ahead to the second half of this year. Due to the market uncertainty, we adopt a more conservative view as we expect COVID-19 to continue to bring some level of disruption to the end market demand. For the whole year of 2020, we now forecast the overall semiconductor market, excluding memory growth, to be flattish to slightly decline, while foundry industry growth is expected to be high single-digit to low-teens percentage.”

In my opinion we will see a hockey-stick-like semiconductor recovery in Q4 2020. Never before have we seen the entire world united in a common cause. Never before have we seen such worldwide compassion and cooperation. COVID-19 really is a globally uniting event and it could not have come at a better time in my opinion. The world will be a much safer and more productive place in 2021 and beyond, that is my heartfelt belief.

“Now let me talk about the progress and development of 5G and HPC. With the recent disruption from COVID-19, we now expect global smartphone units to decline high single digit year-over-year in 2020. However, 5G network deployment continues and OEMs continue to prepare to launch 5G phones. We maintain our forecast for mid-teens penetration rate for 5G smartphone of the total smartphone market in 2020.”

It is understandable that edge devices will take a pause this year, but remember we are in a data-driven society. With the entire world sheltering in place, the amount of data generated is increasing exponentially. SemiWiki traffic alone is up 30%, and our webinar series is breaking registration and attendance records. The worldwide communications infrastructure is being upgraded like never before, and that means semiconductor strength.

There has been a lot of fake news of late surrounding TSMC process technology, so let’s get this straight from the horse’s mouth (an American idiom for the truth):

“Now let me talk about the ramp-up of N7, N7+ and the status of N6. In its third year of ramp, N7 continue to see very strong demand across a wide spectrum of products for mobile, HPC, IoT and automotive applications. Our N7+ is entering its second year of ramp using EUV lithography technology while paving the way for N6. Our N6 provides a clear migration path for next-wave N7 products, as the design rules are fully compatible with N7.”

“N6 has already entered its production and is on track for volume production before the end of this year. N6 will have one more EUV layer than N7+ and will further extend our 7-nanometer family well into the future. We expect our 7-nanometer family to continue to grow in its third year and reaffirm it will contribute more than 30% of our wafer revenue in 2020.”

“Now let me talk about our N5 status. N5 is already in volume production with good yield. Our N5 technology is a full node stride from our N7, with 80% logic density gain and about 20% speed gain compared with N7. N5 will adopt EUV extensively. We expect a very fast and smooth ramp of N5 in the second half of this year driven by both mobile and HPC applications. We’ll reiterate 5-nanometer will contribute about 10% of our wafer revenue in 2020.”

“N5 is the foundry industry’s most advanced solution with best PPA. We observed a higher number of tapeouts, as compared with N7 at the same period of time. We will offer continuous enhancements to further improve the performance, power and density of our 5-nanometer technology solution into the future as well. Thus, we are confident that 5-nanometer will be another large and long-lasting node for TSMC.”

“Finally, I will talk about our N3 status. Our N3 technology development is on track, with risk production scheduled in 2021 and target volume production in second half of 2022. We have carefully evaluated all the different technology options for our N3 technology, and our decision is to continue to use FinFET transistor structure to deliver the best technology maturity, performance and costs.”

“Our N3 technology will be another full node stride from our N5, with about a 70% larger density gain, 10% to 15% speed gain and 25% to 30% power improvement as compared with N5. Our 3-nanometer technology will be the most advanced foundry technology in both PPA and transistor technology when it is introduced and will further extend our leadership position well into the future.”
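As a quick sanity check on what these quoted gains imply, here is a back-of-the-envelope sketch (my own arithmetic, not TSMC’s) that compounds the stated logic density improvements from N7 through N3. Real chip-level density depends heavily on the design, so treat these as rough upper bounds.

# Back-of-the-envelope compounding of the logic density gains quoted above.
# These are headline numbers from the call; actual density depends on the design.

n5_vs_n7 = 1.80   # "80% logic density gain" for N5 vs N7
n3_vs_n5 = 1.70   # "about a 70% ... density gain" for N3 vs N5

n3_vs_n7 = n5_vs_n7 * n3_vs_n5
print(f"N5 area per transistor vs N7: {1 / n5_vs_n7:.2f}x")   # ~0.56x
print(f"N3 logic density vs N7:       {n3_vs_n7:.2f}x")       # ~3.06x
print(f"N3 area per transistor vs N7: {1 / n3_vs_n7:.2f}x")   # ~0.33x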

If you have questions about this please post in the comments section and let the SemiWiki community of experts answer. Just say no to fake news….


TSMC 32Mb Embedded STT-MRAM at ISSCC2020

TSMC 32Mb Embedded STT-MRAM at ISSCC2020
by Don Draper on 03-20-2020 at 6:00 am


32Mb Embedded STT-MRAM in ULL 22nm CMOS Achieves 10ns Read Speed, 1M Cycle Write Endurance, 10 Years Retention at 150C and High Immunity to Magnetic Field Interference presented at ISSCC2020

1.  Motivation for STT-MRAM in Ultra-Low-Leakage 22nm Process

TSMC’s embedded Spin-Torque Transfer Magnetic Random Access Memory (STT-MRAM) offers significant advantages compared to Flash Non-Volatile Memory (NVM).  Flash requires 12 or more extra masks, is implemented in the silicon substrate, and is write-alterable only in page mode.  STT-MRAM, on the other hand, is implemented in the Back-End-Of-Line (BEOL) metallization as shown in Fig. 1, requires only 2-5 extra masks, and is byte-alterable.

This implementation in TSMC’s 22nm Ultra-Low-Leakage (ULL) CMOS process has a very fast read speed of 10ns and a read power of 0.8mA/MHz-bit. It has 100K-cycle write endurance for 32Mb of code and 1M-cycle endurance for 1Mb of data. It supports data retention through a 90-second IR reflow at 260C, and 10 years of data retention at 150C.  It is implemented in a very small 1-transistor-1-resistor (1T1R) 0.046µm2 bit cell, and has a very low leakage current of 55µA at 25C for the 32Mb array, equivalent to 1.7E-12A/bit, when in Low Power Standby Mode (LPSM).  It utilizes a sensing scheme with per-sense-amp trimming and a 1T4R reference cell.
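As a quick consistency check on the standby figures quoted above (my own arithmetic, not from the paper), the per-bit leakage and the array leakage line up:

# Cross-check of the quoted standby leakage: 1.7E-12 A/bit over a 32Mb array
# should land near the quoted ~55 µA total array leakage at 25C.

bits = 32 * 1024 * 1024           # 32Mb array
leak_per_bit = 1.7e-12            # A/bit in Low Power Standby Mode (quoted)
total_leak = bits * leak_per_bit  # total array leakage in amperes

print(f"Estimated array leakage: {total_leak * 1e6:.1f} uA")  # ~57 uA, consistent with ~55 uA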

 

Fig. 1. Cross-section of the STT-MRAM bit cell in BEOL metallization layers between M1 and M5.

2.  1Transistor-1Resistor MRAM Bit Cell Operation and Array Structure
To reduce parasitic resistance on the write current path, a two-column common source line (CSL) array structure is employed, as shown in Fig. 2.

Fig. 2. Schematic of the 1T1R bit cell in a 512-bit array column with the two-column CSL.

The word line is over-driven by a charge pump to provide a sufficient switching current of hundreds of µA for the write operation. This requires the unselected bit lines to be biased at a “write-inhibit voltage” (VINHIBIT) to prevent excess voltage stress on the access transistors of the unselected columns of the selected row. To reduce bit-line leakage through the access transistors on unselected word lines, those word lines are biased at a negative voltage (VNEG). The biasing of the array structure for read, write-0 and write-1 operations is shown in Fig. 3.

Fig. 3. Cell array structure biasing for word lines and bit lines for read, write-0 and write-1 operations.

3.  Read Operation, Sense Amplifier and Word-Line Voltage System
For fast, low-energy wake-up from LPSM to enable high-speed read access, a fine-grained power gating circuit (one per 128 rows) with a two-step wake-up is used, as shown in Fig. 4.  The power switch consists of two switches, one for the chip power supply VDD and the other for the regulated voltage VREG from the Low Drop-Out (LDO) regulator.  The VDD switch is turned on first to pre-charge the WL driver’s power rail, then the VREG switch is turned on to raise the rail to the target level. This achieves a fast wake-up of <100ns while minimizing the transient current drawn from the VREG LDO.

Fig. 4. Fine-grained power gating circuit (one per 128 rows) with two-step wake-up.
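As a way to visualize the sequence just described, here is a minimal sketch of the two-step wake-up. The function names and the pre-charge delay are illustrative assumptions, not values from the paper.

# Illustrative sketch of the two-step wake-up sequence described in the text.
# Switch names mirror the description; the delay value is a made-up placeholder.

import time

def wake_up_wl_power_rail(enable_vdd_switch, enable_vreg_switch,
                          precharge_delay_ns: float = 50.0):
    """Two-step wake-up: pre-charge from VDD, then regulate from VREG."""
    enable_vdd_switch()                      # step 1: pre-charge WL driver rail from VDD
    time.sleep(precharge_delay_ns * 1e-9)    # allow the rail to pre-charge (placeholder delay)
    enable_vreg_switch()                     # step 2: switch to the LDO-regulated VREG level
    # Total wake-up is quoted as <100ns while limiting transient current drawn from the LDO.

if __name__ == "__main__":
    wake_up_wl_power_rail(lambda: print("VDD switch on"),
                          lambda: print("VREG switch on"))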

The Tunnel Magnetoresistance Ratio (TMR) house curve shown in Fig. 5 plots the ratio of the antiparallel resistance state Rap to the parallel resistance state Rp as a function of voltage, showing lower TMR and a smaller read window at higher temperatures.

Fig. 5. House curve of TMR showing the reduced read window at 125C.

The resistance distributions of the Rap and Rp states, together with the bit-line metal resistance and the access-transistor resistance, determine the total read-path resistance. As shown in Fig. 6, the added series resistance proportionally reduces the difference between the two states that the sense amp must resolve to determine the bit value.

Fig. 6. Distribution of resistance values for the anti-parallel Rap and the parallel Rp states and including the metal bit line and access transistor resistances showing the proportional reduction in the difference between the two states that needs to be detected by the sense amp.
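To illustrate why the series resistance matters, here is a small numerical sketch of how added bit-line and access-transistor resistance shrinks the relative read window. The resistance values are hypothetical placeholders, not numbers from the paper.

# Hypothetical numbers illustrating how series resistance in the read path
# reduces the relative difference between the two MTJ states.

r_p = 5_000.0       # parallel-state MTJ resistance (ohms, assumed)
r_ap = 10_000.0     # antiparallel-state MTJ resistance (ohms, assumed)
r_series = 3_000.0  # bit-line metal + access transistor resistance (ohms, assumed)

def relative_window(rp, rap, rs=0.0):
    """Fractional resistance difference seen by the sense amp."""
    return (rap - rp) / (rp + rs)

print(f"MTJ alone:     {relative_window(r_p, r_ap):.0%}")             # 100%
print(f"With series R: {relative_window(r_p, r_ap, r_series):.0%}")   # ~62%
# The absolute difference (Rap - Rp) is unchanged, but the fractional window
# the sense amp must detect shrinks as series resistance is added.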

To sense the resistance of the MTJ, the voltage across it during read is clamped to a low value by transistors N1 and N2 to avoid read disturb, and the sense amp is trimmed to cancel the sense-amp and reference-current offsets. The reference resistance is formed by the 1T4R configuration, R ≈ (Rp + Rap)/2 + R1T, as shown in Fig. 7.

Fig. 7. Sense amp with trimming capability, showing the read clamp voltage on transistors N1 and N2 to prevent read disturb. Reference R ≈ (Rp + Rap)/2 + R1T.

This configuration is able to achieve a read speed of less than 10ns at 125C, as shown in the sensing timing diagram and shmoo plot of Fig. 8.

Fig. 8.  Sensing timing diagram and read access shmoo plot at 125C.

4.  MRAM Write Operation
Writing the parallel low-resistance state Rp and the higher-resistance anti-parallel state Rap requires the bi-directional write operation shown in Fig. 9. To write the 0 state (switching Rap to Rp), the BL is biased to VPP, the WL to VREG_W0, and the SL to 0. To write the 1 state (switching Rp to Rap), current must flow in the other direction, with the BL at 0, the SL at VPP, and the WL at VREG_W1.

Fig. 9. Bi-directional Write for the parallel low resistance state, Rp and the higher resistance anti-parallel state Rap
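For reference, here is a minimal sketch of the bias selection just described, expressed as a lookup table. The voltage names (VPP, VREG_W0, VREG_W1, VINHIBIT, VNEG) come from the article text; organizing them this way is my own framing, not the paper’s implementation.

# Minimal sketch of the bi-directional write biasing described in the text.
# Voltage names come from the article; the lookup-table framing is illustrative.

WRITE_BIAS = {
    # value: (BL, WL, SL) for the selected cell
    0: ("VPP", "VREG_W0", "0V"),   # write 0: switch Rap -> Rp, current from BL to SL
    1: ("0V", "VREG_W1", "VPP"),   # write 1: switch Rp -> Rap, current from SL to BL
}

UNSELECTED_LINES = {
    "unselected_BL": "VINHIBIT",   # write-inhibit voltage protects unselected columns
    "unselected_WL": "VNEG",       # negative bias reduces access-transistor leakage
}

def write_bias(bit_value: int) -> dict:
    bl, wl, sl = WRITE_BIAS[bit_value]
    return {"BL": bl, "WL": wl, "SL": sl, **UNSELECTED_LINES}

print(write_bias(0))
print(write_bias(1))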

For data retention during a 90-second IR reflow at 260C, an MTJ with a high energy barrier Eb is needed. This in turn raises the MTJ switching current needed for reliable writing to hundreds of µA.  The write voltage is temperature compensated, and a charge pump generates a positive voltage for selected cells and a negative voltage for unselected word lines to suppress bit-line leakage at high temperatures. The write voltage system is shown in Fig. 10.

Fig. 10. Over-drive of the WL and BL/SL by the charge pump, and the temperature-compensated write bias.

Temperature compensation of the write voltage is required for operation over a wide temperature range.  The write voltage shmoos from -40C to 125C are shown in Fig. 11, where the F/P blocks show failures at -40C while passing at 125C.

Fig. 11. Requirement for temperature compensation during write.

A BIST module with a standard JTAG interface implements self-repair and self-trimming to facilitate the test flow. The memory controller (TMC) implements the double-error-correction ECC (DECECC), as shown in Fig. 12.

Fig. 12. BIST and Controller for self-repair and self-trimming during test and implementing DECECC.

The TMC implements a smart write algorithm that manages the bias setup and verify/retry timing needed for high write endurance (>1M cycles). It uses read-before-write to decide which bits actually need to be written, dynamic group-write to improve write throughput, and multi-pulse write with write-verify, optimizing the write voltage for high endurance. The algorithm is shown in Fig. 13.

Fig. 13. Smart write algorithm showing dynamic group write and multi-pulse write with write verify.
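As a rough mental model of the flow in Fig. 13, here is a simplified sketch of a read-before-write, dynamic group-write, multi-pulse write-verify loop. The structure follows the features listed above; the pulse count, voltage step, and function signatures are assumptions for illustration, not the paper’s actual algorithm.

# Simplified sketch of a read-before-write, multi-pulse write-verify flow,
# following the features listed in the text. All parameters are illustrative.

def smart_write(word_in, read_word, write_bits, max_pulses=4, v_start=1.0, v_step=0.1):
    """
    word_in:    list of target bit values for one word
    read_word:  callable returning the currently stored word (read-before-write)
    write_bits: callable(bit_positions, value, voltage) applying one write pulse
    """
    stored = read_word()
    # Read-before-write: only bits that differ from the target are written.
    to_write = [i for i, (t, s) in enumerate(zip(word_in, stored)) if t != s]

    voltage = v_start
    for _ in range(max_pulses):
        if not to_write:
            return True                      # all bits verified, done
        # Dynamic group-write: write all remaining 0s together, then all remaining 1s.
        for value in (0, 1):
            group = [i for i in to_write if word_in[i] == value]
            if group:
                write_bits(group, value, voltage)
        # Write-verify: re-read and keep only the bits that still fail.
        stored = read_word()
        to_write = [i for i in to_write if stored[i] != word_in[i]]
        voltage += v_step                    # escalate write voltage on retry
    return not to_write                      # False if some bits never verified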

5.  Reliability Data, Key Features and Die Photo

Fig. 14.  The write endurance test shows that the 32Mb chip access times and the read currents are stable before and after 100K -40C write cycles.

Fig. 15.  The write endurance bit error rate is less than 1 ppm at -40C after 1M cycles.

Fig. 16. The increased thermal stability barrier Eb, which governs the temperature dependence of data retention, provides more than 10 years of data retention at 150C at the 1ppm level.

Magnetic field interference is a potential concern in many applications of spin-based STT-MRAM. The solution is a 0.3mm-thick magnetic shield deposited on the package, as shown in Fig. 17: in the 3500 Oe field of a commercial wireless charger for mobile devices, the bit error rate after a 100-hour exposure is reduced from >1E6 ppm to ~1 ppm. In addition, more than 10 years of data retention at 125C was shown in a magnetic field of 650 Oe.

Fig. 17. Sensitivity to a magnetic field of 3500 Oe reduced by a factor of 1E6.

Conclusions
The 22nm ULL 32Mb high-density MRAM has very low power, high read speed, and very high data retention and endurance, making it suitable for a wide range of applications. With a cell size of only 0.0456µm2, it has a read speed of 10ns and a read power of 0.8 mA/MHz/b, and in Low Power Standby Mode (LPSM) its leakage is less than 55µA at 25C, equivalent to 1.7E-12 A/bit. For 32Mb of code it has an endurance of 100K cycles, and for 1Mb of data >1M cycles.  It is capable of 90sec of data retention under IR reflow at 260C and long-term retention of >10 years at 150C. The product spec is shown in Fig. 18 and the die photo in Fig. 19.

Fig. 18.  Summary table of N22 MRAM specification and die photo.

Fig. 19.   32Mb high-density MRAM macro in the 22nm Ultra-Low-Leakage CMOS process.