Register File Design at the 5nm Node

by Tom Dillinger on 03-10-2021 at 2:00 pm


“What are the tradeoffs when designing a register file?”  Engineering graduates pursuing a career in microelectronics might expect to be asked this question during a job interview.  (I was.)

On the surface, one might reply, “Well, a register file is just like any other memory array – address inputs, data inputs and outputs, read/write operation cycles.  Maybe some bit masking functionality to write a subset of the data inputs.  I’ll just use the SRAM compiler for the foundry technology.”  Alas, that answer will likely not receive any kudos from the interviewer.
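To make the naive answer concrete, a minimal behavioral model of that mental picture – address-indexed storage with a bit-masked write – might look like the sketch below (the class name, depth, and width are illustrative, not any compiler's output):

```python
# Behavioral sketch of a register file: one write port with an optional
# per-bit write mask, combinational reads. Sizes are illustrative only.
class RegisterFile:
    def __init__(self, depth=32, width=32):
        self.full = (1 << width) - 1
        self.mem = [0] * depth

    def write(self, addr, data, mask=None):
        # Bit-masked write: only the bit positions set in mask are updated.
        if mask is None:
            mask = self.full
        self.mem[addr] = ((self.mem[addr] & ~mask) | (data & mask)) & self.full

    def read(self, addr):
        return self.mem[addr]

rf = RegisterFile()
rf.write(7, 0xFFFFFFFF)
rf.write(7, 0x00000000, mask=0x000000FF)   # update only the low byte
assert rf.read(7) == 0xFFFFFF00
```

Of course, the interviewer is really asking about the circuit-level tradeoffs hiding beneath this abstraction – which is where the rest of the discussion goes.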

At the recent International Solid State Circuits Conference (ISSCC 2021), TSMC provided an insightful technical presentation into their unique approach to register file implementation for the 5nm process node. [1]

The rest of this article provides some of the highlights of their decision and implementation tradeoffs.  I would encourage SemiWiki readers to obtain a copy of their paper and delve more deeply into this topic (particularly before a job interview).

Register File Bitcell Implementation Options

There are three general alternatives for selecting the register file bit cell design:

  • an array of standard-cell flip-flops, with standard cell logic circuitry for row decode and column mux selection

The figure above illustrates n registers built from flip-flops, with standard logic to control the write and read cycles (shown separately above) – one write port and two read ports are shown.

  • a conventional 6T SRAM bitcell

The figure above illustrates an SRAM embedded within a stdcell logic block, where the supply voltage domains are likely separate.  Additional area around the SRAM is required, to accommodate the difference between the conventional cell layout rules and the “pushed” rules for (large) SRAM arrays.

  • a unique bitcell design, optimized for register file operation

For the 5nm register file compiler, TSMC chose the third option using the bitcell illustrated above, based on the considerations described below.  Note that the 16-transistor cell includes additional support for masked bit-level write, using the additional CL/CLB inputs.  The TSMC team highlighted that this specific bit-write cell design reduces the concern with cell stability for adjacent bitcells on the active wordline that are not being written – the “half-select” failure issue (wordline selected, bit column not selected).

Bitcell Layout

The foundry SRAM compiler bitcell typically uses unique (aggressive) layout design rules, optimized for array density.  Yet, there are specific layout spacing and dummy shape transition rules between designated SRAM macros and adjacent standard cell logic – given the large number of register files typically present in an SoC architecture, this required transition area is inefficient.

Flip-flops use conventional standard cell layout rules, with fewer restrictions on adjacency to other logic.

For the TSMC 5nm register file bitcell, standard cell digital layout rules were also used.

Peripheral Circuitry

A major design tradeoff for optimal register file PPA is the required peripheral circuitry around the bitcell array.  There are several facets to this tradeoff:

  • complexity of the read/write access cycle

The flip-flop implementation shown above is perhaps the simplest.  All flip-flop outputs are separate signals, routed to multiplexing logic to select “column” outputs for a read cycle.  Yet, the wiring demand/congestion and peripheral logic depth grow quickly with the number of register file rows.

The SRAM uses dotted bitcell inputs and outputs along the bitline column;  the decoded row address is the only active circuit on the bitline.  A single peripheral write driver and differential read sense circuit supports the entire column.

The TSMC register file bitcell also adopts a dotted connection for the column, but separates the write and read bit lines.  The additional transistors comprising the read driver in the cell (P6, N6, P7, and N7 in the bitcell figure above) offer specific advantages:

  • the read output is full-swing, and static (while the pass gate N7/P7 is enabled)

No SRAM differential bitline precharge/discharge read access cycle is needed, saving power.  The read operation does not disturb the internal, cross-coupled nodes of the bitcell.

  • the read and write operations are independent

The use of separate WWL and RWL controls allows a concurrent write operation and read operation to the same row (“write-through”) or to different rows.
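As a toy illustration (not the paper's circuit), the independent write and read controls permit behavior like this write-through cycle:

```python
# One access cycle with independent write (WWL) and read (RWL) controls.
# Modeling "write-through" ordering: the write commits before the read
# samples, so a concurrent read of the written row returns the new data.
def cycle(mem, waddr, wdata, raddr):
    mem[waddr] = wdata      # write port active (WWL asserted)
    return mem[raddr]       # read port active (RWL asserted)

mem = [0] * 8
out = cycle(mem, waddr=5, wdata=0xAB, raddr=5)   # same-row write + read
assert out == 0xAB
```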

Although it is based on digital standard cell design rules, the peripheral circuitry for the TSMC register file design needs some special consideration.  The read output transfer gate circuit presents a diffusion node at the bitcell boundary, with multiple dotted bitcell rows.  This node is extremely sensitive to switching noise, and requires detailed analysis.

Vt Selection

The choice of standard cell design rules also allows greater flexibility for the TSMC register file bitcell.  For example, low Vt devices could be selectively used in the read buffer for improved performance, with a minor impact on bitcell leakage current, as illustrated below.

VDD Operation

Perhaps the greatest register file implementation tradeoff pertains to the potential range of operating supply voltages available to foundry customers.  At advanced process nodes, the range of supply voltages needed for different target markets has increased.  Specifically, very low power applications require aggressive reductions in VDDmin – e.g., for the 5nm process node, logic functionality down to ~0.4-0.5V (from the nominal VDD=0.75V) is being pursued.

The use of standard cell design rules enables the register file implementation to scale the supply voltage with the logic library – indeed, the embedded register file can be readily integrated with other logic in the block in a single power domain.

Conversely, the traditional SRAM cell design at advanced nodes increasingly requires a “boost” during the write operation, to ensure sufficient design margin across a large number of memory bitcells, using aggressive design rules.  This write assist cycle enables a reduction in the static SRAM supply voltage, reducing the SRAM leakage current.  Yet, it also introduces considerable complexity to the access cycle with the charge-pump boost precursor (possibly even requiring a read-after-write operation to confirm the written data).

Write Power

Another comparison to a conventional SRAM bitcell worth mentioning is that the feedback loop in the TSMC register file bitcell is broken during the write operation.  (Most flip-flop circuits also use this technique.)  The write current overdrive needed to flip the state of an SRAM bitcell with cross-coupled inverters dissipates greater power during this cycle.

Testsite and Measurement Data

The first figure below shows the 5nm register file testsite photomicrograph, with two array configurations highlighted.  The second figure illustrates the measured performance data for 4kb and 8kb register file macros, across VDD and temperature ranges.  Note that the choice of a digital standard cell design enables functional operation down to a very low VDDmin.

(Astute observers will note the nature of temperature inversion in the figure – operation at 0°C is more limited than at 100°C.)

The testsite macros also included DFT and BIST support circuitry – the test strategy (and circuit overhead) is definitely part of the register file implementation tradeoff decision.

Summary:  The Final Tradeoff

Like all tradeoffs, there is a range of applicability which must be taken into account.  For the case of register file implementation using either flip-flops, conventional SRAM bitcells, or a unique bitcell as developed by TSMC for the 5nm node, the considerations are:

  • area:  dense 6T SRAM cells with complex peripheral circuitry versus larger area cells (using digital design rules)
  • VDDmin support (power) and VDDmax capabilities (performance, reliability)
  • masked bit-write requirements
  • test methodology (e.g., BIST versus a simple scan chain through flip-flops)
  • and, last but certainly not least, the number of register file access ports (including concurrent read/write operation requirements)

The TSMC focus for their ISSCC presentation was on a 1W, 1R port architecture.  If more register file ports are needed, the other tradeoff assessments listed above change considerably.

The figure below illustrates the area tradeoff between an SRAM bitcell and the 5nm bitcell, indicating a “cross-over” point at ~40 rows (for 256 columns).  The 4kb (32×128) and 8kb (32×256) register file macros shown earlier fit within the preferred window for the fully digital bitcell design.
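A rough way to see where such a cross-over comes from (all coefficients below are made up for illustration, not TSMC's data): the SRAM bitcell is denser, but carries a larger fixed per-column periphery cost, so the digital-rule cell wins at low row counts.

```python
# Illustrative area model: area = rows * cols * cell_area + cols * periphery.
# Coefficients are hypothetical, chosen only to show the cross-over effect.
def sram_area(rows, cols, cell=1.0, periph_per_col=65.0):
    return rows * cols * cell + cols * periph_per_col

def digital_area(rows, cols, cell=2.5, periph_per_col=5.0):
    return rows * cols * cell + cols * periph_per_col

cols = 256
# first row count at which the dense SRAM bitcell wins overall
crossover = next(r for r in range(1, 200)
                 if sram_area(r, cols) < digital_area(r, cols))
assert crossover == 41                                # near ~40 rows
assert digital_area(32, cols) < sram_area(32, cols)   # 32-row macro: digital wins
```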

For reference, TSMC also shared this tradeoff for their previous 7nm register file design, as shown below (1W1R ports). [2]  Note that this figure also includes the lower range, where a flip-flop-based implementation is attractive.

Yet, as current SoC architectures demand larger on-die local storage, the unique 5nm bitcell design supporting optimal 4kb and 8kb macros hits the sweet spot.

Hopefully, this article will help you nail the register file design job interview question.   🙂

I would encourage you to read the TSMC papers describing their design approach and tradeoff assessments on 5nm (and 7nm) register file implementations.

-chipguy

References

[1]  Fujiwara, H., et al., “A 5nm 5.7GHz@1.0V and 1.3GHz@0.5V 4kb Standard-Cell-Based Two-Port Register File with a 16T Bitcell with No Half-Selection Issue”, ISSCC 2021, paper 24.4.

[2]  Sinangil, M., et al., “A 290mV Ultra-Low Voltage One-Port SRAM Compiler Design Using a 12T Write Contention and Read Upset Free Bitcell in 7nm FinFET Technology”, VLSI Symposium 2018.


TSMC Plans Six Wafer Fabs in Arizona

by Scotten Jones on 03-10-2021 at 10:00 am

TSMC Fab 18 rendering

There are reports in the media that TSMC is now planning six fabs in Arizona (the image above is Fab 18 in Taiwan). The original post I saw referred to a Megafab and claimed six fabs with 100,000 wafers per month (wpm) of capacity for $35 billion. The report further claimed it would be larger than TSMC fabs in Taiwan.

This report struck me as unreliable, given that TSMC refers to its large fab clusters as Gigafabs, not Megafabs, and that TSMC’s Fab 12, Fab 14, and Fab 15 each have capacity of around 300,000 wpm while Fab 18 is ramping to over 200,000 wpm.

Now similar reports are being repeated in more reputable sources, notably today I saw a report in EE News Europe that stated:

  • The site would be a Gigafab (correct terminology).
  • Filings with the city of Phoenix describe three phases of building.
  • TSMC has reportedly offered to double employee salaries to move to the US.

I am still not sure about the six-fab part; the Phoenix documents are reported to say three phases, although I suppose each phase could be two fabs. The other issue I have is that 100,000 wpm across six fabs is just under 17,000 wpm per fab; those are smaller fabs than TSMC typically builds and would be suboptimal from a cost perspective.

What I would think would be more likely is three fabs of just over 30,000 wpm each for a total of 100,000 wpm. Maybe they will build three fabs initially for 100,000 wpm and then have the option to build three more fabs later for an additional 100,000 wpm. Fab 18 in Taiwan has three fabs P1, P2 and P3 that are running 5nm with an original capacity of just under 30,000 wpm each, although they are now being expanded to 40,000 wpm each, 120,000 wpm total. There are also P4, P5, and P6 under construction for 3nm that will likely each be around 30,000 wpm initially, bringing the site to around 200,000 wpm.
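The per-fab arithmetic behind the two readings of the 100,000 wpm figure is simple to check:

```python
# Capacity per fab under the two readings of the reported 100,000 wpm.
six_fab_each = 100_000 / 6     # "six fabs" reading: unusually small fabs
three_fab_each = 100_000 / 3   # "three fabs" reading: Fab 18-sized phases
assert round(six_fab_each) == 16_667
assert round(three_fab_each) == 33_333
```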

The $35 billion price tag is high for 100,000 wpm of 5nm but would make sense if it also included some preparation for additional phases or 3nm capability. I should also point out that the initial budget number for fabs is often an estimate and can increase or decrease as the fab is built, depending on final capacity and how many fab phases are included in the initial amount. I believe TSMC has spent more money on phases 1, 2 and 3 of Fab 18 for 5nm than they originally announced and will also be spending more money on phases 4, 5 and 6 for 3nm than originally announced.

My best guess as of today is that the site will have three phases initially producing 100,000 wpm total, with the option to add three more phases in the future to reach 200,000 wpm; that would be more consistent with TSMC Fab 18 in Taiwan.

However the specifics work out, it does appear that TSMC is now looking at building a full-scale Gigafab in the US instead of the small fab originally planned. I see this as good news for the global semiconductor supply due to the high risk presented by having so much of the world’s leading edge logic capacity concentrated in Taiwan. This is especially concerning with Taiwan being located on an active fault line, the view in China that Taiwan is a rogue province that must be brought back under Chinese control, and the resource limits of a small island.



Chip Channel Check- Semi Shortage Spreading- Beyond autos-Will impact earnings

by Robert Maire on 03-07-2021 at 10:00 am


– Semiconductor shortage is like toilet paper shortage in early Covid
– Panic buying, hoarding, double ordering will cause spike
– Could cause a year+ of dislocation in chip makers before ending
– Investors, Govt & Mgmt will get a wake up call from earnings hit

Auto industry is just a prominent tip of chip crunch iceberg. We believe the chip shortage is spreading across other industries

The automotive industry is just a very prominent, in-your-face example of the semiconductor industry problem, as it involves the highest financial impact ratio: a 25-cent chip can stop the revenue associated with a $50,000 car.

Wait till we get the earnings report from Ford for Q1 and they have a significant revenue and earnings shortfall, due to the production halts, that they blame on those tech guys in California’s Silicon Valley.

From an investment perspective we think we will see similar revenue and earnings impact across a number of industries…not just tech related.

In the past we have seen delays in laptops and servers which were relatively common. Last year I ordered a laptop that was delayed two months due to “production problems” (AKA chip shortage).

We would expect chip shortages to hit telecommunications equipment makers; everything from 5G to routers etc. Video cards have always been in short supply due to chip shortages. It could roll downhill to consumer goods from TVs to washers (don’t laugh, large appliances have already been in short supply). We would bet that earnings season will see a whole bunch of diverse companies missing numbers due to component shortages. It’s just hard to predict who because everything has a chip in it.

Being a Big BFF with a long history helps

In this type of situation it pays to be a long time, big, close customer to the chip makers, like Apple. They are so tight with TSMC there is no light between them. You can rest assured that Apple will get all the chips it needs, both expensive and cheap from TSMC and they will always be first in line. Apple is TSMC’s number one customer so it will be no other way.

On the other end of the spectrum you likely have auto makers who are notoriously tough with their suppliers buying 25 cent chips at low margins. What are the odds of their orders being sped up? Zero.

Auto makers only have themselves to blame, as they cut orders early in Covid and shouldn’t be shocked when they had to get back in line, at the end of the line, to re-order. It’s called supply chain management.

Tom Caulfield, CEO of GlobalFoundries, has said that his phone is ringing off the hook with auto manufacturers asking for wafers and that he is “everybody’s new best friend”.

Broadcom’s CEO, Hock Tan, said on their call last night that Broadcom is pretty much booked up for the year and he doesn’t know when the shortage will subside. Broadcom is a big customer of TSMC and it doesn’t sound like they are getting extra wafer capacity.

Panic buying, hoarding & double ordering. The toilet paper shelves are empty.

Perhaps the biggest physical evidence of the panic Covid caused was the shortages of toilet paper in supermarkets in the early part of Covid.

Consumers probably thought they were going to be locked in their homes for months, or that paper factories would be shut down for months, because it seemed like a year’s worth of TP was sold in days.

As we have seen in the past we think we are also seeing evidence of panic buying of chips, double ordering and stocking up.

We think there has already been hoarding by Chinese customers for well over a year who were concerned, rightfully so, about being cut off. Now add to that, hoarding by more customers currently experiencing supply problems. If I were in the auto industry supply chain I would be double and triple ordering and stocking up lest I lose my job.

Coming down off the “sugar high” may be problematic- Is this the high point in the cycle?

Right now chip makers are everyone’s best friends and popular on speed dial, but the hangover from the current party could create a headache. As we know from a very long history, the chip industry is cyclical, and goes through those cycles based on supply and demand and therefore pricing. Right now supply is short and demand is high…maybe artificially high due to hoarding and double ordering…and maybe supply is tight in the short term due to the Texas power problem and other issues…it seems a bit like a “perfect storm”.

A year or two from now, chip makers could be swiped left and ghosted by those currently in desperate need of a chip fix. Poetic justice would be for chip equipment to suffer shortages. Not likely.

It would be very funny cosmic karma if chip equipment companies were impacted by the current chip shortages. After all, semiconductor equipment does happen to have a lot of semiconductors in it, and the supply chain goes directly through China. The equipment controllers are basically souped-up PCs, and deposition and etch tools have a myriad of sub-system suppliers: robots, RF, gas boxes, etc. An EUV lithography tool is such a Rube Goldberg machine it likely has hundreds of chips.

We don’t expect a problem from chip equipment makers, but it could happen. In general, most everybody in the chip industry understands and is on guard for supply issues….obviously unlike the auto industry.

Channel Checks say its not just chips

From what we can tell the shortage issues seem to go beyond chips. Other components and discrete semiconductors are also short in some cases. However, this is likely due to panic buying and ordering from nervous customers and not systemic supply issues as in the mainstream chip industry.

Is the Panic worse than the Problem?

Much as with toilet paper, the problem is likely less severe than the issues caused by the surrounding panic. The semiconductor industry making the news is far from normal. If I made a $50 consumer good with chips in it, I might get freaked out when I hear Ford has to shut down factories because they can’t get chips.

The only good thing that has come out of this is that this long term issue has finally risen to the level where it has hit the White House and they are talking about the industry and doing something about it (which we have never seen before…)

Could the chip shortage hit economic growth and Covid recovery?

The dislocation in the chip industry does not come at a good time as we are looking at climbing out of the hole that Covid has put us in. Having car factories shut down and revenue and earnings hits at some companies certainly will not help the recovery.

It just creates more friction and resistance to the recovery. We think we could very easily see two to three quarters of direct impact on companies with some residual impact even further out. What remains to be seen is whether the lessons learned will actually be adopted or forgotten once it leaves our immediate memory, a year down the road.

The stocks
Chip companies in general are obviously doing very well due to near term demand. Equipment companies are also doing very well, as capital spending is high and will remain high while chip companies’ business is so good.

After a strong run of a year or more, it has been feeling like the semiconductor stocks want to roll over. We have had some days of stumbles. Valuation multiples are at all-time highs. Some suggest a “re-pricing”, but we had a similar re-pricing at the last cyclical peak only to pull back.

2021 is shaping up to be a very good year as momentum seems strong for business with little probability of a downturn. But the stocks don’t always follow earnings step for step and the semi stocks have always turned before business turned.

The chip shortage will eventually end and the real question is what happens after?



Semiconductor Shortage – No Quick Fix – Years of neglect & financial hills to climb

by Robert Maire on 03-03-2021 at 8:00 am

Tamagotchi semiconductor shortage

– Semi Situation Stems from long term systemic neglect
– Will require much more money & time than thought
– Fundamental change is needed to offset the financial bias
– Auto industry is just the hint of a much larger problem

Like recognizing global warming when the water is up to your neck

The problem with the semiconductor industry has finally been recognized, but only after it stopped the production of the beloved F-150 pickup truck and Elon’s Teslas. Many analysts and news organizations wrongly blame the Covid pandemic and its many consequences and assume this is just another example of the Covid fallout. Wrong! This has been a problem decades in the making. It’s not new. The fundamental reasons have been in the works for years. The only thing the pandemic did was to bring the issue to the surface more quickly.

The issue could have been brought to the surface just as easily and with worse consequences by a conflict between China and Taiwan. Or perhaps another trade spat between Japan and Korea.

The semiconductor industry is perhaps not as robust as one would otherwise think, given that it hasn’t seen a significant problem before.

The reality is that the “internationalization” of both the industry and its supply chain have opened it up to all manner of disruption coming at any point along that long chain.

The consolidation has further concentrated the points of failure into a small handful of players, and perhaps one, TSMC, that holds 50+% of the non-memory chip market.

Tamagotchi Toys were the Canary in a Coal Mine

Most people may not remember those digital pets called Tamagotchi that were a smash hit in the late ’90s. Many in the semiconductor industry in Taiwan do remember them. In the summer of 1997 they sucked up a huge amount of semiconductor capacity in Taiwan and whacked out the entire chip industry for the whole summer, causing delays and shortages of all types of chips.

Tamagotchi Tidal Wave Hits Taiwan

In essence, a craze over a kids’ toy created shortages of critical semiconductor chips. Semiconductor capacity is much greater now than it was 20 years ago, but the industry remains vulnerable to demand spikes and slowdowns.

The memory industry is an example of the problem

Perhaps the best example of the chip industry’s vulnerability is the memory semiconductor market. The market lives on the razor’s edge of supply and demand and the balance maintained between the two.

Too much demand and not enough supply and prices skyrocket….too little demand and excess supply and prices collapse.

The memory industry is clearly the most cyclical and volatile in the semiconductor universe. One fab going offline for even a short while, due to a power outage or similar, causes the spot market for memory chips to jump.

Kim Jong-Un should buy memory chips futures

All it would take is one “accidentally” fired artillery round from North Korea that hit a Samsung fab in South Korea and took it out of commission. Memory prices would go through the roof for a very long time, as the rest of the industry could never hope to make up for the shortage in any reasonable amount of time.

Other industries, such as oil, do not have the same problem

When you look at other industries in which the product is a commodity, as memory is, you do not see the same production problem. The oil industry, which also runs a razor’s-edge balance between supply and demand, does not have the same issue, as there is a huge amount of excess capacity ready to come online at a moment’s notice.

The cost of oil pumps and derricks sitting around idle, waiting to be turned on, is very low compared to the commodity they pump. This means the oil industry can flex up and down as needed by demand and easily make up for the shortage if someone goes offline (like Iran).

Imagine if the oil industry kept pumping, at full output, never slowing, for each new oil field drilled.

In the semiconductor industry the capital cost is essentially the whole cost so fabs never ever go offline as the incremental cost to produce more chips is quite low. This means there is no excess capacity in the chip industry of any consequence and they run 24X7. Capacity is booked out months in advance and capacity planning is a science (perfected by TSMC).

The semiconductor industry has all the maneuverability of a super tanker that takes many miles to slow down or speed up….you just can’t change capacity that easily.

There is no real fix to the capacity issue due to financials

To build capacity that could be brought online in a crisis or time of high demand would require an “unnatural” act. That is, spending billions to build a fab only to have it sit there unused, waiting for the capacity to be needed. This scenario is not going to happen….even the government isn’t dumb enough to spend billions on a “standby” factory that needs a constant spend to keep up with Moore’s Law.

Its just not going to happen

Moving fabs “on shore” just reduces supply risk not demand risk

Rebuilding fabs in the US would be a good thing as it would mean fabs that are no longer an artillery shell away from a crazy northern neighbor or an hour boat ride away from a much bigger threat that still claims to own you.

That will certainly help reduce the supply side risk assuming we don’t build the new fabs on fault lines or flood zones. The demand side variability will still exist but could be managed better.

Restarting “Buggy Whip” manufacturing

The other key thing that most people do not realize is that most semiconductors used in cars, toys and even defense applications are made in very old fabs. All those older fabs that used to make 386 and 486 chips and 1 megabit memory parts have long ago been sold for scrap by the pound and shipped off to Asia (China) and are now making automotive and toaster oven chips.

Old fabs never die…they just keep making progressively lower value parts. As I have mentioned in a prior note, you don’t make a 25-cent microcontroller for a car in a $7B, 5nm fab….the math simply doesn’t work.
This ability to keep squeezing value out of older fabs has worked as long as demand for the trailing edge has not exceeded capacity.

For a typical chip company, the leading edge fab makes the highest value CPU, the next generation older fab maybe makes a GPU, the next older fab maybe some I/O chips or comms chips, the older fab makes consumer chips and the oldest fabs make chips for TV remotes.

In bleeding edge fabs the equipment costs are the vast majority, with labor being a rounding error. In older fabs, with fully depreciated equipment, labor starts to become a factor, so many older fabs are better suited to be packed up and shipped off to a low labor cost country.

The biggest problem is that demand for older chip technology seems to have exceeded the amount of older capacity in the world, as chips are now in everything and IoT doesn’t need the bleeding edge.

Equipment makers for the most part don’t make 6-inch (150mm) tools anymore; some still make their old 8-inch (200mm) tools, some don’t. As we have previously mentioned, demand for 200mm now exceeds what it was at its peak.

Old Tools are being Hoarded

Summary
Fixing not only the shortage issue but the risk issue will take not only a lot of time but a lot of money. The problem is systemic and has been dictated by financial math that has incentivized what we currently have in place.

In order to change the behavior of anyone who runs a chip company and can add we need to put in place financial incentives, legal decrees, legislative incentives and use a multiple of levers to change the current dynamics of the industry.

Even with all the written motivation in place it will still take years for the physical implementation of the incentivized changes.

TSMC has been under enormous pressure for years about a fab in the US. Now they are planning one in Arizona that is still years away, will be old technology when it comes online and will barely be a rounding error….. all that from a multi-billion-dollar effort….. but it’s a start.

A real effort is likely to be well north of $100B and 10 to 20 years in the making before we could get back to where the US was in the semiconductor industry 20 years ago.

The Stocks
As the saying goes, buying semiconductor equipment company stocks is like buying a basket of the semiconductor industry. They can also be viewed as the “arms merchants” in an escalating war.

It doesn’t matter who wins or loses in the chip industry but building more chip factories is obviously good for the equipment makers, in general.

In the near term, foreign makers such as Tokyo Electron, ASM International, Nova Measuring and others may make for an interesting play.

There is plenty of time, as we are sure that no matter what happens we will see zero impact from government-sponsored activities in 2021, and it will likely take a very long time to trickle down, so we would beware of “knee-jerk” reactions that may drive the stocks near term.



Features of Resistive RAM Compute-in-Memory Macros

by Tom Dillinger on 03-02-2021 at 8:00 am


Resistive RAM (ReRAM) technology has emerged as an attractive alternative to embedded flash memory storage at advanced nodes.  Indeed, multiple foundries are offering ReRAM IP arrays at the 40nm node and below.

ReRAM has very attractive characteristics, with one significant limitation:

  • nonvolatile
  • long retention time
  • extremely dense (e.g., 2x-4x density of SRAM)
  • good write cycle performance (relative to eFlash)
  • good read performance

but with

  • limited endurance (limited number of ‘1’/’0’ write cycles)

These characteristics imply that ReRAM is well-suited for the emerging interest in compute-in-memory architectures, specifically for the multiply-accumulate (MAC) computations that dominate the energy dissipation in neural networks.

To implement a trained NN for inference applications, node weights in the network would be written to the ReRAM array, and the data inputs would be (spatially or temporally) decoded as the word lines accessing the array weight bits.  The multiplicative product of the data/wordline = ‘1’ and the stored weight_bit = ‘1’ would result in significant memory bitline current that could be readily sensed to denote the bit product output – see the figure below.
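A behavioral sketch of that scheme (idealized unit conductances, binary inputs and weights; not the macro's actual circuit) shows how the summed bitline current realizes the MAC:

```python
# Sketch of a binary ReRAM compute-in-memory MAC: stored weight bits act
# as low/high resistance states, asserted wordlines select rows, and the
# accumulated "current" per bitline column gives the dot-product result.
G_ON, G_OFF = 1.0, 0.0          # idealized cell conductances (set/reset)

def bitline_mac(inputs, weights):
    """inputs: list of 0/1 wordline activations (one per row).
    weights: rows x cols matrix of stored 0/1 weight bits.
    Returns per-column accumulated current (unitless here)."""
    cols = len(weights[0])
    current = [0.0] * cols
    for wl, row in zip(inputs, weights):
        if wl:                   # wordline asserted -> row contributes
            for c, w in enumerate(row):
                current[c] += G_ON if w else G_OFF
    return current

x = [1, 0, 1, 1]                 # data inputs decoded onto wordlines
W = [[1, 0],
     [1, 1],
     [0, 1],
     [1, 1]]                     # trained weight bits stored in the array
# active rows 0, 2, 3: col0 = 1 + 0 + 1 = 2, col1 = 0 + 1 + 1 = 2
assert bitline_mac(x, W) == [2.0, 2.0]
```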

At the recent International Solid State Circuits Conference (ISSCC), researchers from Georgia Tech and TSMC presented results from an experimental compute-in-memory design using TSMC’s 40nm ReRAM macro IP. [1]  Their design incorporates several unique features – this article summarizes some of the highlights of their presentation.

Background

As the name implies, ReRAM technology is based on the transitions of a thin film material between a high-resistance and low-resistance state.  Although there are a large number of different types of materials (and programming sequences) used, a typical metal-oxide thin-film implementation is depicted in the figure below.

The metal oxide thin film material shown incorporates the source and transport of oxygen ions/vacancies under an applied electric field of high magnitude.  (The researchers didn’t elaborate on the process technology in detail, but previous TSMC research publications on ReRAM development did utilize a TiO-based thin film programming layer.  Multiple metal-oxide thin film materials are also used.)

As depicted in the figure above, an initial “filament forming” cycle is applied, resulting in transport of oxygen ions in the thin film.  In the Reset state (‘0’), a high electrical resistance through the metal-oxide film is present.  During the application of a Set (‘1’) write cycle, oxygen ion migration occurs, resulting in an extension of the filament throughout the thin film layer, and a corresponding low electrical resistance.  In the (bipolar operation) technology example depicted above, the write_0 reset cycle breaks this filament, returning the ReRAM cell to its high resistance state.

The applied electric field across the top thin film for the (set/reset) write operation is of necessity quite large;  the applied “read” voltage to sense the (low or high) bitcell resistance utilizes a much smaller electric field.

There are several items of note about ReRAM technology:

  • the bitcell current is not a strong function of the cell area

The filamentary nature of the conducting path implies that the cell current is not strongly dependent on the cell area, offering opportunities for continued process node scaling.

  • endurance limits

There is effectively a “wearout” mechanism in the thin film for the transition between states – ReRAM array specifications include an endurance limit on the number of write cycles (e.g., 10**4 – 10**6).  Commonly, there is no limit on the number of read cycles.

The endurance constraints preclude the use of ReRAM as a general-purpose embedded “SRAM-like” storage array, but they are readily acceptable for an eFlash replacement, and for a compute-in-memory offering where pre-calculated weights are written and updated very infrequently.

  • resistance ratio, programming with multiple write cycles

The goal of ReRAM technology is to provide a very high ratio of the high resistance to low resistance states (HRS/LRS).  When the cell is being accessed during a read cycle – i.e., data/wordline = ‘1’ – the bitline sensing circuit is simplified if i_HRS << i_LRS.

Additionally, it is common to implement a write to the bitcell using multiple iterations of a write-read sequence, to ensure the resulting HRS or LRS cell resistance is within the read operation tolerances.  (Multiple write cycles are also initially used during the forming step.)

  • HRS drift, strongly temperature dependent

The high-resistance state is the result of the absence of a conducting filament in the top thin film, after the oxygen ion transport during a write ‘0’ operation.  Note in the figure above the depiction of a high oxygen vacancy concentration in the bottom metal oxide film.  Any time a significant material concentration gradient is present, diffusivity of this material may occur, accelerated at higher temperatures.  As a result, the HRS resistance will drift lower over extended operation (at high temperature).

Georgia Tech/TSMC ReRAM Compute-in-Memory Features

The researchers developed a ReRAM-based macro IP for a neural network application, with the ReRAM array itself providing the MAC operations for a network node, and supporting circuitry providing the analog-to-digital conversion and the remaining shift-and-add logic functionality.  The overall implementation also incorporated three specific features to address ReRAM technology issues associated with:  HRS and LRS variation; low (HRS/LRS) ratio; and, HRS drift.

low HRS/LRS ratio

One method for measuring the sum of the data inputs to the node multiplied by a weight bit is to sense the resulting bitline current drawn by the cells whose data/wordline = ‘1’.  (Note that unlike a conventional SRAM block with a single active decoded address wordline, the ReRAM compute-in-memory approach will have an active wordline for each data input to the network node whose value is ‘1’.  This necessitates considerable additional focus on read-disturb noise on adjacent, unselected rows of the array.)  However, for a low HRS/LRS ratio, the bitline current contribution from cells where data = ‘1’ and weight = ‘0’ needs to be considered.  For example, if (HRS/LRS) = 8, the cumulative bitline current of eight (data = ‘1’ X weight = ‘0’) products will be equivalent to one LRS current (‘1’ X ‘1’), a binary multiplication error.
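The multiplication-error arithmetic in that example can be checked directly. Only the HRS/LRS = 8 ratio comes from the text; the absolute resistance and read-voltage values below are illustrative assumptions.

```python
# Worked example of the low HRS/LRS-ratio problem: with HRS/LRS = 8,
# eight active "0"-weight cells leak as much bitline current as one
# genuine (1 x 1) product.  Element values are assumed for illustration.

R_LRS = 10e3            # ohms, assumed low-resistance state
ratio = 8               # HRS/LRS ratio from the example in the text
R_HRS = ratio * R_LRS
V_READ = 0.2            # volts, assumed read voltage on the cell

i_lrs = V_READ / R_LRS  # current of one (data=1, weight=1) cell
i_hrs = V_READ / R_HRS  # leakage of one (data=1, weight=0) cell

# eight (1 x 0) "products" sum to the same current as a single (1 x 1)
leak_of_eight = 8 * i_hrs
```

With a current-sense scheme, the sense circuit cannot distinguish this leakage sum from a real ‘1’ product, which is exactly the error the authors set out to avoid.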

The researchers chose to use an alternative method.  Rather than sensing the bitline current (e.g., charging a capacitor for a known duration to develop a readout voltage), the researchers pumped a current into the active bitcells and measured Vbitline directly, as illustrated below.

The effective resistance is the parallel combination of the active LRS and HRS cells.  The unique feature is that the current source value is not constant, but is varied with the number of active wordlines – each active wordline also connects to an additional current source input.  Feedback from Vbitline to each current source branch is also used, as shown below.

This feedback loop increases the sensitivity of each current source branch to Reffective, thus amplifying the resistance contribution of each (parallel) LRS cell on the bitline, and reducing the contribution of each (parallel) HRS cell.  The figure below illustrates how the feedback loop fanout to each current branch improves the linearity of the Vbitline response, with an increasing number of LRS cells accessed (and thus, parallel LRS resistances contributing to Rtotal).
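A much-simplified model of the voltage-sense idea can illustrate the basic relationship Vbitline = I × Reffective, with the source current scaled by the number of active wordlines. This is a toy model, not the paper’s active-feedback circuit: all element values, and the simple linear scaling of the current source, are assumptions.

```python
# Toy model of voltage-mode bitline sensing: pump a current into the
# column and read Vbitline = I * R_effective, where R_effective is the
# parallel combination of the accessed LRS and HRS cells.  The source
# current grows with the number of active wordlines, loosely mimicking
# the per-wordline current-source branches described in the text.

R_LRS, R_HRS = 10e3, 200e3   # ohms (assumed cell resistances)
I_UNIT = 5e-6                # amps per active wordline (assumed)

def v_bitline(n_lrs, n_hrs):
    """Vbitline for n_lrs low-R and n_hrs high-R cells on active wordlines."""
    n_active = n_lrs + n_hrs
    if n_active == 0:
        return 0.0
    g = n_lrs / R_LRS + n_hrs / R_HRS   # parallel conductances sum
    return (n_active * I_UNIT) / g      # I(n_active) * R_effective
```

As more LRS cells are accessed (for a fixed number of active wordlines), Reffective drops and Vbitline falls monotonically, which is the quantity the readout ADC digitizes; the paper’s feedback loop then improves the linearity of that response.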

LRS/HRS variation

As alluded to earlier, multiple iterations of write-read are often used to confirm the value written into the ReRAM cell.

The technique employed here to ensure a tight tolerance on the written HRS and LRS values evaluates the digital value read back after each write, and increases/decreases the pulse width of the subsequent (reset/set) write iteration until the (resistance) target is reached, ending the write cycle.
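The write-verify loop just described might be sketched as follows. The cell response model (resistance moves toward the target by a fraction proportional to pulse width) and the pulse-width schedule are toy assumptions for illustration, not the paper’s device physics.

```python
# Hedged sketch of iterative write-verify: after each write pulse, read
# the cell resistance and widen the next pulse until the resistance is
# within tolerance of the target.  The cell model here is a toy.

def write_verify(r_initial, r_target, tol=0.05, pw0=50e-9, max_iters=16):
    """Return (final_resistance, pulses_used).  Toy cell model, not silicon."""
    r, pw = r_initial, pw0
    for pulses in range(1, max_iters + 1):
        # toy model: each pulse moves R toward the target; a wider pulse
        # moves it further (capped at reaching the target exactly)
        step = min(1.0, pw / 200e-9)
        r = r + step * (r_target - r)
        if abs(r - r_target) / r_target <= tol:   # verify read passes
            return r, pulses
        pw *= 1.5                                  # widen the next pulse
    return r, max_iters
```

The same loop structure applies to both set (lower the resistance) and reset (raise it); only the pulse polarity differs.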

HRS drift

The drift in HRS resistance after many read cycles is illustrated below (measured at high operating conditions to accelerate the mechanism).

To compensate for the drift, each bitcell is periodically read – any HRS cell value which has changed beyond a pre-defined limit will receive a new reset write cycle to restore its HRS value.  (The researchers did not discuss whether this “mini-reset” HRS write cycle has an impact on the overall ReRAM endurance.)
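The periodic drift-compensation (“scrub”) scheme can be sketched as follows; the nominal HRS value and the refresh threshold below are assumptions for illustration.

```python
# Sketch of HRS-drift compensation: periodically read each HRS cell and
# re-apply a reset write when its resistance has drifted below a limit.
# Threshold and nominal values are illustrative assumptions.

R_HRS_NOMINAL = 200e3      # ohms, assumed freshly-reset HRS value
R_REFRESH_LIMIT = 120e3    # refresh when measured HRS drops below this

def scrub(measured_hrs):
    """measured_hrs: list of read-back HRS resistances.
    Returns (refreshed list, number of reset writes applied)."""
    refreshed, out = 0, []
    for r in measured_hrs:
        if r < R_REFRESH_LIMIT:
            r = R_HRS_NOMINAL   # reset write restores the nominal HRS
            refreshed += 1
        out.append(r)
    return out, refreshed
```

Each refresh is itself a write cycle, which is why the endurance impact the researchers left open is worth asking about.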

Testsite Measurement Data

A micrograph of the ReRAM compute-in-memory testsite (with specs) is shown below.

Summary

ReRAM technology offers a unique opportunity for computing-in-memory architectures, with the array providing the node (data * weight) MAC calculation.  The researchers at Georgia Tech and TSMC developed a ReRAM testsite with additional features to address some of the technology issues:

  • HRS/LRS variation:  multiple write-read cycles with HRS/LRS sensing are used
  • low HRS/LRS ratio:  a Vbitline voltage-sense approach is used, with a variable bitline current source (with high gain feedback)
  • HRS drift:  bitcell resistance is read periodically, and a reset write sequence applied if the read HRS value drops below a threshold

I would encourage you to review their ISSCC presentation.

-chipguy

References

[1]  Yoon, Jong-Hyeok, et al., “A 40nm 64kb 56.67TOPS/W Read-Disturb-Tolerant Compute-in-Memory/Digital RRAM Macro with Active-Feedback-Based Read and In-Situ Write Verification”, ISSCC 2021, paper 29.1.

 


TSMC ISSCC 2021 Keynote Discussion

by Daniel Nenni on 03-01-2021 at 6:00 am

Mark Liu TSMC ISSCC 2021

Now that semiconductor conferences are virtual, the talks are often better since speakers can prerecord, and we have the extra time to do a better job of coverage. Even when conferences go live again, I think they will also be virtual (hybrid), so our in-depth coverage will continue.

ISSCC is one of the conferences we covered live since it’s in San Francisco, so that has not changed. We will, however, be able to cover many more sessions as they come to our homes on our own time.

First off is the keynote by TSMC Chairman Mark Liu:  Unleashing the Future of Innovation:

Given the pandemic related semiconductor boom that TSMC is experiencing, Mark might not have had time to do a live keynote so this was a great opportunity to hear his recorded thoughts on the semiconductor industry, the foundry business model, and advanced semiconductor technologies. Here are some highlights from his presentation/paper intermixed with my expert insights:

  • The semiconductor industry has been improving transistor energy efficiency by about 20-30% for each new technology generation and this trend will continue.
  • The global semiconductor market is estimated at $450B for 2020.
  • Products using these semiconductors represent 3.5% of GDP ($2T USD).
  • From 2000 to 2020 the overall semiconductor industry grew at a steady 4%.
  • The fabless sector grew at 8% and foundry grew 9% compared to IDM at 2%.
  • In 2000 fabless revenue accounted for 17% of total semiconductor revenue (excluding memory).
  • In 2020 fabless revenue accounted for 35% of total semiconductor revenue (excluding memory).
  • Unlike IDMs, innovators are only limited by their ideas not capital.

Nothing like a subtle message to the new Intel CEO. It will be interesting to see if the Intel – TSMC banter continues. I certainly hope so. The last one that started with Intel saying that the fabless model was dead did not end so well.

Mark finished his IDM message with:

“Over the previous five decades, the most advanced technology had been available first to captive integrated device manufacturers (IDMs). Others had to make do with technologies that were one or several generations behind. The 7nm logic technology (mass production in 2017) was a watershed moment in semiconductor history. In 2017, 7nm logic was the first time that the world’s most advanced technology was developed and delivered by pure-play foundries first, and made available broadly to all fabless innovators alike. This trend will likely continue for future technology generations…”

As we all now know Intel will be expanding TSMC outsourcing at 3nm. TSMC 3nm will start production in Q4 of this year for high volume manufacturing beginning in 2H 2022. The $10B question is: Will Intel get the Apple treatment from TSMC (early access, preferred pricing, and custom process recipes)?

I’m not sure everyone understands the possible ramifications of Intel outsourcing CPU/GPU designs to TSMC so let’s review:

  • Intel and AMD will be on the same process so architecture and design will be the focus. More direct comparisons can be made.
  • Intel will have higher volumes than AMD so pricing might be an issue. TSMC wafers cost about 20% less than Intel if you want to do the margins math.
  • Intel will have designs on both Intel 7nm and TSMC 3nm so direct PDK/process comparisons can be made.

Bottom line: 2023 will be a watershed moment for Intel manufacturing, absolutely!


How SerDes Became Key IP for Semiconductor Systems

by Eric Esteve on 02-14-2021 at 10:00 am

Ethernet bandwidth

We have seen that the interface IP category has posted an incredibly high growth rate over the last two decades, and we expect this category to generate an ongoing high source of IP revenues for at least another decade. But if we dig into the various successful protocols like PCI Express, Ethernet or USB, we can detect a common function in the Physical (PHY) layer: the Serializer/Deserializer (SerDes).

In 1998, advanced interconnects used in telecom applications were based on 622 MHz LVDS I/O; telecom chip makers were building huge chips integrating 256 LVDS I/Os running at 622 MHz to support networking fabrics. Today, advanced PAM4 SerDes run at 112 Gbps over a single connection to support 100G Ethernet. In twenty years, SerDes efficiency jumped by a factor of 180! For a quick comparison with CPU technology: in 1998, Intel released the Pentium II Dixon processor, running at 300 MHz; in 2018, an Intel Core i3 ran at 4 GHz. CPU frequencies grew by a factor of roughly 13 over a span of twenty years, while SerDes speeds exploded by a factor of 180.
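A quick sanity check of the speed-up arithmetic above:

```python
# Ratio of per-lane SerDes rates and CPU clocks over the two decades
# cited in the text.
serdes_1998 = 622e6      # 622 MHz LVDS lane, bits/s
serdes_today = 112e9     # 112 Gbps PAM4 lane, bits/s
cpu_1998 = 300e6         # Pentium II Dixon clock, Hz
cpu_2018 = 4e9           # Core i3 clock, Hz

serdes_gain = serdes_today / serdes_1998   # ~180x
cpu_gain = cpu_2018 / cpu_1998             # ~13x
```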

SerDes are now used in many more applications than just telecom, to interface chips and systems. At the end of the 2000s, smartphones integrated USB3, SATA and HDMI interfaces, while telecom and PC/server products integrated both PCIe and Ethernet. These trends caused the interface IP market to become a sizeable IP category, growing above $200 million at that time. It was small compared to the CPU category, which was four or five times larger. But since 2010, the interface category has grown at least 15% year over year, the fastest growing category compared with all other semiconductor IP categories, such as CPU, GPU, DSP, Library, etc. The reason is directly linked with the number of connected devices growing every year, each exchanging more data (more movies, pictures, etc.). Connectivity is the beginning of the communication chain, to the internet modem or base station, Ethernet switch, and the datacenter network.

Figure 1: Long Term Ethernet Switch Forecast (source: Dell’Oro)

During the 2010 decade, the worldwide community became almost completely connected. Ethernet became the backbone of this connectivity as both the connectivity rates and the number of datacenters quickly increased over the decade. If we use SerDes rates as an indicator: 10 Gbps in 2010, 28 Gbps in 2013, 56 Gbps in 2016 and 112 Gbps in 2019 (supporting 10G, 25G, 50G and 100G Ethernet, respectively).

Then, in 2017, exploding high-speed connectivity needs for emerging data-intensive compute applications such as machine learning and neural networks started to appear, adding to the already growing need for high-bandwidth connectivity. At the same time, analog mixed-signal architectures, which had been the norm for SerDes design since the inception, became extremely difficult to manage and much more sensitive to process, voltage, and temperature variations, due to the evolution of CMOS technology toward advanced FinFET. In modern nanometer FinFET technologies, transistor dimensions are so small that device behavior is governed by a relative handful of carriers. Thus, the construction of precise analog circuits that can sustain stressful environmental variations is extremely difficult.

But the positive point with an advanced technology like 7nm is that you can integrate an incredible number of transistors per sq. mm (a density of 100 million transistors per sq. mm), so it’s now possible to develop new digital-based architectures leveraging Digital Signal Processing (DSP) to do the vast majority of the Physical Layer work. A DSP-based architecture enables the use of higher-order Pulse Amplitude Modulation (PAM) schemes compared to the Non-Return to Zero (NRZ, or PAM-2) signaling used by previous analog mixed-signal approaches. PAM-4 doubles the data throughput of a channel without having to increase the bandwidth of the channel itself. As an example, a channel with 28 GHz of bandwidth can support a maximum data throughput of 56 Gbps using NRZ signaling. With the PAM-4 DSP technique, this same 28 GHz bandwidth channel can support a data rate of 112 Gbps! When you consider that analog SerDes architectures are limited to a maximum of 56 Gbps for physical reasons (and maybe less…), DSP SerDes are the approach to scale rates to 200 Gbps and beyond, with the use of more sophisticated modulation schemes (e.g., PAM-6 or PAM-8).
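The PAM throughput relation used in this example is simply bits per symbol = log2(levels) at a fixed symbol rate (a 28 GHz channel supports roughly 56 Gbaud at the Nyquist limit):

```python
import math

# Data rate = symbol rate (baud) * bits per symbol, where a PAM-N
# signal carries log2(N) bits per symbol.
def data_rate_gbps(baud_gbaud, pam_levels):
    return baud_gbaud * math.log2(pam_levels)

nrz  = data_rate_gbps(56, 2)   # NRZ / PAM-2 at 56 Gbaud -> 56 Gbps
pam4 = data_rate_gbps(56, 4)   # same baud, 2 bits/symbol -> 112 Gbps
pam8 = data_rate_gbps(56, 8)   # 3 bits/symbol -> 168 Gbps
```

The cost of the extra levels is a smaller eye opening per level, which is precisely what the DSP-based receiver equalization is there to recover.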

Using DSP-based SerDes is not only required for building robust interfaces in FinFET technologies, it is also the only way to double data rates above 56 Gbps, e.g., 112 Gbps with PAM-4 or 200 Gbps with PAM-8. And the need for more bandwidth is driven by emerging data-intensive applications like AI (to interconnect CPU and accelerator) and ADAS, and by the data-centric trend of the connected human community, expected to grow steadily over the next decade.

Figure 2: Top 5 Interface IP Forecast & CAGR (source: IPnest 2020)

In the “Interface IP Survey”, IPnest has ranked IP vendor revenues and market share by protocol since 2009. In the 2020 version of the report, we have shown that the interface IP category will grow at a 15% CAGR from 2020-2024 to reach $1.57 billion, as listed in Figure 2. This is a wide IP market including PCIe, Ethernet and SerDes as well as USB, MIPI, HDMI, SATA and memory controller IP. In 2019, Synopsys was a strong leader with 53% market share of the estimated $870 million IP market, followed by Cadence with 12%. Both EDA companies have defined a one-stop-shop business model, addressing the mainstream market. This strategy is successful for these large companies as it targets a wide part of various segments (smartphone, consumer, automotive or datacenter), but not the most demanding high-end portion of these segments.

Nevertheless, another strategy can be successful in the IP market, which is to focus strongly on one segment (e.g., the high end) and provide the best experience to very demanding hyperscaler customers. If you can build an excellent engineering team able to develop top-quality products on the most advanced technologies, focusing on the high end of the market, the resulting business model can be rewarding.

We have seen that SerDes IP is the key to the interface IP market. Furthermore, if we concentrate on the PCIe and Ethernet protocols, Figure 3 illustrates the 2020-2025 IP revenue forecast, limited to high-end PCIe (Gen 5 and Gen 6) and high-end Ethernet (PHY based on 56G, 112G and 224G SerDes), including the D2D protocol for a reason that will be described shortly.

 

Figure 3: High-End Interface IP Forecast & CAGR (source: IPnest 2021)

This high-end interface IP forecast shows a 28% CAGR from 2020-2025 (to be compared with 15% for the total interface IP market), and a TAM of $806 million in 2025. One young company has demonstrated strong leadership in this high-end interface IP segment, thanks to its focus on high-end SerDes (112G since 2017 and soon 200G) targeting the most advanced technology nodes (7nm in 2017, then 5nm in 2019) offered by the two leading foundries, TSMC and Samsung. Alphawave, founded in 2017, is rumored to have booked $75 million in orders in 2020, thanks to its positioning targeting the most advanced rates and applications in the high-end segment of PCIe and Ethernet. In this portion of the market, they enjoyed 28% market share in 2019 and 36% in 2020. If Alphawave can keep their lead in the high-end SerDes market, it’s not unrealistic to foresee $300-400 million in IP revenues… by 2024-2025!

Since 2019, a new sub-segment, the D2D interface, has emerged and is expected to grow at a 46% CAGR from 2020-2024. By definition, D2D protocols are used between two chips or die within a common silicon package. Briefly, we consider two cases for D2D: i) dis-integration of the master SoC, to keep die area from degrading yield or exceeding the maximum reticle size, or ii) SoC interconnect with a “service” chiplet (which can be an I/O chip, FPGA, accelerator…).

At this point (February 2021), there are several protocols in use, with the industry trying to build formalized standards for many of them. Current leading D2D standards include: i) the Advanced Interface Bus (AIB, AIB2), initially defined by Intel, who has offered royalty-free usage; ii) High Bandwidth Memory (HBM), where DRAM dies are stacked on each other on top of a silicon interposer and connected using TSVs; and iii) Bunch of Wires (BoW) and OpenHBI, two interfaces defined by the Open Domain-Specific Architecture (ODSA) subgroup, an industry group. These D2D standards are based on a DDR-like protocol: a parallel group of single-ended data wires accompanied by a forwarded clock, currently operating in the 2 GHz to 4 GHz range. By using literally hundreds of parallel wires over very short distances, these interfaces compete with very-short-reach NRZ SerDes, usually defined around 40 Gbps, and offer a strong advantage in much lower latency and lower power consumption compared to SerDes.
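The bandwidth trade implied above (many slow parallel wires versus a few fast serial lanes) can be quantified roughly. The wire count and clock below are illustrative assumptions, not taken from any specific D2D standard.

```python
# Rough aggregate bandwidth of a source-synchronous parallel D2D link,
# compared against serial SerDes lanes.  Wire count and clock are
# illustrative assumptions.

def parallel_bw_gbps(n_wires, clock_ghz, ddr=True):
    """Aggregate Gbps: each wire carries one bit per clock edge (DDR = 2/cycle)."""
    return n_wires * clock_ghz * (2 if ddr else 1)

d2d = parallel_bw_gbps(256, 2.0)   # 256 wires at 2 GHz DDR -> 1024 Gbps
serdes_lanes = d2d / 40            # equivalent 40 Gbps NRZ SerDes lanes
```

The parallel approach needs far more bumps and wires, which is only practical over the millimeter-scale distances inside a package; in exchange it avoids the SerDes serialization latency and power.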

There is now consensus in the industry that a maniacal focus on Moore’s Law is no longer valid at advanced technology nodes, e.g., 7nm and below. Chip integration is still happening, with more transistors added per sq. mm at every new technology node; however, the cost per transistor grows higher with every new node. Chiplet technology is a key initiative to drive increased integration for the main SoC while using older mainstream nodes for service chiplets. This hybrid strategy decreases both the cost and the design risk associated with integrating the service IP directly into the main SoC. IPnest believes this trend will have two main effects on the interface IP business: one is the strong growth of D2D IP revenues soon (2021-2025), and the other is the creation of a heterogeneous chiplet market to augment the high-end SerDes IP market.

We have forecasted the growth of the D2D interface IP category for 2020-2025, passing from less than $10 million in 2020 to $171 million in 2025 (87% CAGR). This forecast is based on the assumption that the service chiplet market should explode in 2023, when most advanced SoCs will be designed in 3nm. That will make integration of high-end IP like SerDes far too risky, leading to the externalization of this functionality into a chiplet designed in a more mature node like 7nm or 5nm. If interface IP vendors will be major actors in this revolution, the silicon foundries addressing the most advanced nodes, like TSMC and Samsung, and manufacturing the main SoC will also have a key role. We don’t think they will design chiplets, but they could decide to support IP vendors and push them to design chiplets to be used with SoCs in 3nm, as they do today when supporting advanced IP vendors to market their high-end SerDes as hard IP in 7nm and 5nm. Intel’s recent opening to 3rd-party foundries is expected to also leverage third-party IP, as well as heterogeneous chiplet adoption by the semiconductor heavyweight. In this case, no doubt that hyperscalers like Microsoft, Amazon and Google will also adopt chiplet architectures… if they don’t precede Intel in chiplet adoption.

By Eric Esteve (PhD.) Analyst, Owner IPnest

Also Read:

Interface IP Category to Overtake CPU IP by 2025?

Design IP Revenue Grew 5.2% in 2019, Good News in Declining Semi Market

#56thDAC SerDes, Analog and RISC-V sessions


Will EUV take a Breather in 2021?

by Robert Maire on 02-07-2021 at 6:00 am

KLA EUV Slowdown

-KLAC- Solid QTR & Guide but flat 2021 outlook
-Display down & more memory mix
-KLAC has very solid Dec Qtr & guide but 2021 looks flattish
-Mix shift to memory doesn’t help- Display weakness
-Despite flat still looking at double digit growth
-EUV driven business may see some slowing from digestion

As always, KLAC came in at the high end of the guided range, with revenues of $1.65B and non-GAAP EPS of $3.24 versus the guided EPS range of $2.82 to $3.46. Guidance is for $1.7B +/-$75M in revenue and a non-GAAP EPS range of $3.23 – $3.91. By all financial and performance metrics, a very solid quarter.

A “flattish” 2021 while WFE grows “mid teens”

Management suggested that WFE which exited 2020 at $59-$60B would grow double digits in 2021 but the year would look a bit more flat for KLAC as its acquired display group is expected to shrink and there is an expected mix shift towards memory which is less process control intensive.

Foundry has been strong, which has been very good for KLA, and the current quarter is expected to see roughly 68% of business from foundry.

Will EUV take a breather?

KLA obviously sells process management tools to companies working on new processes such as EUV. TSMC has bought so many EUV tools it probably has problems finding the space for more. TSMC has also clearly gotten well over the hump of getting EUV to work, may not need as much process control, and could slow its EUV scanner purchases a bit given that it’s so far ahead.

Intel is obviously still coming up the learning curve and purchasing curve, and Samsung is in between the two. We would not expect either Samsung or Intel to be as EUV intensive as TSMC has been, at least not in the near term. All this being said, it is not unreasonable to expect EUV-related process management to slow slightly.

Memory not as intensive as Foundry/logic

The industry is expecting memory makers to increase capex spend in 2021 as supply and demand have been in reasonable balance and supply is expected to get tighter.

Most of the expectation is on the DRAM side, which is slightly less process control intensive as compared to NAND and likely lower in overall spend. This mix shift towards memory is obviously better for memory poster child Lam than for foundry poster child KLA. However, it’s not like foundry is falling off a cliff, with TSMC spending a record $26B to $28B in capex.

Service adding nice recurring revenue

As we have seen with KLA’s competitors, the service business continues its rise in importance to the company. The recurring revenue stream counterbalances new-equipment cyclicality and lumpiness. Having 25% or more of your revenue coming from service is very attractive.

Wafer inspection positive while reticle inspection negative

EUV “print check” has obviously been very good for KLA and a way to play the EUV transition given the issues in reticle inspection. Patterning (AKA reticle inspection) was down significantly after a nice bump in prior quarters where KLA managed to take back some business from Lasertec (which now sports a $10B Mkt Cap).

Obviously “missing the boat” on EUV reticle inspection is toothpaste that can’t be put back in the tube. We expect Lasertec to get the lion’s share of Intel’s business as it ramps up EUV.

The stock

If we assume roughly $7B in revenues for 2021 ($1.75B/Q) with roughly $15 in EPS ($3.75/Q) we arrive at roughly 19X forward EPS, at the current stock price. This is likely a pretty good valuation for a company with stellar/flawless execution in a slowing, but still strong, market.

Investors will likely get turned off by the “flattish” commentary despite the good numbers. It also doesn’t help that the chip stocks have been feeling a bit like they are turning over here. Despite any weakness, KLA remains the top financial performer in the industry.

Also Read:

New Intel CEO Commits to Remaining an IDM

ASML – Strong DUV Throwback While EUV Slows- Logic Dominates Memory

2020 was a Mess for Intel


A Research Update on Carbon Nanotube Fabrication

by Tom Dillinger on 12-22-2020 at 10:00 am

IV measurement testchip

It is quite amazing that silicon-based devices have been the foundation of our industry for over 60 years, as it was clear that the initial germanium-based devices would be difficult to integrate at a larger scale.  (GaAs devices have also developed a unique microelectronics market segment.)  More recently, it is also rather amazing that silicon field-effect devices have found a new life, through the introduction of topologies such as FinFETs, and soon, as nanosheets.  Research is ongoing to bring silicon-based complementary FET (CFET) designs to production status, where nMOS and pMOS devices are fabricated vertically, eliminating the lateral n-to-p spacing in current cell designs.  Additionally, materials engineering advances have incorporated (tensile and compressive) stress into the silicon channel crystal structure, to enhance free carrier mobility.

However, the point of diminishing returns for silicon engineering is approaching:

  • silicon free carrier mobility is near maximum, due to velocity saturation at high electric fields
  • the “density of free carrier states” (DoS) at the conduction and valence band edges of the silicon semiconductor is reduced with continued dimensional scaling – more energy is required to populate a broader range of carrier states
  • statistical process variation associated with fin patterning is considerable
  • heat conduction from the fin results in increased local “self-heat” temperature, impacting several reliability mechanisms (HCI, electromigration)

A great deal of research is underway to evaluate the potential for a fundamentally different field-effect transistor material than silicon, yet which would also be consistent with current high volume manufacturing operations.  One option is to explore monolayer, two-dimensional semiconducting materials for the device channel, such as molybdenum disulfide (MoS2).

Another promising option is to construct the device channel from carbon nanotubes (CNT).  The figure below provides a simple pictorial of the unique nature of carbon bonding.  (I’m a little rusty on my chemistry, but “sp2” bonding refers to the hybridization of one s and two p orbitals on each carbon atom, forming three strong in-plane sigma bonds with its neighbors, with the remaining p electrons delocalized across the lattice.  There are no “dangling bonds”, and the carbon material is inert.)

Note that graphite, graphene, and CNT structures are similar chemically – experimental materials analysis with graphite is easier, and can ultimately be extended to CNT processing.

At the recent IEDM conference, TSMC provided an intriguing update on their progress with CNT device fabrication. [1]  This article summarizes the highlights of that presentation.

CNT devices offer some compelling features:

  • very high carrier mobility (> 3,000 cm**2/V-sec, “ballistic transport”, with minimal scattering)
  • very thin CNT body dimensions (e.g., diameter ~1nm)
  • low parasitic capacitance
  • excellent thermal conduction
  • low temperature (<400C) processing

The last feature is particularly interesting, as it also opens up the potential for integration of silicon-based, high-temperature fabrication with subsequent CNT processing.

Gate Dielectric

A unique process flow was developed to provide the “high K” dielectric equivalent gate oxide for a CNT device, similar to the HKMG processing of current silicon FETs.

The TEM figure above illustrates the CNT cross-section.  Deposition of an initial interface dielectric (Al2O3) is required for compatibility with the unique carbon surface – i.e., suitable nucleation and conformity of this thin layer on carbon are required.

Subsequently, atomic layer deposition (ALD) of a high-K HfO2 film is added. (These dielectric experiments on material properties were done with a graphite substrate, as mentioned earlier.)

The minimum thicknesses of these gate dielectric layers are constrained by the requirement for very low gate leakage current – e.g., <1 pA/CNT, for a gate length of 10nm.  The test structure fabrication for measuring gate-to-CNT leakage current is illustrated below.  (For these electrical measurements, the CNT structure used a quartz substrate.)

The “optimal” dimensions from these experiments are t_Al2O3 = 0.35nm and t_HfO2 = 2.5nm.  With these extremely thin layers, Cgate_ox is very high, resulting in improved electrostatic control.  (Note that these layers are thicker than the CNT diameter, the impact of which will be discussed shortly.)
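As a rough check on the stack’s electrostatics, a planar-equivalent oxide thickness (EOT) can be computed from the series-capacitor rule.  The dielectric constants below are assumed textbook values (k ~ 9 for Al2O3, k ~ 22 for HfO2), not numbers from the paper, and a planar EOT is only a figure of merit given the radial CNT geometry:

```python
# Planar-equivalent oxide thickness (EOT) of the two-layer gate stack.
# Dielectric constants are assumed textbook values, not from the paper:
# k(SiO2) ~ 3.9, k(Al2O3) ~ 9, k(HfO2) ~ 22.
K_SIO2, K_AL2O3, K_HFO2 = 3.9, 9.0, 22.0

def eot_nm(t_al2o3_nm, t_hfo2_nm):
    """Series-capacitor EOT: each layer scaled by k(SiO2)/k(layer)."""
    return t_al2o3_nm * K_SIO2 / K_AL2O3 + t_hfo2_nm * K_SIO2 / K_HFO2

# Stack at the paper's optimum dimensions
print(f"EOT ~ {eot_nm(0.35, 2.5):.2f} nm")
```

With these assumed constants, the 0.35nm + 2.5nm stack lands at an EOT well under 1nm, consistent with the “very high Cgate_ox” observation above.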

Gate Orientation

The CNT devices evaluated by TSMC incorporated a unique “top gate plus back gate” topology.

The top gate provides the conventional field-effect device input, while the (larger) back gate provides electrostatic control of the carriers in the S/D extension regions, to effectively reduce the parasitic resistances Rs and Rd.  The back gate also influences the contact potential between the CNT and the Palladium source/drain metal, reducing the Schottky barrier (and the associated diode-like current behavior) at this semiconductor-metal interface.

Device Current

The I-V curves (both linear and log Ids, for subthreshold slope measurement) for a CNT pFET are depicted below.  For this experiment: Lg = 100nm, S/D spacing = 200nm, CNT diameter = 1nm, t_Al2O3 = 1.25nm, t_HfO2 = 2.5nm.

For this test vehicle (fabricated on a quartz substrate), a single CNT supports Ids in excess of 10uA.  Further improvements would be achieved with thinner dielectrics, approaching the target dimensions mentioned above.

Parallel CNTs in production fabrication will ultimately be used – the pertinent fabrication metric will be “the number of CNTs per micron”.  For example, a CNT pitch of 4nm would be quoted as “250 CNTs/um”.
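The pitch-to-density conversion is simple enough to sketch directly.  The drive-current figure below assumes each CNT contributes its full per-tube current independently (electrostatic screening between neighbors is ignored):

```python
# Converting CNT pitch to the "CNTs per micron" density metric, and
# per-CNT drive current to the conventional per-micron figure.
# The 4 nm pitch and >10 uA/CNT values come from the article; treating
# the tubes as fully independent is a simplifying assumption.
def cnts_per_um(pitch_nm):
    """Number of parallel CNTs per micron of device width."""
    return 1000.0 / pitch_nm

def ids_per_um_ua(ids_per_cnt_ua, pitch_nm):
    """Aggregate drive current (uA/um), assuming independent CNTs."""
    return ids_per_cnt_ua * cnts_per_um(pitch_nm)

print(cnts_per_um(4))           # 4 nm pitch -> 250 CNTs/um
print(ids_per_um_ua(10, 4))     # 10 uA/CNT  -> 2500 uA/um (2.5 mA/um)
```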

Challenges

There are certainly challenges to address when planning for CNT production (to mention but a few):

  • regular/uniform CNT deposition, with exceptionally clean surface for dielectric nucleation
  • need to minimize the carrier “trap density” within the gate dielectric stack
  • optimum S/D contact potential materials engineering
  • device modeling for design

The last challenge above is especially noteworthy, as current compact device models for field-effect transistors will definitely not suffice.  The CNT gate oxide topology is drastically different from a planar or FinFET silicon channel.  As the gate-to-channel electric field is radial in nature, there is no simple relation for an “effective gate oxide” thickness, as with a planar device.

Further, the S/D extensions require unique Rs and Rd models.  Also, the CNT gate oxide is thicker than the CNT diameter, resulting in considerable fringing fields from the gate to the S/D extensions and to the closely-spaced parallel CNTs.  Developing suitable compact models for CNT-based designs is an ongoing effort.

Parenthetically, a CNT “surrounding gate” oxide – similar to the gate-all-around nanosheet – would be an improvement over the deposited top gate oxide, but difficult to manufacture.
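For the idealized surrounding-gate case, the textbook coaxial-capacitor formula gives a feel for why no single “effective oxide thickness” applies to a cylindrical channel.  The single effective dielectric constant and the dimensions below are illustrative assumptions, not values from the paper:

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity, F/m

def c_planar_per_area(k, t_ox_m):
    """Parallel-plate capacitance per unit area (F/m^2)."""
    return k * EPS0 / t_ox_m

def c_coax_per_length(k, d_cnt_m, t_ox_m):
    """Idealized gate-all-around capacitance per unit CNT length (F/m):
       C' = 2*pi*eps / ln((r + t_ox)/r), with r = CNT radius.
    Note the logarithmic (not 1/t_ox) thickness dependence."""
    r = d_cnt_m / 2.0
    return 2.0 * math.pi * k * EPS0 / math.log((r + t_ox_m) / r)

# Illustrative numbers: d = 1 nm CNT, t_ox = 2.85 nm total stack,
# single effective k ~ 15 (assumed blend of Al2O3 and HfO2).
c_coax = c_coax_per_length(15.0, 1e-9, 2.85e-9)
print(f"coaxial C' ~ {c_coax * 1e12:.0f} pF/m per CNT")
```

The logarithmic dependence on oxide thickness (versus the linear 1/t_ox of a planar plate) is one concrete reason planar-style compact models break down here.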

TSMC is clearly investing significant R&D resources, in preparation for the “inevitable” post-silicon device technology introduction.  The results on CNT fabrication and electrical characterization demonstrate considerable potential for this device alternative.

-chipguy

References

[1]  Pitner, G., et al, “Sub-0.5nm Interfacial Dielectric Enables Superior Electrostatics:  65mV/dec Top-Gated Carbon Nanotube FETs at 15nm Gate Length”, IEDM 2020.


Advanced Process Development is Much More than just Litho

by Tom Dillinger on 12-16-2020 at 10:00 am

Vt distribution

The vast majority of the attention given to the introduction of each new advanced process node focuses on lithographic updates.  The common metrics quoted are the transistors per mm**2 or the (high-density) SRAM bit cell area.  Alternatively, detailed decomposition analysis may be applied using transmission electron microscopy (TEM) on a lamella sample, to measure fin pitch, gate pitch, and (first-level) metal pitch.

With the recent transition of the critical dimension layers from 193i to extreme ultraviolet (EUV) exposure, the focus on litho is understandable.  Yet, process development and qualification encompass many more facets of materials engineering to achieve robust manufacturability, so that the full complement of product goals can be achieved.  Specifically, process development engineers are faced with increasingly stringent reliability targets, while concurrently achieving performance and power dissipation improvements.

At the recent IEDM conference, TSMC gave a technical presentation highlighting the development focus that enabled the N5 process node to achieve (risk production) qualification.  This article summarizes the highlights of that presentation. [1]

An earlier SemiWiki article introduced the litho and power/performance features of N5. [2]  One of the significant materials differences in N5 is the introduction of a “high mobility” device channel, or HMC.  As described in [2], the improved carrier mobility in N5 is achieved by the introduction of additional strain on the device channel region.  (Although TSMC did not provide technical details, the pFET hole mobility is also likely improved by the introduction of a moderate percentage of Germanium into the Silicon channel region, or Si(1-x)Ge(x).)

Additionally, the N5 process node incorporates an optimized high-K metal-gate (HKMG) dielectric stack between gate and channel, resulting in a stronger electric field.

A very significant facet of this “bandgap engineering” for carrier mobility and the gate oxide stack materials selection is to ensure that reliability targets are satisfied.  Several of the N5 reliability qualification results are illustrated below.

TSMC highlighted the following reliability measures from the N5 qualification test vehicle:

  • bias temperature instability (BTI)
      – both NBTI for pFETs and PBTI for nFETs, manifesting as a performance degradation over time from a device Vt shift (increasing in absolute value) due to trapped oxide charge
      – may also result in a degradation of VDDmin for SRAM operation
  • hot carrier injection (HCI)
      – an asymmetric injection of charge into the gate oxide near the drain end of the device (operating in saturation), resulting in degraded carrier mobility
  • time-dependent gate oxide dielectric breakdown (TDDB)
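BTI drift is commonly fitted with an empirical power law in stress time.  The sketch below uses assumed coefficients (not TSMC data) to show how even a small exponent accumulates a meaningful Vt shift over a ten-year product lifetime:

```python
# Empirical power-law model commonly used for BTI threshold-voltage drift:
#     dVt(t) = A * t**n
# The coefficient A and exponent n are illustrative assumptions here
# (n is typically quoted in the 0.1-0.25 range), not TSMC data.
def bti_dvt_mv(t_hours, a_mv=5.0, n=0.15):
    """Vt shift in mV after t_hours of stress, power-law model."""
    return a_mv * t_hours ** n

for t in (1, 1000, 87600):        # 1 hour, ~6 weeks, ~10 years
    print(f"{t:>6} h: dVt ~ {bti_dvt_mv(t):.1f} mV")
```

This slow-but-unbounded drift is why BTI must be budgeted into both path timing margin and SRAM VDDmin at design time.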

Note that the N5 node is targeted to satisfy both high-performance and mobile (low-power) product requirements.  As a result, both performance degradation and maintaining an aggressive SRAM VDDmin are important long-term reliability criteria.

TDDB

The figure above illustrates that the TDDB lifetime is maintained relative to node N7, even with the increased gate electric field.

Self-heating

The introduction of FinFET device geometries substantially altered the thermal resistance paths from the channel power dissipation to the ambient.  New “self-heating” analysis flows were employed to more accurately calculate local junction temperatures, often displayed as a “heat map”.  As might be expected with the aggressive dimensional scaling from N7 to N5, the self-heat temperature rise is greater in N5, as illustrated below.

Designers of HPC products need to collaborate with both their EDA partners for die thermal analysis tools and their product engineering team for accurate (on-die and system) thermal resistance modeling.  For the on-die model, both active and inactive structures strongly influence heat spreading.
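A first-order feel for self-heat comes from modeling the channel-to-ambient path as series thermal resistances.  The magnitudes below are placeholder assumptions for illustration, not N5 characterization data:

```python
# First-order self-heat estimate: the channel-to-ambient path is modeled
# as a chain of series thermal resistances, dT = P * sum(R_th).
# All R_th magnitudes are placeholder assumptions, not N5 data.
def delta_t_k(power_w, r_th_k_per_w):
    """Local temperature rise (K) for a given device power dissipation
    and a list of series thermal resistances (K/W)."""
    return power_w * sum(r_th_k_per_w)

# e.g., fin/channel-to-substrate, substrate spreading, package-to-ambient
rth = [2.0e4, 5.0e3, 1.0e3]       # K/W, assumed
print(f"dT ~ {delta_t_k(1e-3, rth):.1f} K at 1 mW")
```

The fin-to-substrate term dominating the chain is why FinFET scaling (narrower fins, tighter pitch) pushes the junction temperature rise up from node to node.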

HCI

Hot carrier injection performance degradation for N7 and N5 is shown below, for nFETs and pFETs.

Note that HCI is strongly temperature-dependent, necessitating accurate self-heat analysis.

BTI

The pMOS NBTI reliability analysis results are illustrated below, with the related ring oscillator performance impact.

In both cases, reliability analysis demonstrates improved BTI characteristics of N5 relative to N7.

SRAM VDDmin

The SRAM minimum operating voltage (VDDmin) is a key parameter for low-power designs, especially with the increasing demand for local memory storage.  Two factors that contribute to the minimum SRAM operating voltage (with sufficient read and write margins) are:

  • the BTI device shift, as shown above
  • the statistical process variation in the device Vt, as shown below (normalized to Vt_mean in N7 and N5)

Based on these two individual results, the SRAM reliability data after HTOL stress shows a smaller VDDmin degradation for N5 versus N7.
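To illustrate how local Vt variation feeds a VDDmin budget, a Pelgrom-style mismatch model can be sketched.  The mismatch coefficient, device dimensions, sigma multiplier, and margin below are illustrative assumptions, not foundry data:

```python
import math

# Pelgrom-style local mismatch: sigma(Vt) = A_vt / sqrt(W * L).
# A_vt, device sizes, and the margin terms are assumed for illustration.
def sigma_vt_mv(a_vt_mv_um, w_um, l_um):
    """Vt mismatch standard deviation (mV) for a W x L device."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)

def vddmin_estimate_mv(vt_mean_mv, sigma_mv, n_sigma=6, margin_mv=100):
    """Crude VDDmin bound: worst-case Vt across a large bitcell array
    (n_sigma tail for millions of cells) plus a fixed write/read margin."""
    return vt_mean_mv + n_sigma * sigma_mv + margin_mv

s = sigma_vt_mv(1.0, 0.03, 0.02)   # A_vt = 1 mV*um, near-minimum device
print(f"sigma(Vt) ~ {s:.0f} mV, VDDmin ~ {vddmin_estimate_mv(300, s):.0f} mV")
```

The 1/sqrt(W*L) scaling is why the smallest devices on the die (the SRAM bitcells) set VDDmin, and why tightening the Vt distribution from N7 to N5 pays off directly in operating voltage.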

Interconnect

TSMC also briefly described the N5 process engineering emphasis on (Mx, low-level metal) interconnect reliability optimization.  With an improved damascene trench liner and a “Cu reflow” step, the ~30% Mx pitch scaling in N5 using EUV did not increase electromigration failures or degrade line-to-line dielectric breakdown.  The figure below illustrates the line-to-line (and via) cumulative breakdown reliability fail data for N5 compared to N7 – N5 tolerates the higher electric field with the scaled Mx pitch.

Summary

The majority of the coverage associated with the introduction of TSMC’s N5 process node related to the broad adoption of EUV lithography to replace multipatterning for the most critical layers, enabling aggressive area scaling.  Yet, process engineers must also optimize materials selection and many individual fabrication steps, to achieve reliability targets.  TSMC recently presented how these reliability measures for N5 are superior to prior nodes.

-chipguy

References

[1]  Liu, J.C., et al, “A Reliability Enhanced 5nm CMOS Technology Featuring 5th Generation FinFET with Fully-Developed EUV and High Mobility Channel for Mobile SoC and High Performance Computing Application”, IEDM 2020.

[2]  https://semiwiki.com/semiconductor-manufacturers/tsmc/282339-tsmc-unveils-details-of-5nm-cmos-production-technology-platform-featuring-euv-and-high-mobility-channel-finfets-at-iedm2019/

 
