Highlights of the TSMC Technology Symposium 2021 – Silicon Technology
by Tom Dillinger on 06-13-2021 at 6:00 am


Recently, TSMC held their annual Technology Symposium, providing an update on the silicon process technology and packaging roadmap.  This article will review the highlights of the silicon process developments and future release plans.

Subsequent articles will describe the packaging offerings and delve into technology development and qualification specifically for the automotive sector.  Several years ago, TSMC defined four “platforms” which would receive unique R&D investments to optimize specific technical offerings:  high performance computing (HPC); mobile; edge/IoT computing (ultra-low power/leakage); and, automotive.  The focus on process development for the automotive market was a prevalent theme at the Symposium, and will be covered in a separate article.

Parenthetically, these platforms remain the foundation of TSMC’s roadmap.  Yet, the mobile segment has evolved beyond (4G) smartphones to encompass a broader set of applications.  The emergence of the “digital data transformation” has led to increased demand for wireless communication options between edge devices and cloud/data center resources – e.g., WiFi6/6E, 5G/6G (industrial and metropolitan) networks.  As a result, TSMC is emphasizing their investment in RF process technology development, to address this expanding segment.

General

Here are some general highlights from the Symposium, followed by specific process technology announcements.

  • breadth of offerings

In 2020, TSMC extended their support to encompass 281 distinct process technologies, shipping 11,617 products to 510 customers.  As in previous years, TSMC proudly stated “we have never shut down a fab.”

  • capacity

Capacity in 2020 exceeded 12M (12” equivalent) wafers, with expansion investments for both advanced (digital) and specialty process nodes.

  • capital equipment investment

TSMC plans to invest a total of US$100 billion over the next three years, including a US$30 billion capital expenditure this year, to support global customer needs.

TSMC’s global 2020 revenue was $47.78B – the $30B annual commit to fab expansion certainly would suggest an expectation of significant and extended semiconductor market growth, especially for the 7nm and 5nm process families.  For example, new tapeouts (NTOs) for the 7nm family will be up 60% in 2021.

  • US fab

TSMC has begun construction of a US fab in Phoenix, AZ – volume production of the N5 process will commence in 2024 (~20K wafers per month).

  • environmental initiatives

Fabs are demanding consumers of electricity, water, and (reactive) chemicals.  TSMC is focused on transitioning to 100% renewable energy sources by 2050 (25% by 2030).  Additionally, TSMC is investing in “zero waste” recycling and purification systems, returning used chemicals to “electronic grade” quality.

One cautionary note…  Our industry is famously cyclic, with amplified economic upticks and downturns.  The clear message from TSMC at the Symposium is that the accelerating adoption of semiconductors across all platforms — from data-intensive computation centers to wireless/mobile communications to automotive systems to low-power devices – will continue for the foreseeable future.

Process Technology Roadmap

  • N7/N7+/N6/N5/N4/N3

The figure below summarizes the advanced technology roadmap.

N7+ represents the introduction of EUV lithography to the baseline N7 process.  N5 has been in volume production since 2020.

N3 will remain a FinFET-based technology offering, with volume production starting in 2H2022.  Compared to N5, N3 will provide:

  • +10-15% performance (iso-power)
  • -25-30% power (iso-performance)
  • +70% logic density
  • +20% SRAM density
  • +10% analog density

TSMC foundation IP has commonly offered two standard cell libraries (of different track heights) to address the differing performance and logic density requirements of the HPC and mobile segments.  For N3, the need for “full coverage” of the performance/power (and supply voltage) range has led to the introduction of a third standard cell library, as depicted below.

Design enablement for N3 is progressing toward v1.0 PDK status next quarter, with a broad set of IP qualified by 2Q/3Q 2022.

N4 is a unique “push” to the existing N5 production process.  An optical shrink is directly available, compatible with existing N5 designs.  Additionally, for new designs (or existing designs interested in pursuing a physical re-implementation), there are some available enhancements to current N5 design rules and an update to the standard cell libraries.

Similarly, N6 is an update to the 7nm family, with increasing adoption of EUV lithography (over N7+).  TSMC indicated, “N7 remains a key offering for the increasing number of 5G mobile and AI accelerator designs in 2021.”

  • N7HPC and N5HPC

An indication of the demanding performance requirements of the HPC platform is the customer interest in applying supply voltage “overdrive”, above the nominal process VDD limit. TSMC will be offering unique “N7HPC” (4Q21) and “N5HPC” (2Q22) process variants supporting overdrive, as illustrated below.

There will be a corresponding SRAM IP design release for these HPC technologies.  As expected, designers interested in this (single digit percentage improvement) performance option will need to address increased static leakage, BEOL reliability acceleration factors, and device aging failure mechanisms.  TSMC’s investment in the development and qualification of processes specifically optimized for individual platforms is noteworthy.  (The last HPC-specific process variant was at the 28nm node.)

  • RF technology

The market demand for WiFi6/6E and 5G (sub-6GHz and mmWave) wireless communications has led TSMC to increase focus on process optimizations for RF devices.  RF switches are also a key application area.  Low power wireless communication protocols, such as Bluetooth (with significant digital integration functionality) are a focus, as well.  Automotive radar imaging systems will no doubt experience growing demand.  The mmWave applications are summarized in the figure below.

The two key parameters typically used to describe RF technology performance are:

  • device Ft (“cutoff frequency”), the frequency at which the current gain = 1; roughly inversely proportional to the device channel length, L
  • device Fmax (“maximum oscillation frequency”), the frequency at which the power gain = 1; proportional to the square root of Ft and inversely proportional to the square root of the Rg*Cgd product (first-order relations are sketched below)
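
For reference, the standard first-order device approximations behind these two figures of merit (textbook relations, not TSMC-specific data) are:

    Ft   ≈ gm / (2π * (Cgs + Cgd))        (gm increases as the channel length L shrinks)
    Fmax ≈ sqrt( Ft / (8π * Rg * Cgd) )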

The TSMC RF technology roadmap is shown below, divided into different application segments.

  • N6RF

The N6RF process was highlighted at the Symposium – a device performance comparison to N16FFC-RF is shown below.

The N28HPC+RF and N16FFC-RF processes also recently received enhancements – for example, improvements in the parasitic gate resistance, Rg, were highlighted.  For low-noise amplifier (LNA) applications, TSMC is evolving their SOI offerings at 130nm and 40nm.

  • ULP/ULL Technologies

IoT and edge device applications are forecast to become more pervasive, demanding increasing computational throughput at ultra-low power (ULP) combined with ultra-low leakage (ULL) static power dissipation for improved battery life.

TSMC has provided ULP process variants – i.e., operational functionality for IP at very low VDD supply voltage.  TSMC has also enabled ULL solutions, with devices/IP utilizing optimized threshold voltages.

An overview of the IoT (ULP/ULL) platform and process roadmap is given below.

The N12e process node was highlighted by TSMC, integrating an embedded non-volatile memory technology (MRAM or RRAM), with standard cell functionality down to 0.55V (using SVT devices; low-Vt cells would enable lower VDD and active power, at higher leakage).  A comparable focus has been placed on reducing the Vmin and standby leakage current of the N12e SRAM IP.

Summary

At the Symposium, TSMC introduced several new process developments, with specific optimizations for HPC, IoT, and automotive platforms.  RF technology enhancements are also a focus, in support of rapid adoption of new wireless communications standards.  And, to be sure, although it didn’t receive much emphasis at the Symposium, there is a clear execution roadmap for the advanced mainstream process nodes – N7+, N5, and N3 – with additional continuing process improvements as reflected in the release of intermediate nodes N6 and N4.

For more information on TSMC’s digital technology roadmap, please follow this link.

-chipguy

 


TSMC and the FinFET Era!
by Daniel Nenni on 06-09-2021 at 6:00 am


While there is a lot of excitement around the semiconductor shortage narrative and the fabs all being full, both 200mm and 300mm, there is one big plot hole and that is the FinFET era.

Intel ushered in the FinFET era only to lose FinFET dominance to the foundries shortly thereafter. In 2009 Intel brought out a 22nm FinFET wafer at the Intel Developer Forum and announced that chips would be available in the second half of 2011. True to their word, the first FinFET chip (code named Ivy Bridge) was officially announced in May of 2011. I remember being shocked that the details were not leaked prior to the announcement. Intel 22nm was truly a transformative process technology, absolutely.

Intel followed 22nm with 14nm, which was late and yield challenged (double patterning FinFETs), allowing the foundries to catch up (TSMC 16nm and Samsung 14nm). Samsung did a very nice job at 14nm and won quite a bit of business, including a slice of the Apple iPhone pie.

TSMC took a different approach to FinFETs. After mastering double patterning on 20nm, TSMC added in FinFETs and called it 16nm. The density was less than Intel 14nm, thus the name difference. Samsung 14nm was a similar density to TSMC 16nm, but Samsung took the low road and pretended they were competitive with Intel. And that is why process nodes are now marketing terms, in my opinion.

This all started what I call the Apple half step process development methodology. TSMC would release a new process version without fail for Apple every year. Prior to that processes were like fine wine, not to be uncorked until they were Moore’s Law ready. The half steps continued with TSMC adding partial EUV to a process already in HVM (7nm) then adding more EUV layers to 5nm and 3nm in a very controlled manner that allowed for superior yield learning and record breaking process ramps.

Intel 14nm is also when the “Intel versus TSMC” marketing battle started. Intel insisted that TSMC 20nm was a failure since it did not include FinFETs, and that the foundries could not follow Intel since Intel was an IDM while TSMC was just a foundry with no in-house design experience.

As we now know, Intel was wrong on so many levels. First and foremost the foundry business is a services business with a massive partnership ecosystem which puts IDM foundries at a distinct disadvantage. It will be interesting to see how the Intel IDM 2.0 strategy pans out but most guesses are that it will fail harder than the previous attempt, but I digress.

Now let’s take a quick look at the TSMC FinFET process revenue steps starting with Q1 2019 and the Q1s that have followed:

In Q1 2019 FinFETs accounted for 42% of TSMC revenue. In Q1 2020 it was 54.5%. In Q1 2021 it was a whopping 63%, and you can expect this aggressive ramp to continue for three reasons:

(1) TSMC protects their FinFET process recipes so there is no second sourcing.

(2) FinFETs mean more performance at less power and less power is critical given the environmental challenges the world is facing.

(3) TSMC is building massive amounts of FinFET capacity ($100B 3 year CAPEX) and with the current semiconductor shortage narrative that is a VERY big deal.

Bottom line: TSMC is pushing their 500+ customers hard into the FinFET era and that will again change the foundry landscape.

The trillion dollar question is: What will happen to the mature (non-FinFET) nodes in the not too distant future? And more importantly, what will happen to the foundries that did not make the jump to FinFETs?


TSMC 2021 Technical Symposium Actions Speak Louder Than Words
by Daniel Nenni on 06-01-2021 at 1:00 pm


The TSMC Symposium kicked off today. I will share my general thoughts while Tom Dillinger will do deep dives on the technology side. The event started with a keynote by TSMC CEO CC Wei followed by technology presentations by the TSMC executive staff.

C.C. Wei introduced a new sound bite this year that really resonated with me and that was “actions speak louder than words”. TSMC has always reminded me that it is important to speak softly and carry a big stick. While this does not always get TSMC the best media coverage it works extremely well with customers, and of course is a key ingredient to the TSMC “World’s Trusted Foundry Partner” strategy. Transparency is another key ingredient and you will not find a more transparent foundry than TSMC.

Who else presents defect density numbers? That is really where the rubber meets the road for ramping new process technologies. Let me remind you how lucky we are to have C.C. Wei leading TSMC. He is a brilliant technologist and a great leader, which is a rare combination. I fully expect many CEO awards to come his way in the not too distant future, absolutely.

The keynote was followed by presentations from the executive staff.  Noticeably missing was Cliff Hou who is now Senior Vice President, Europe and Asia Sales. My guess is that direct customer experience is a stepping stone to something bigger for Cliff. That and gray hair.

Learn About:

  • TSMC’s smartphone, HPC, IoT, and automotive platform solutions
  • TSMC’s advanced technology progress on 7nm, 6nm, 5nm, 4nm, 3nm processes and beyond
  • TSMC’s specialty technology breakthroughs on ultra-low power, RF, embedded memory, power management, sensor technologies, and more
  • TSMC’s advanced packaging technology advancement on InFO, CoWoS®, and SoIC and other exciting innovations
  • TSMC’s manufacturing excellence, capacity expansion plan, and green manufacturing achievement
  • TSMC’s Open Innovation Platform® Ecosystem to speed up time-to-design

Y.J. Mii (Senior Vice President, Research & Development) discussed advanced logic technologies, technology innovation beyond 3nm, and advanced integration technologies.

Kevin Zhang (Senior Vice President, Business Development) discussed specialty technology development and offerings.

Y.J. Mii (Senior Vice President, Research & Development) discussed advanced technology value aggregation, design ecosystem readiness for N5/N4/N3, RF design platform updates, and the 3DIC design ecosystem for system innovation.

Y.P. Chin (Senior Vice President, Operations) provided a manufacturing update with new capacity ramping and new fab status, advanced packaging and testing operation, and green manufacturing.

This was followed by more technical sessions on advanced technology for smartphone and HPC platforms, 3D fabric technology, advanced RF and analog technology, BCD technologies for PMIC, eNVM and automotive, and ultra-low power technology for IoT platforms.

There is a LOT of information to cover so let us know what you are most interested in and we will prioritize as appropriate. Or ask us questions and we can answer them directly.

Hopefully the other foundries will take this symposium to heart and talk more about actions and how they have helped customers, the environment, and the world of electronics in a transparent manner. Thank you for reading and there is plenty more to come.


Is IBM’s 2nm Announcement Actually a 2nm Node?
by Scotten Jones on 05-09-2021 at 6:00 am


IBM has announced the development of a 2nm process.

IBM Announcement

What was announced:

  • “2nm”
  • 50 billion transistors in a “thumbnail” sized area, later disclosed to be 150mm2 = 333 million transistors per square millimeter (MTx/mm2); a quick check of this arithmetic follows the list.
  • 44nm Contacted Poly Pitch (CPP) with 12nm gate length.
  • Gate All Around (GAA); there are several ways to do GAA, and based on the cross sections, IBM is using horizontal nanosheets (HNS).
  • The HNS stack is built over an oxide layer.
  • 45% higher performance or 75% lower power versus the most advanced 7nm chips.
  • EUV patterning is used in the front end and allows the HNS sheet width to be varied from 15nm to 70nm. This is very useful to tune various areas of the circuit for low power or high performance and also for SRAM cells.
  • The sheets are 5nm thick and stacked three high.
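
As a quick check of the headline density figure (simple arithmetic on the announced numbers):

    transistors = 50e9        # 50 billion transistors (announced)
    area_mm2    = 150         # "thumbnail" sized die, later disclosed as ~150mm2
    print(transistors / area_mm2 / 1e6)   # ~333 MTx/mm2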

Is this really “2nm” as claimed by IBM? The current leader in production process technology is TSMC. We have plotted TSMC node names versus transistor density and fitted a curve with a 0.99 R2 value, see figure 1.

Figure 1. TSMC Equivalent Nodes.

Using the curve fit we can convert transistor density to a TSMC Equivalent Node (TEN); the IBM-announced 333MTx/mm2 corresponds to a TEN of 2.9nm. In our opinion this makes the announcement a 3nm node, not a 2nm node.
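
As a minimal sketch of the curve-fit idea, the snippet below fits a power law to approximate, publicly cited TSMC density figures and inverts it; these inputs are illustrative assumptions rather than the dataset behind Figure 1, so the resulting TEN differs slightly from the 2.9nm quoted above.

    import numpy as np

    # Approximate published TSMC logic transistor densities (MTx/mm2) -- illustrative only
    node_nm = np.array([16, 10, 7, 5])
    density = np.array([28.9, 52.5, 91.2, 173.1])

    # Fit a power law, density = a * node^b, as a straight line in log-log space
    b, log_a = np.polyfit(np.log(node_nm), np.log(density), 1)
    a = np.exp(log_a)

    def ten_from_density(d):
        # invert the fit: convert a transistor density into a TSMC Equivalent Node (nm)
        return (d / a) ** (1.0 / b)

    print(round(ten_from_density(333), 1))   # ~3nm with these illustrative inputs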

To compare the IBM announcement in more detail to previously announced 3nm processes and projected 2nm processes we need to make some estimates.

  • We know the CPP is 44nm from the announcement.
  • We are assuming a Single Diffusion Break (SDB) that would result in the densest process.
  • Looking at the cross section that was in the announcement, we do not see Buried Power Rails (BPR); BPR is required to reduce HNS track height down to 5.0, so we assume 6.0 for the process.
  • To get to 333MTx/mm2 the Minimum Metal Pitch must be 18nm, a very aggressive value likely requiring EUV multipatterning.

IBM 2nm Versus Foundry 3nm

Figure 2 compares the IBM 2nm device to our estimates for Samsung and TSMC 3nm processes. We know Samsung is also doing a HNS and TSMC is staying with a FinFET at 3nm. Samsung and TSMC have both announced density improvements for their 3nm processes versus their 5nm processes, so we have known transistor density for all three companies and can compute TEN for all three. As previously noted, IBM’s TEN is 2.9; we now see Samsung’s TEN is 4.7 and TSMC’s TEN is 3.0, again reinforcing that IBM 2nm is like TSMC 3nm and that Samsung is lagging TSMC.

The numbers in red in figure 2 are estimated to achieve the announced densities; we assume SDB for all companies. TSMC has the smallest track height because a FinFET can have a 5.0 track height without BPR, but HNS needs BPR to reach 5.0 and BPR isn’t ready yet.

Figure 2. IBM 2nm Versus Foundry 3nm.

IBM 2nm Versus Foundry 2nm

We have also projected Samsung and TSMC 2nm processes in figure 3. We are projecting that both companies will use BPR (BPR is not ready yet but likely will be when Samsung and TSMC introduce 2nm around 2023/2024). We also assume that Samsung and TSMC will utilize a forksheet HNS (HNS-FS) architecture to reach a 4.33 track height, relaxing some of the other shrink requirements. We have then projected out CPP and MMP based on the companies’ recent shrink trends.

Figure 3. IBM 2nm Versus Foundry 2nm.

 Power and Performance

At ISS this year I estimated relative power and performance for Samsung and TSMC by node, with some additional Intel performance data. The trend by node is based on the companies’ announced power and performance scaling estimates versus available comparisons at 14nm/16nm. For more information see the ISS article here.

Since IBM compared their power and performance improvements to leading 7nm performance I can place the IBM power and performance on the same trend plots I previously presented, see figure 4.

Figure 4. Power and Performance (estimates).

 IBM’s use of HNS yields a significant reduction in power and makes their 2nm process more power efficient than Samsung or TSMC’s 3nm process, although we believe once TSMC adopts HNS at 2nm they will be as good or better than IBM for power. For performance we estimate that TSMC’s 3nm process will outperform the IBM 2nm process.

As discussed in the ISS article these trends are only estimates and are based on a lot of assumptions but are the best projections we can put together.

Conclusion

After analyzing the IBM announcement, we believe their “2nm” process is more like a 3nm TSMC process from a density perspective, with better power but inferior performance. The IBM announcement is impressive, but it is a research device whose only clear benefit versus TSMC’s 3nm process is power, and TSMC 3nm will be in risk starts later this year and production next year.

We further believe that TSMC will have the leadership position in density, power, and performance at 2nm when their process enters production around 2023/2024.

Also Read:

Ireland – A Model for the US on Technology

How to Spend $100 Billion Dollars in Three Years

SPIE 2021 – Applied Materials – DRAM Scaling


You know you have a problem when 60 Minutes covers it!
by Robert Maire on 05-03-2021 at 2:00 pm


-Chip shortage on 60 Minutes- Average Joe now aware of chip issue
-Intel sprinkling fairy dust (money) on New Mexico & Israel
-Give up on buy backs and dividends
-Could Uncle Sam give a handout to Intel?

You normally don’t want to answer the door if a 60 Minutes TV crew is outside, as it likely doesn’t mean good things. But in the case of the chip industry, the shortage that has been talked about in all media outlets has finally come home to prime time.

The chip shortage has impacted industries across the board, from autos to appliances to cigarettes, so it has now gotten prime time attention:

CBS 60 Minutes program on Chip Shortage

60 Minutes got hold of some of our past articles including our recent ones about the shortage and China risks and contacted us.

We gave them a lot of background information and answered questions about the industry and shortages as we wanted to help provide an accurate picture.

Overall, we think they did a great job representing what was going on in the industry and were both accurate and informative.

Does Intel have its hand out?

We have previously mentioned that we thought Intel was looking for government help and maybe a handout, which was touched upon up front in Pat Gelsinger’s interview. While certainly not directly asking for money, it certainly sounds like Intel wouldn’t say no. Intel was clearly shopping the idea under the previous administration in the White House as well as previous Intel management.

The chip shortage both amplifies that prior request as well as makes it more timely. It also gets even more timely when it is put under the banner of infrastructure repair.

Intel is going to hemorrhage money

We have said that Intel’s financials were going to get a lot worse before they got any better.

We suggested they would triple spend: 1) spend to have TSMC make product, 2) spend to catch up to TSMC (such as on EUV and other tools), and 3) spend to build extra capacity to become a foundry.

Intel’s Gelsinger even said on 60 Minutes that they are not going to be doing stock buybacks.

Intel in Israel & New Mexico

Intel has just announced that, in addition to the $20B it is spending on two new fabs in Arizona, it is spending $3.5B in New Mexico on packaging technology & capacity.

Intel is also spending $200M on a campus in Haifa, $400M for Mobileye in Israel, and $10B to expand its 10nm fab in Kiryat Gat, Israel. It’s interesting to note that the spend in Israel is not mentioned on Intel’s newsroom website, as it likely doesn’t fit the “move technology & jobs back to the US” narrative that Gelsinger espoused on 60 Minutes.

Between spending on production at TSMC, fixing Intel, building foundries, New Mexico, Mobileye, Israel (likely Ireland as well)…Intel is going to be raining down money all over.

Mark Liu on 60 Minutes

Mark Liu was also interviewed, as TSMC is the clear leader in technology and capacity in the chip industry. We think that Liu was very accurate and straightforward when he said that TSMC was surprised that Intel had fumbled.

He also clearly is on the side of the industry that downplays the shortages and thinks they will be short lived.

As to the “repatriation” of the chip industry to the US, as expected he sees no reason for it.

He also stayed away from commentary about the “Silicon Shield” provided to Taiwan by its leadership in chips.

TSMC is clearly in the driver’s seat and that is not likely to change any time soon.

The Stocks

Given the spending and gargantuan task ahead, we have suggested avoiding Intel’s stock, as it’s going to both take longer and cost more than anyone suggests and the odds of success aren’t great.

Gelsinger is on a world tour sprinkling fairy dust around, and he will need all the luck it brings as we go forward.

We would not be surprised if the government does indeed write Intel a check as they are the US’s only and last hope of getting back in the semiconductor game which is so critical to our future, not to mention our short term needs.

All this spend will do zero to help the shortage but the shortage did at least bring these issues (many of which we have been talking about for years) to the forefront of peoples minds.

We do continue to think that the semi equipment industry will likely benefit big time especially ASML as they have a lock on EUV.

We also think equipment companies can make a few bucks on their old 6″ and 8″ tools if they can resurrect manufacturing as those are the fabs in shortest supply.

Also Read:

KLAC- Great QTR & Guide- Foundry/logic focus driver- Confirms $75B capex in 2021

Lam Research performing like a Lion – Chip equip on steroids

ASML early signs of an order Tsunami – Managing the ramp


How to Spend $100 Billion Dollars in Three Years
by Scotten Jones on 04-25-2021 at 6:00 am


TSMC recently announced plans to spend $100 billion dollars over three years on capital. For 2021 they announced $30B in total capital with 80% on advanced nodes (7nm and smaller), 10% on packaging and masks and 10% on “specialty”.

If we take a guess at the capital for each year, we can project something like $30B for 2021 (announced), $33.5B for 2022 and $36.5B for 2023. $30B + $33.5B + $36.5B = $100B. The exact breakout by year for 2022 and 2023 may be different than this but overall, the numbers work. If we further assume that the 80% spending on advanced node ratio will be maintained over the three years, we get: $24B for 2021, $26.8B for 2022 and $29.2B for 2023 ($80B total).
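
The split is simple arithmetic; here is a quick sketch using the assumed yearly breakout above:

    capex = {2021: 30.0, 2022: 33.5, 2023: 36.5}   # $B per year; 2022/2023 are our guesses
    advanced_share = 0.80                          # portion spent on 7nm and smaller

    advanced = {yr: round(amt * advanced_share, 1) for yr, amt in capex.items()}
    print(sum(capex.values()))               # 100.0
    print(advanced)                          # {2021: 24.0, 2022: 26.8, 2023: 29.2}
    print(round(sum(advanced.values()), 1))  # 80.0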

What kind of advanced capabilities can you buy for $80B over 3 years?

Figure 1 illustrates our view of TSMC’s advanced node plans.

Figure 1. TSMC Advanced Node Plans.

To begin 2021, TSMC had record 7nm revenue in Q1, and we believe they needed to add 25K wafers per month (wpm) of capacity to do that; whether that spending was in 2021 or late 2020 is subject to debate. 5nm was in production beginning in the second half of 2020, and we believe a further ramp-up of 60k wpm will take place in 2021, reaching 120k wpm by year end. Also, late 2021 will see 3nm risk starts, requiring the completion of one cleanroom phase and an estimated 15k wpm of 3nm capacity.

2022 will see the ramp up of 3nm with an additional 60K wpm of capacity.

2023 will see the build out of 5nm capacity at the Arizona fab, and an additional 45k wpm of 3nm capacity. Finally, we expect 2nm risk starts in 2023, requiring a cleanroom build out and 15k wpm. Where 5nm and 3nm are being produced in 3 cleanroom phases each, TSMC has announced that 2nm will be built in four cleanroom phases, and we have planned on two phases in 2023.

Figure 2 illustrates our view of TSMC’s capital spending by node for 7nm, 5nm, 3nm and 2nm.

Figure 2. TSMC Capital Spending on Advanced Nodes.

In 2021 we have $4.6B for 7nm capacity, $15.2B for additional 5nm capacity and $6.4B for the initial 3nm cleanroom and risk starts capability. The total $26.3B is more than the calculated $24B so some of the 7nm capacity may be in 2020 or some of the 3nm spending may be in 2022.

In 2022 we have $23.2B for additional 3nm capacity; this is less than the $26.8B expected for 2022. Because 2023 is expected to have spending in Arizona, more 3nm capacity and the initial 2nm build out, it is possible 2022 may see less capital spending than we initially assumed and 2023 more.

For 2023 we have the first 5nm phase built out in Arizona for $5.7B, additional 3nm capacity for $15.4B and the initial build out of 2nm for $9.3B. The total for 2023 is $30.4B, more than the estimated $29.2B.

If we add up our forecast over three years, we get $79.8B versus the $80B estimate assuming 80% of the announced $100B is spent on advanced nodes. We should also keep in mind that the $100B is a three-year estimate subject to changing market conditions.
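
Tallying the per-node estimates quoted above reproduces that figure (a quick sketch; all values are the estimates from this article):

    spend_2021 = 4.6 + 15.2 + 6.4   # 7nm + 5nm + initial 3nm cleanroom/risk starts (~$26B)
    spend_2022 = 23.2               # additional 3nm capacity
    spend_2023 = 5.7 + 15.4 + 9.3   # Arizona 5nm + more 3nm + initial 2nm (~$30B)

    print(round(spend_2021 + spend_2022 + spend_2023, 1))   # 79.8, vs. the ~$80B advanced-node budget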

In this scenario, in 2023 TSMC will have 140k wpm of 5nm production capacity, 120k wpm of 3nm production capacity and 15k wpm of 2nm risk start capacity.

Also Read:

SPIE 2021 – Applied Materials – DRAM Scaling

Kioxia and Western Digital and the current Kioxia IPO/Sale rumors

Intel Node Names


TSMC Ups CAPEX Again!
by Daniel Nenni on 04-16-2021 at 6:00 am


We were all pleasantly surprised when TSMC increased 2021 CAPEX to a record $28 billion. To me this validated the talk inside the ecosystem that Intel would be coming to TSMC at 3nm. We were again surprised when TSMC announced a $100B investment over the next three years, which dwarfed Intel’s announcement that they would spend $20B on two new fabs in Arizona.

It wasn’t clear what the TSMC investment included but we now know (via the Q1 2021 Investor Call) that it’s predominantly CAPEX starting with $30B in 2021 and the rest over 2022 and 2023. Personally, I think TSMC CAPEX will end up being more than $100B because TSMC tends to be conservative with their numbers, absolutely.

Let’s take a look at CC Wei’s opening statement on yesterday’s investor call:

CC Wei: First, let me talk about the capacity shortage and demand outlook. Our customers are currently facing challenges from the industry-wide semiconductor capacity shortage, which is driven by both a structural increase in long-term demand as well as a short-term imbalance in the supply chain. We are witnessing a structural increase in underlying semiconductor demand as the multi-year megatrends of 5G and HPC-related applications are expected to fuel strong demand for our advanced technologies in the next several years. COVID-19 has also fundamentally accelerated the digital transformation, making semiconductors more pervasive and essential in people’s lives.

D.A.N. The short-term imbalance is of course the drop in utilization last year, due to the uncertainty brought by the pandemic, followed by the hockey-stick-shaped rebound now underway, which includes some panic buying. The bottom line is that we have enough capacity today and more than enough capacity coming tomorrow, so no worries here.

CC Wei: To address the structural increase in the long-term demand profile, we are working closely with our customers and investing to support their demand. We have acquired land and equipment and started the construction of new facilities. We are hiring thousands of employees and expanding our capacity at multiple sites. TSMC expects to invest about USD 100 billion through the next 3 years to increase capacity, to support the manufacturing and R&D of leading-edge and specialty technologies. Increased capacity is expected to improve supply certainty for our customers and help strengthen confidence in global supply chains that rely on semiconductors.

D.A.N. Based on what we have seen on the SemiWiki job board, TSMC is indeed hiring thousands of employees, and the TSMC job posts are getting 2x more views than average. And yes, TSMC is already spending that $100B; $8.8B was consumed in Q1 2021.

CC Wei:  Our capital investment decisions are based on 4 disciplines: technology leadership, flexible and responsive manufacturing, retaining customers’ trust and earning the proper return. At the same time, we face manufacturing cost challenges due to increasing process complexity at leading node, new investment in mature nodes and rising material costs. Therefore, we will continue to work closely with customers to sell our value. Our value includes the value of our technology, the value of our service and the value of our capacity support to customers. We will look to firm up our wafer pricing to a reasonable level.

D.A.N. Translation: there will be pricing adjustments to compensate for the added capacity.

CC Wei:  Next, let me talk about the automotive supply update. The automotive market has been soft since 2018. Entering 2020, COVID-19 further impact the automotive market. The automotive supply chain was affected throughout the year, and our customers continued to reduce their demand throughout the third quarter of 2020. We only began to see sudden recovery in the fourth quarter of 2020.

However, the automotive supply chain is long and complex with its own inventory management practices. From chip production to car production, it takes at least 6 months with several tiers of suppliers in between. TSMC is doing its part to address the chip supply challenges for our customers.

D.A.N. Some car companies have shortages and some don’t; it all depends on inventory and who cut orders in 2020. Toyota, I’m told, has the best-managed inventory and is still making cars. Other car companies, not so much.

CC Wei: Finally, I will talk about the N5 and N3 status. TSMC’s N5 is the foundry industry’s most advanced solution with the best PPA. N5 is already in its second year of volume production with yield better than our original plan. N5 demand continue to be strong, driven by smartphone and HPC applications, and we expect N5 to contribute around 20% of our wafer revenue in 2021.

D.A.N. I was told by a gaming chip leaker that there is panic buying in crypto and gaming, which may explain TSMC’s big HPC numbers. Also, the word inside the ecosystem is that Samsung is having problems, so there is a burst of N5 and N3 design activity. In fact, 80% of the 2021 CAPEX is being spent on N5 and N3 (which are pretty much identical fabs using different process recipes).

CC Wei: N3 will be another full node stride from our N5 and will use FinFET transistor structure to deliver the best technology maturity, performance, and cost for our customers. Our N3 technology development is on track with good progress. We continue to see a much higher level of customer engagement for both HPC and smartphone applications at N3 as compared with N5 and N3 at a similar stage.

D.A.N. This is due to Samsung’s failure at 3nm. Scotten Jones did a nice blog on this earlier this year:

ISS 2021 – Scotten W. Jones – Logic Leadership in the PPAC era

CC Wei: Risk production is scheduled in 2021. The volume production is targeted in second half of 2022. Our 3-nanometer technology will be the most advanced foundry technology in both PPA and transistor technology. Thus, we are confident that both our 5-nanometer and 3-nanometer will be large and long-lasting nodes for TSMC.

D.A.N. Apple iProducts will be on N3 next year, which means HVM in 2H 2022. The IDM foundries (Intel and Samsung) do initial product introductions and spend a year or two ramping up to HVM, so it is hard to compare new process introduction dates.

You can join a more detailed discussion here in the experts forum: TSMC Q1 2021 Earnings Conference Call


Foundry Fantasy- Deja Vu or IDM 2?
by Robert Maire on 03-26-2021 at 8:00 am


– Intel announced 2 new fabs & New Foundry Services
– Not only do they want to catch TSMC they want to beat them
– It’s a very, very tall order for a company that hasn’t executed
– It will require more than a makeover to get to IDM 2.0

Intel not only wants to catch TSMC but beat them at their own game

Intel announced that it was going to spend $20B on two new fabs in Arizona and establish Intel Foundry Services as part of re-imagining Intel into “IDM 2.0”. The stated goal would be to provide foundry services to customers much as TSMC does so well today.

This will not be easy. A lot of companies have died on that hill or been wounded. Global Foundries famously gave up. Samsung still spends oodles of money trying to keep within some sort of distance to TSMC. UMC, SMIC and many others just don’t hold a candle to TSMC’s capabilities and track record.

This all obviously creates a very strange dynamic where Intel is highly dependent upon TSMC’s production for the next several years but then thinks it can not only wean itself off of TSMC’s warm embrace but produce enough for itself as well as other customers to be a real foundry player.

If Pat Gelsinger can pull this off he deserves a billion dollar bonus

This goes beyond doubling down on Intel’s manufacturing and well into a Hail Mary type of play. This may turn out to be an aspirational type of goal in which everyone would be overjoyed if they just caught back up to TSMC.

Like Yogi Berra said “It’s Deja Vu all over again”- Foundry Services 2.0

Lest anyone conveniently forget, Intel tried this Foundry thing before and failed, badly. It just didn’t work. They were not at all competitive.

It could be that we are just past the point of remembering that it was a mistake and have forgotten long enough to try again.

We would admit that Intel’s prior attempt at being a foundry services provider seemed half-hearted at best. We sometimes thought that many long-time Intel insiders snickered at being a foundry, as they somehow thought it beneath them.

Trying to “ride the wave” of chip shortage fever?

It could also be that Intel is trying to take advantage of the huge media buzz about the current chips shortage by playing into that theme, and claiming to have the solution.

We would remind investors that the current chip shortage that has everyone freaked out will be long over, done and fixed and a distant memory before the first brick is even laid for the two new fabs Intel announced today. But it does make for good timing and PR.

Could Intel be looking for a chunk of “Chips for America” money?

Although Intel said on the call that government funding had nothing to do with whether or not they did the project, we are certain that Intel will have its hand out and lobby big time to be the leader of Chips for America.

We would remind investors that the prior management of Intel was lobbying the prior White House administration hard to be put in charge of the “Chips for America” while at the exact same time negotiating to send more product (& jobs) to TSMC.

This is also obviously well timed as is the current shortage. Taken together the idea of Intel providing foundry services makes some sense on the surface at least.

Intel needs to start with a completely clean slate with funding

We think it may be best for Intel to start as if it never tried being a foundry before. Don’t keep any of the prior participants as it didn’t work before.

Randhir Thakur has been tasked with running Intel Foundry Services. We would hope that enough resources are aimed at the foundry undertaking to make it successful. It needs to stand alone and apart.

Intel needs different “DNA” in foundry- two different companies in one

The DNA of a foundry provider is completely different from that of an IDM. They both make chips, but the similarity stops there.

The customer and customer mindset is completely different. Even the technology is significantly different, from the design of the chips to the process flows in the fabs to package and test. The design tools are different, the manufacturing tools are different, and so is packaging and test equipment.

While there is a lot of synergy between being a foundry and an IDM, it would be best to run this as two different companies under one corporate roof. It’s going to be very difficult to share: Who gets priority? Whose needs come first? One of the reasons Intel’s foundry previously failed was that the main Intel business seemed to take priority over foundry, and customers will not like the obvious conflict, which has to be managed.

Maybe Intel should hire a bunch of TSMC people

Much as SMIC hired a bunch of TSMC people when it first started out, maybe Intel would be well served to hire some people from TSMC to get a jump start on how to properly become a real foundry. It would be poetic justice: a US company copying an Asian company that made its bones copying US companies in the chip business.

We have heard rumors that TSMC is offering employees double pay to move from Taiwan to Arizona to start up their new fab there. Perhaps Intel should offer triple pay for TSMC employees to move and jump ship. It would be worth their while. Intel desperately needs the help.

Pat Gelsinger is bringing back a lot of old hands from prior years at Intel as well as others in the industry (including a recent hire from AMAT), but Intel needs people experienced in running a foundry and dealing with foundry customers. Intel has to hire a lot of new and experienced people: it not only needs people to catch up its internal capacity, which is not easy, it also needs more people to become a foundry company, and the skill sets, like the technology, are completely different. This is not going to be either cheap or easy.

I don’t get the IBM “Partnership”

IBM hasn’t been a significant, real player in semiconductors in a very, very long time. It may have a bunch of old patents but it has no significant current process technology that is of true value. It certainly doesn’t build current leading edge or anything close, nor does it bring anything to the foundry party.

It’s not like IBM helped GloFo a lot. They brought nothing to the table. GloFo still failed in the Moore’s Law race. In our view IBM could be a net negative, as Intel has to “think different” to be two companies in one; it needs to re-invent itself.

The IBM “partnership” is just more PR “fluff”, just like the plug from Microsoft and the quotes from tech leaders in the industry that accompanied the press release. It’s nonsense.

Don’t go out and buy semi equipment stocks based on Intel’s announcements

Investors need to stop and think about how long it’s going to be before Intel starts ordering equipment for the two $10B fabs announced. It’s going to be years and years away.

The buildings have to be designed, then built, before equipment can even be ordered. Maybe if we are lucky the first shovel goes in the ground at the end of 2021 and equipment starts to roll in in 2023…maybe beginning production at reasonable scale by 2025 if lucky.

Zero impact on current shortage – even though Intel uses the current shortage as an excuse to restart foundry

The announcement has zero, none, nada impact on the current shortage, for two significant reasons:

First, as we have just indicated, it will be years before these fabs come on line, let alone are impactful in terms of capacity. The shortages will be made up by TSMC, Samsung, SMIC, GloFo and others in the near term. The shortages will be ancient history by the time Intel gets the fabs on line.

Second, as we have previously reported, the vast majority of the shortages are at middle of the road or trailing edge capacity made in 10-20 year old fabs on old 8 inch equipment. You don’t make 25 cent microcontrollers for anti-lock brakes in bleeding edge 7NM $10B fabs; the math doesn’t work. So the excuse of getting into the foundry business because of the current shortage just doesn’t fly, even though management pointed to it on the call.

Could Intel get Apple back?

As we have said before, if we were Tim Apple, a supply chain expert, and the entire being of our company was based on Taiwan and China we might be a little nervous. We also might push our BFF TSMC to build a gigafab in the US to secure capacity. The next best thing might be for someone else like Intel or Samsung to build a gigafab foundry in the US that I could use and go back to two foundry suppliers fighting for my business with diverse locations.

The real reason Intel needs to be a foundry is the demise of X86

Intel has rightly figured out that the X86 architecture is on a downward spiral. Everybody wants their own custom ARM, AI, ML, RISC, Tensor, or whatever silicon chip. No one wants to buy off the rack anymore; they all want their own bespoke silicon design to differentiate the Amazons from the Facebooks from the Googles.

Pat has rightly figured out that it’s all about manufacturing. Just like it always was at Intel, and something TSMC never stopped believing. Yes, design does still matter, but everybody can design their own chip these days and almost no one, except TSMC, can build them all.

Either Intel will have to start printing money or profits will suffer near term

We have been saying that Intel is going to be in a tight financial squeeze as they were going to have reduced gross margins by increasing outsourcing to TSMC while at the same time re-building their manufacturing, essentially having a period of almost double costs (or at least very elevated costs).

The problem just got even worse as Intel is now stuck with “triple spending”: spending (or gross margin loss) on TSMC, re-building their own fabs, and now a third cost of building additional foundry capacity for outside customers.

We don’t see how Intel avoids a financial hit.

It’s not even certain that Intel can spend enough to catch up, let alone build foundry capacity, even if it has the cash

We would point out that TSMC has the EUV ASML scanner market virtually tied up for itself. They have more EUV scanners than the rest of the world put together.

Intel has been a distant third after Samsung in EUV efforts. If Intel wants to get cranking on 7NM and 5NM and beyond, it has a lot of EUV to buy. It can’t multi-pattern its way out of it. Add on top of that a lot of EUV buying to become a foundry player, as the PDKs for foundry processes rely a lot less on the tricks that Intel can pull with its own in-house design and process to avoid EUV. TSMC and foundry flows are a lot more EUV friendly.

As we have previously pointed out, the supply of EUV scanners can’t be turned on like a light switch; they are like a 15 year old single malt, and it takes a very long time to ramp up capacity, especially lenses, which are a critical component.

I don’t know if Intel has done the math or called their friends at ASML to see if enough tools are available. ASML will likely start building now to be ready to handle Intel’s needs a few years from now, if Intel is serious.

Being a foundry is even harder now

Intel was asked on the call “what’s different this time” in terms of why foundry will work now when it didn’t years ago and their answer was that foundry is a lot different now.

We would certainly agree, and suggest that being a leading edge foundry is even more difficult now. It’s far beyond just spending money and understanding technology. It’s mindset and process. It’s not making mistakes. To underscore both TSMC and Pat Gelsinger, it’s “execution, execution & execution”. We couldn’t agree more. Pat certainly “gets it”; the question is can he execute?

The tough road just became a lot tougher

Intel had a pretty tough road in front of it to catch the TSMC juggernaut. The road just got a lot more difficult to both catch them and beat them at their own game, that’s twice as hard.

However we think that Pat Gelsinger has the right idea. Intel can’t just go back to being the technology leader it was 10 or 20 years ago, it has to re-invent itself as a foundry because that is what the market wants today (Apple told them so).

It’s not just fixing the technology, it’s fixing the business model as well, to match the new market reality.

It’s going to be very, very tough and challenging but we think that Intel is up for it. They have the strategy right and that is a great and important start.

All they have to do is execute….

Related:

Intel Will Again Compete With TSMC by Daniel Nenni 

Intel’s IDM 2.0 by Scotten Jones 

Intel Takes Another Shot at the Enticing Foundry Market by Terry Daly


Resistive RAM (ReRAM) Computing-in-Memory IP Macro for Machine Learning
by Tom Dillinger on 03-18-2021 at 6:00 am


The term von Neumann bottleneck is used to denote the issue with the efficiency of the architecture that separates computational resources from data memory.   The transfer of data from memory to the CPU contributes substantially to the latency, and dissipates a significant percentage of the overall energy associated with the computation.

This energy inefficiency is especially acute for the implementation of machine learning algorithms using neural networks.  There is a significant research emphasis on in-memory computing, where hardware is added to the memory array in support of repetitive, vector-based data computations, reducing the latency and dissipation of data transfer to/from the memory.

In-memory computing is well-suited for machine learning inference applications.  After the neural network is trained, the weights associated with the multiply-accumulate (MAC) operations at each network node are stored in the memory, and can be used directly as multiplication operands.

At the recent International Solid-State Circuits Conference (ISSCC), researchers from the National Tsing Hua University and TSMC presented several novel design implementation approaches toward in-memory computing, using resistive RAM (ReRAM). [1]  Their techniques will likely help pave the way toward more efficient AI implementations, especially at the edge where latency and power dissipation are key criteria.

Background

An example of a fully-connected neural network is shown in the figure below.

A set of input data (from each sample) is presented to the network – the input layer.  A series of computations is performed at each subsequent layer.  In the fully-connected network illustrated above, the output computation from each node is presented to all nodes in the next layer.  The final layer of the trained network is often associated with determining a classification match to the input data, from a fixed set of labeled candidates (“supervised learning”).

The typical computation performed at each node is depicted below.  Each data value is multiplied by its related (trained) weight constant, then summed – a multiply-accumulate (MAC) calculation.  A final (trained) bias value may be added.  The output of a numeric activation function is used to provide the node output to the next layer.

The efficiency of the node computation depends strongly on the MAC operation.  In-memory computing architectures attempt to eliminate the delay and power dissipation of transferring weight values for the MAC computation.
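
A minimal numerical sketch of the node computation just described (the dimensions and the ReLU activation are arbitrary choices for illustration):

    import numpy as np

    def node_output(data, weights, bias):
        mac = np.dot(data, weights) + bias   # multiply-accumulate, plus the trained bias
        return np.maximum(mac, 0.0)          # activation function (ReLU, as an example)

    data    = np.array([0.2, 0.7, 0.1])      # outputs from the previous layer
    weights = np.array([0.5, 1.2, -0.9])     # trained weight constants for this node
    print(node_output(data, weights, bias=0.05))   # 0.9, passed to the next layer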

The figures above illustrate how the multiplication of (data * weight) could be implemented using the value stored in a one-transistor, one-resistor (1T1R) ReRAM bitcell. [2]

ReRAM technology offers a unique method for non-volatile storage in a memory array.  A write cycle to the bitcell may change the property of the ReRAM material, between a high-resistance (HR) and low-resistance (LR) state.  Subsequent to the write cycle, a bitline current-sense read cycle differentiates between the resistance values to determine the stored bit.

Again referring to the figure above, with the assumption that HR = ‘0’ and LR = ‘1’, the ReRAM cell implements the (data * weight) product in the following manner:

  • if the data = ‘0’, the word line to the bitcell is inactive and little bitline current flows
  • if the data = ‘1’ (word line active), the bitcell current will be either iHR or iLR

If the bitline current sense circuitry distinguishes between iHR (small) and iLR (large), only the product (data = ‘1’) * (weight = ‘1’) = ‘1’ results in significant bitline current.

The summation of the (data * weight) product for multiple data values into the fully-connected network node is illustrated in the figure above.  Unlike a conventional memory array where only one decoded address word line is active, the in-memory computing MAC will have an active word line for each node input where (data = ‘1’).  The total bitline current will be the sum of the parallel ‘dotted’ bitcell currents where the individual word lines are active, either iLR or iHR for each.  The multiply-accumulate operation for all (data * weights) is readily represented as the total bitline current.

At the start of the MAC operation, assume a capacitor connected to the bitline is set to a reference voltage (say, either fully pre-charged or discharged).  The clocked duration of the MAC computation will convert the specific bitline current in that clock cycle into a voltage difference on that capacitor:

delta_V = (I_bitline) * (delta_T) / Creference

That voltage can be read by an analog-to-digital converter (ADC), to provide the digital equivalent of the MAC summation.
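
A behavioral sketch of this single-bit readout path, treating each 1T1R cell as a high or low conductance and the MAC result as integrated bitline charge (all device values below are arbitrary illustrative numbers, not data from the paper):

    # Behavioral model of a 1-bit-data x 1-bit-weight ReRAM MAC column (illustrative values)
    G_LR, G_HR = 100e-6, 1e-6     # low/high-resistance state conductances (siemens)
    V_READ     = 0.2              # bitline read voltage (volts)

    def bitline_current(data_bits, weight_bits):
        # a word line is active only where data = 1; the cell resistance state encodes the weight bit
        return sum(V_READ * (G_LR if w else G_HR)
                   for d, w in zip(data_bits, weight_bits) if d)

    i_bl = bitline_current([1, 0, 1, 1], [1, 1, 0, 1])   # two (data=1, weight=1) products

    # Integrate the bitline current onto the reference capacitor for one clock period
    C_REF, T_CLK = 1e-12, 5e-9    # farads, seconds
    delta_V = i_bl * T_CLK / C_REF
    print(round(delta_V, 3))      # the voltage sampled by the per-column ADC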

In-Memory Computing ReRAM Innovations

The ISSCC presentation from researchers at National Tsing Hua University and TSMC introduced several unique innovations to the challenges of ReRAM-based in-memory computing.

Data and Weight Vector Widths

The simple examples in the figures above used a one-bit data input and a one-bit weight.  A real edge AI implementation will have data vector and weight vector widths as input to the MAC operation.  For example, consider the case of 8-bit data and 8-bit weights for each multiplication product in the MAC operation.  (Parenthetically, the vector width of the weights after network training need not be the same as the input data vector width.  Further, the numeric values could use any of a number of representations – e.g., signed or unsigned integer, two’s complement.)  For the example, at each network node, the in-memory computation architecture needs to compute multiple products of two 8-bit vectors and accumulate the sum.

While the ReRAM array macro computes the MAC for the network node, circuitry outside the array would be used to add the bias, and apply the activation function.  This function would also normalize the width of the node output result to the input data vector width for the next network layer.

The researchers implemented a novel approach toward the MAC calculation, expanding upon the 1-bit ReRAM example shown above.

The description above indicated that the duration of the bitline current defines the output voltage on the reference capacitor.

The researchers reviewed several previous proposals for generating the data vector input-to-word line duration conversion, as illustrated below.

The input data value could be decoded into a corresponding number of individual word line pulses, as illustrated below.

Alternatively, the data value could be decoded into a single word line pulse whose duration varies with the value.  The multiplication of the data input vector times each bit of the weight could be represented by different durations of the active word line to the ReRAM bit cell, resulting in different cumulative values of bitline current during the read cycle.  The figure below illustrates the concept, for four 3-bit data inputs applied as word lines to a weight vector bitline, shown over two clock cycles.

For a data value of ‘000’, the word line would remain off;  for a data value of ‘111’, the maximum word line decode pulse duration would be applied.  The data input arcs to the network node would be dotted together as multiple active cells on the column bitline, as before.

Each column in the ReRAM array corresponds to one bit of the weight vector – the resulting voltage on the reference capacitor is the sum of all node data inputs times one bit of the weight.

Outside of the ReRAM array itself, support circuitry is provided to complete the binary vector (data*weight) multiplication and accumulation operation, as sketched after the list below:

  • an ADC on each bitline column converts the voltage value to a binary vector
  • the individual binary values are shifted according to the MSB-to-LSB position of their weight bit
  • the shifted values are summed to generate the final MAC result
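
A minimal sketch of this column-per-weight-bit organization and shift-and-add recombination (idealized: each column’s ADC output is modeled as the exact sum of the data values gated by that weight bit):

    import numpy as np

    data    = np.array([5, 2, 7, 1])      # multi-bit data inputs to the node
    weights = np.array([3, 6, 2, 4])      # multi-bit trained weights, one per input
    W_BITS  = 3                           # weight vector width

    mac = 0
    for bit in range(W_BITS):
        col_bits = (weights >> bit) & 1           # the bit of each weight stored in this column
        col_sum  = int(np.dot(data, col_bits))    # what this column's ADC ideally reports
        mac     += col_sum << bit                 # shift by the column's significance, accumulate

    print(mac, int(np.dot(data, weights)))        # both print 45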

The researchers noted that these two approaches do not scale well to larger data vector widths:

  • the throughput is reduced, as longer durations are needed
  • for the long pulse approach, PVT variations will result in jitter in the active word line duration, impacting the accuracy

The researchers chose to implement a novel, segmented duration approach.  For example, an 8-bit data input vector is divided into three separate ReRAM operations, of 2, 3, and 3 bits each.  The cumulative duration of these three phases is less than that of the full data decode approach, improving the computation throughput.
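
A short sketch of the digital equivalent of this 2-3-3 segmentation is shown below; it is only an illustration of the idea, with hypothetical values, and assumes unsigned 8-bit data:

    # Digital equivalent of the segmented-duration idea: an 8-bit data value is split
    # into a 2-bit MSB segment and two 3-bit segments; the three partial dot products
    # are recombined with weights 2**6, 2**3, and 2**0.
    def segments_2_3_3(x):
        return (x >> 6) & 0x3, (x >> 3) & 0x7, x & 0x7

    data  = [143, 7, 250, 64]   # hypothetical 8-bit data inputs
    w_bit = [1, 1, 0, 1]        # stored weight bit per input, on one bitline column

    partials = [0, 0, 0]
    for d, w in zip(data, w_bit):
        for phase, seg in enumerate(segments_2_3_3(d)):
            partials[phase] += seg * w            # what each ReRAM phase accumulates

    mac = (partials[0] << 6) + (partials[1] << 3) + partials[2]
    assert mac == sum(d * w for d, w in zip(data, w_bit))   # matches the direct sum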

Scaling the Bitline Current

With the segmented approach, the researchers described two implementation options:

  • at the end of each phase, the reference capacitor voltage is sensed by the ADC, then reset for the next phase;  the ADC output provides the data times weight bit product for the segmented data vector slice
  • the reference capacitor voltage could be held between phases, without a sample-and-reset sequence

In this second case, when transitioning from one data vector segment to the next, it is necessary to scale the capacitor current correspondingly.  If the remaining data vector width for the next segment phase is n bits, the capacitor current needs to be scaled by 1/(2**n).  The figure below provides a simplified view of how the researchers translated the bitline current in each phase into a scaled reference capacitor current.

A pFET current mirror circuit is used to generate the current into the reference capacitor;  by adjusting the device sizes in the mirror branches, scaled copies of the bitline current are generated.  Between the data vector segment phases, the capacitor voltage is held, and a different scaled mirror current branch is enabled.

For the in-memory ReRAM computing testsite, the researchers chose to use the full reference capacitor reset phase for the most significant bits segment, providing the optimum accuracy for the MSBs of the data input.  For the remaining LSBs of the data, the subsequent phases used the switched current mirror approach.

Process Variations

The researchers acknowledged that there are significant tolerances in the high and low resistance values of each ReRAM bitcell.  When using ReRAM as a simple memory array, there is sufficient margin between lowR and highR to adequately sense a stored ‘1’ and ‘0’.

However, as the in-memory computing requirements rely on the accumulation of specific (dotted) bitcell currents, these variations are a greater issue.  The researchers chose to use an “averaging” approach – each stored weight bit value is copied across multiple ReRAM bitcells (e.g., # of copies = 4).  Although the figures above depict each data input as driving one ReRAM word line, multiple word lines – each connected to a copy of the weight bit – are used.
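
A toy model of this replication-and-averaging idea is sketched below; the spread and current values are hypothetical and are only meant to show why accumulating several copies reduces the effective variation:

    import random

    # Toy model of the averaging approach: each weight bit is replicated across
    # several ReRAM cells, so the accumulated bitline current averages out the
    # per-cell low-resistance-state variation.
    random.seed(0)
    COPIES    = 4
    I_NOMINAL = 1e-6    # nominal cell read current (hypothetical)
    SIGMA     = 0.15    # fractional cell-to-cell variation (hypothetical)

    def cell_current():
        return I_NOMINAL * (1.0 + random.gauss(0.0, SIGMA))

    single   = cell_current()                                          # one cell
    averaged = sum(cell_current() for _ in range(COPIES)) / COPIES     # four copies
    print(single / I_NOMINAL, averaged / I_NOMINAL)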

Testsite and FOM

TSMC fabricated a ReRAM testsite using this segmented data vector technique.  The specs are shown in the figure above.  The testsite provided programmability for different data vector and weight vector widths – e.g., 8b-8b-14b represents an eight-bit data input, an eight-bit weight, and a full MAC summation supporting a fourteen-bit result at the network node.

The researchers defined a figure-of-merit for MAC calculations using in-memory computing:

        FOM = (energy_efficiency * data_vector_width * weight_vector_width * output_vector_width) / latency

(Energy efficiency is measured in TOPS/Watt;  the output vector width from the ReRAM array and support circuitry is prior to bias addition and activation/normalization.)
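
As a quick illustration of how the FOM is evaluated, the function below implements the definition above with placeholder values – these are not the measured numbers from the paper:

    # The FOM as defined above, computed for placeholder values; these are not
    # the measured results reported by the researchers.
    def fom(tops_per_watt, data_bits, weight_bits, output_bits, latency):
        return tops_per_watt * data_bits * weight_bits * output_bits / latency

    print(fom(tops_per_watt=20.0, data_bits=8, weight_bits=8, output_bits=14, latency=15.0))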

Summary

Edge AI implementations are hampered by the power and latency inefficiencies associated with the von Neumann bottleneck, which has sparked great interest in in-memory computing approaches.  Read access to a ReRAM array storing weight values offers a unique opportunity to implement a binary product of data and weights.  Researchers at TSMC and National Tsing Hua University have implemented several novel approaches to using ReRAM for the MAC computation at each neural network node, addressing how to work efficiently with wide data vectors and how to manage ReRAM process variation.  I would encourage you to read their recent technical update presented at ISSCC. [1]

-chipguy

References

[1]  Xue, Cheng-Xin, et al., “A 22nm 4Mb 8b-Precision ReRAM Computing-in-Memory Macro”, ISSCC 2021, paper 16.1.

[2]  Mao, M., et al., “Optimizing Latency, Energy, and Reliability of 1T1R ReRAM Through Cross-Layer Techniques”, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2016, pp. 352-363.

 


All-Digital In-Memory Computing

All-Digital In-Memory Computing
by Tom Dillinger on 03-15-2021 at 6:00 am

NOR gate

Research pursuing in-memory computing architectures is extremely active.  At the recent International Solid State Circuits conference (ISSCC 2021), multiple technical sessions were dedicated to novel memory array technologies to support the computational demand of machine learning algorithms.

The inefficiencies associated with moving data and weight values from memory to a processing unit, then storing intermediate results back to memory, are substantial.  The information transfer not only adds to the computational latency, but the associated power dissipation is also a major issue.  The “no value add” data movement represents a significant percentage of the dissipated energy, potentially even greater than the “value add” computation itself, as illustrated below. [1]  Note that the actual computational energy dissipation is a small fraction of the energy associated with transferring data and weights to the computation unit.  The goal of in-memory computing is to reduce these inefficiencies, which is especially critical for the implementation of machine learning inference systems at the edge.

The primary focus of in-memory computing for machine learning applications is to optimize the vector multiply-accumulate (MAC) operation associated with each neural network node.  The figure below illustrates the calculation for the (trained) network – the product of each data input times weight value is summed, then provided to a bias and activation function.

For a general network, the data and weights are typically multi-bit quantities.  The weight vector (for a trained, edge AI network) could use a signed, unsigned, or twos complement integer bit representation.  For in-memory computing, the final MAC output is realized by the addition of partial multiplication products.  The bit width of each (data * weight) arc into the node is well-defined – e.g., the product of two n-bit unsigned integers is covered by a 2n-bit vector.  Yet, the accumulation of (data * weight) products for all arcs into a highly-connected network node could require significantly more bits to accurately represent the MAC result.
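
The worst-case accumulator width follows directly from the number of inputs; a small sketch of that arithmetic (the 256-input, 8-bit case is only an example):

    from math import ceil, log2

    # Worst-case accumulator width: each product of two n-bit unsigned values fits
    # in 2n bits, and summing K such products adds ceil(log2(K)) bits.
    def mac_result_bits(n_bits, k_inputs):
        return 2 * n_bits + ceil(log2(k_inputs))

    print(mac_result_bits(8, 256))   # 256 inputs of 8b x 8b -> 24-bit accumulator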

One area of emphasis of the in-memory computing research has been to implement a bitline current-sense measurement using resistive RAM (ReRAM) bitcells.  The product of the data input (as the active memory row wordline) and weight value stored in the ReRAM cell generates a distinguishable bitline current applied to charge a reference capacitance.  A subsequent analog-to-digital converter (ADC) translates this capacitor voltage into the equivalent binary value for subsequent MAC shift-add accumulation.  Although the ReRAM-based implementation of the (data * weight) product is area-efficient, it also has its drawbacks:

  • the accuracy of the analog bitline current sense and ADC is limited, due to limited voltage range, noise, and PVT variations
  • the write cycle time for the ReRAM array is long
  • the endurance of the ReRAM array severely limits the applicability as a general memory storage array

These issues all lead to the same conclusion.  For a relatively small inference neural network, where all the weights can be loaded in the memory array, and the data vector representation is limited – e.g., 8 bits or less – a ReRAM-based implementation will offer area benefits.

However, for a machine learning application requiring a network larger than can be stored in the array, and/or a workload requiring reconfigurability, the need to update weight values frequently precludes the use of a ReRAM current sense approach.  The same issue applies where the data precision requirements are high, necessitating a wider input vector.

An alternative for an in-memory computing architecture is to utilize an enhanced SRAM array to support (data * weight) computation, rather than a novel memory technology.  This allows a much richer set of machine learning networks to be supported.  If the number of layers is large, the input and weight values can be loaded into the SRAM array for node computation, output values saved, and subsequent layer values retrieved.  The energy dissipation associated with the data and weight transfers is reduced over a general-purpose computing solution, and the issue with ReRAM endurance is eliminated.

In-Memory Computing using an Extended SRAM Design

At the recent ISSCC, researchers from TSMC presented a modified digital-based SRAM design for in-memory computing, supporting larger neural networks.[2]

The figure above illustrates the extended SRAM array configuration used by TSMC for their test vehicle – a slice of the array is circled.  Each slice has 256 data inputs, which connect to the ‘X’ logic (more on this logic shortly).  Consecutive bits of the data input vector are provided in successive clock cycles to the ‘X’ gate.  Each slice stores 256 4-bit weight segments, one weight nibble per data input;  these weight bits use conventional SRAM cells, as they could be updated frequently.  The value stored in each weight bit connects to the other input of the ‘X’ logic.

The figure below illustrates how this logic is integrated into the SRAM.

The ‘X’ is a 2-input NOR gate, with a data input and a weight bit as inputs.  (The multiplicative product of two one-bit values is realized by an AND gate;  by using inverted signal values and DeMorgan’s Theorem, the 2-input NOR gate is both area- and power-efficient.)  Between each slice, an adder tree plus partial sum accumulator logic is integrated, as illustrated below.

Note that the weight bit storage in the figure above uses a conventional SRAM topology – the weight bit word lines and bit lines are connected as usual, for a 6T bitcell.  The stored value at each cell fans out to one input of the NOR gate.
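
A brief truth-table check of why a NOR gate with inverted inputs reproduces the one-bit product – this is simply DeMorgan’s theorem written in Python, not the circuit netlist from the paper:

    # DeMorgan's theorem: a AND b == NOR(NOT a, NOT b).  If the data and weight
    # values are distributed in inverted form, a single 2-input NOR gate yields
    # the one-bit product directly.
    def nor(x, y):
        return int(not (x or y))

    for a in (0, 1):
        for b in (0, 1):
            assert nor(1 - a, 1 - b) == (a & b)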

The output of each slice represents a partial product and sum for a nibble of each weight vector.  Additional logic outside the extended array provides shift-and-add computations, to enable wider weight value representations.  For example, a (signed or unsigned integer) 16-bit weight would combine the accumulator results from four slices.
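
As a purely illustrative sketch of that shift-and-add combination (the values are hypothetical), four nibble-slice accumulator results could be recombined for an unsigned 16-bit weight as follows:

    # Recombining per-slice accumulator outputs: each slice handles one 4-bit weight
    # nibble, so an unsigned 16-bit weight spans four slices whose partial sums are
    # shifted by 0, 4, 8, and 12 bits before the final add.
    slice_sums = [37, 12, 5, 2]   # sum(data_i * weight_nibble_i) per slice, LSB nibble first

    mac16 = sum(s << (4 * k) for k, s in enumerate(slice_sums))
    print(mac16)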

Testsite results

A micrograph of the TSMC all-digital SRAM-based test vehicle is shown below, highlighting the 256-input, 16 slice (4-bit weight nibble) macro design.

Note that one of the key specifications for the SRAM-based Compute-in-Memory macro is the efficiency with which new weights can be updated in the array.

The measured performance (TOPS) and power efficiency (TOPS/W) versus supply voltage are illustrated below.   Note that the use of a digital logic-based MAC provides functionality over a wide range of supply voltage.

(Parenthetically, the TOPS/W figure-of-merit commonly used to describe the power efficiency of a neural network implementation can be a misleading measure – it is strongly dependent upon the “density” of the weights in the array and the toggle rate of the data inputs.  A figure below illustrates how this measure depends upon the input toggle rate, assuming a 50% ratio of ‘1’ values in the weight vectors.)

Although this in-memory computing testsite was fabricated in an older 22nm process, the TSMC researchers provided preliminary area and power efficiency estimates when extending this design to the 5nm node.

Summary

There is a great deal of research activity underway to support in-memory computing for machine learning, to reduce the inefficiencies of data transfer in von Neumann architectures.  One facet of this research seeks to use new memory storage technology, such as ReRAM.  The limited endurance of ReRAM restricts this approach to applications where weight values will not be updated frequently.  The limited accuracy of bitline current sensing also constrains the data input vector width.

TSMC has demonstrated how a conventional SRAM array could be extended to support in-memory computing, for large and/or reconfigurable networks, with frequent writes of weight values.  The insertion of 2-input NOR gates and adder tree logic among the SRAM rows and columns provides an area- and power-efficient approach.

-chipguy

 

References

[1]  https://energyestimation.mit.edu

[2]  Chih, Yu-Der, et al., “An 89TOPS/W and 16.3TOPS/mm**2 All-Digital SRAM-Based Full-Precision Compute-in-Memory Macro in 22nm for Machine-Learning Applications”, ISSCC 2021, paper 16.4.