Is IBM’s 2nm Announcement Actually a 2nm Node?

by Scotten Jones on 05-09-2021 at 6:00 am


IBM has announced the development of a 2nm process.

IBM Announcement

What was announced:

  • “2nm”
  • 50 billion transistors in a “thumbnail” sized area, later disclosed to be 150mm2, which works out to 333 million transistors per square millimeter (MTx/mm2).
  • 44nm Contacted Poly Pitch (CPP) with 12nm gate length.
  • Gate All Around (GAA). There are several ways to do GAA, and based on the cross sections, IBM is using horizontal nanosheets (HNS).
  • The HNS stack is built over an oxide layer.
  • 45% higher performance or 75% lower power versus the most advanced 7nm chips.
  • EUV patterning is used in the front end and allows the HNS sheet width to be varied from 15nm to 70nm. This is very useful for tuning various areas of the circuit for low power or high performance and also for SRAM cells.
  • The sheets are 5nm thick and stacked three high.

Is this really “2nm” as claimed by IBM? The current leader in production process technology is TSMC. We have plotted TSMC node names versus transistor density and fitted a curve with a 0.99 R2 value, see figure 1.

Figure 1. TSMC Equivalent Nodes.

Using the curve fit we can convert transistor density to a TSMC Equivalent Node (TEN). For IBM’s announced 333MTx/mm2 the curve fit yields a TEN of 2.9nm. In our opinion this makes the announcement a 3nm node, not a 2nm node.
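
For readers who want to reproduce the idea, here is a minimal sketch of that kind of curve fit in Python. The density values below are rough, publicly circulated estimates, not the data behind Figure 1, so the result lands near, but not exactly at, the 2.9nm figure quoted above.

```python
import numpy as np

# Illustrative (approximate) TSMC logic densities; not official figures.
node_nm = np.array([16, 10, 7, 5])              # node name (nm)
density = np.array([28.9, 52.5, 91.2, 171.3])   # MTx/mm^2 (rough public estimates)

# Power-law fit: log(node name) versus log(density)
b, a = np.polyfit(np.log(density), np.log(node_nm), 1)

def tsmc_equivalent_node(mtx_per_mm2):
    """Map a transistor density back onto the TSMC node-name trend."""
    return float(np.exp(a + b * np.log(mtx_per_mm2)))

print(round(tsmc_equivalent_node(333), 1))      # ~3.1nm with these inputs; the Figure 1 fit gives 2.9nm
```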

To compare the IBM announcement in more detail to previously announced 3nm processes and projected 2nm processes we need to make some estimates.

  • We know the CPP is 44nm from the announcement.
  • We are assuming a Single Diffusion Break (SDB) that would result in the densest process.
  • Looking at the cross section in the announcement, we do not see Buried Power Rails (BPR). BPR is required to reduce the HNS track height down to 5.0, so we assume a 6.0 track height for this process.
  • To get to 333MTx/mm2 the Minimum Metal Pitch (MMP) must be 18nm, a very aggressive value likely requiring EUV multipatterning (a rough density sanity check follows this list).
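
As a rough sanity check on the last two bullets, here is a back-of-envelope sketch using the widely cited weighted NAND2/scan-flip-flop density metric. The cell widths (in CPPs) and transistor counts are assumptions for illustration only, not IBM or TSMC disclosures; the point is simply that a 44nm CPP, 18nm MMP, 6-track library lands in the same ballpark as 333 MTx/mm2.

```python
# Back-of-envelope density estimate; all cell assumptions below are hypothetical.
CPP_nm = 44        # contacted poly pitch (from the IBM announcement)
MMP_nm = 18        # minimum metal pitch (our estimate)
tracks = 6.0       # track height in metal tracks (our estimate, no BPR)

cell_height_nm = tracks * MMP_nm           # 108 nm

# Assumed standard-cell widths (in CPPs) and transistor counts -- illustrative only.
nand2_width_cpp, nand2_tx = 3, 4           # 2-input NAND
sff_width_cpp, sff_tx = 20, 36             # scan flip-flop

def mtx_per_mm2(width_cpp, tx_count):
    """Transistor density of one cell in millions of transistors per mm^2."""
    area_nm2 = width_cpp * CPP_nm * cell_height_nm
    return tx_count / area_nm2 * 1e6       # 1 mm^2 = 1e12 nm^2, then /1e6 for MTx

# Weighted (60% NAND2 / 40% scan flip-flop) density metric
density = 0.6 * mtx_per_mm2(nand2_width_cpp, nand2_tx) + 0.4 * mtx_per_mm2(sff_width_cpp, sff_tx)
print(f"~{density:.0f} MTx/mm^2")          # roughly 320 MTx/mm^2 with these assumptions
```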

IBM 2nm Versus Foundry 3nm

Figure 2 compares the IBM 2nm device to our estimates for Samsung and TSMC 3nm processes. We know Samsung is also doing an HNS and TSMC is staying with a FinFET at 3nm. Samsung and TSMC have both announced density improvements for their 3nm processes versus their 5nm processes, so we have known transistor densities for all three companies and can compute TEN for each. As previously noted, IBM’s TEN is 2.9; we now see Samsung’s TEN is 4.7 and TSMC’s TEN is 3.0, again reinforcing that IBM’s 2nm is like TSMC’s 3nm and that Samsung is lagging TSMC.

The numbers in red in figure 2 are estimated to achieve the announced densities; we assume SDB for all companies. TSMC has the smallest track height because a FinFET can have a 5.0 track height without BPR, but HNS needs BPR to reach 5.0 and BPR isn’t ready yet.

Figure 2. IBM 2nm Versus Foundry 3nm.

IBM 2nm Versus Foundry 2nm

We have also projected Samsung and TSMC 2nm processes in figure 3. We are projecting that both companies will use BPR (BPR is not ready yet but likely will be when Samsung and TSMC introduce 2nm around 2023/2024). We also assume that Samsung and TSMC will utilize a forksheet HNS (HNS-FS) architecture to reach a 4.33 track height, relaxing some of the other shrink requirements. We have then projected CPP and MMP based on each company’s recent shrink trends.

Figure 3. IBM 2nm Versus Foundry 2nm.

Power and Performance

At ISS this year I estimated relative power and performance for Samsung and TSMC by node, with some additional Intel performance data. The trend by node is based on the companies’ announced power and performance scaling estimates versus available comparisons at 14nm/16nm. For more information see the ISS article here.

Since IBM compared their power and performance improvements to leading 7nm performance I can place the IBM power and performance on the same trend plots I previously presented, see figure 4.

Figure 4. Power and Performance (estimates).

IBM’s use of HNS yields a significant reduction in power and makes their 2nm process more power efficient than Samsung’s or TSMC’s 3nm processes, although we believe that once TSMC adopts HNS at 2nm they will be as good as or better than IBM for power. For performance we estimate that TSMC’s 3nm process will outperform the IBM 2nm process.

As discussed in the ISS article, these trends are only estimates based on a lot of assumptions, but they are the best projections we can put together.

Conclusion

After analyzing the IBM announcement, we believe their “2nm” process is more like a 3nm TSMC process from a density perspective, with better power but inferior performance. The IBM announcement is impressive, but it is a research device whose only clear benefit versus TSMC’s 3nm process is power, and TSMC 3nm will be in risk starts later this year and production next year.

We further believe that TSMC will have the leadership position in density, power, and performance at 2nm when their process enters production around 2023/2024.

Also Read:

Ireland – A Model for the US on Technology

How to Spend $100 Billion Dollars in Three Years

SPIE 2021 – Applied Materials – DRAM Scaling


You know you have a problem when 60 Minutes covers it!

by Robert Maire on 05-03-2021 at 2:00 pm


-Chip shortage on 60 Minutes- Average Joe now aware of chip issue
-Intel sprinkling fairy dust (money) on New Mexico & Israel
-Give up on buy backs and dividends
-Could Uncle Sam give a handout to Intel?

You normally don’t want to answer the door if a 60 Minutes TV crew is outside, as it likely doesn’t mean good things. But in the case of the chip industry, the shortage that has been talked about in all media outlets has finally come home to prime time.

The chip shortage has impacted industries across the board, from autos to appliances to cigarettes, so it has gotten prime time attention:

CBS 60 Minutes program on Chip Shortage

60 Minutes got hold of some of our past articles including our recent ones about the shortage and China risks and contacted us.

We gave them a lot of background information and answered questions about the industry and shortages as we wanted to help provide an accurate picture.

Overall, we think they did a great job representing what was going on in the industry and were both accurate and informing.

Does Intel have its hand out?

We have previously mentioned that we thought Intel was looking for government help and maybe a handout, which was touched upon up front in Pat Gelsinger’s interview. While certainly not directly asking for money, it certainly sounds like Intel wouldn’t say no. Intel was clearly shopping the idea under the previous administration in the White House as well as under previous Intel management.

The chip shortage both amplifies that prior request as well as makes it more timely. It also gets even more timely when it is put under the banner of infrastructure repair.

Intel is going to hemorrhage Money

We have said that Intel’s financials were going to get a lot worse before they got any better.

We suggested they would triple spend: 1) spend to have TSMC make product, 2) spend to catch up to TSMC (like on EUV and other tools), and 3) spend to build extra capacity to become a foundry.

Intel’s Gelsinger even said on 60 Minutes that they are not going to be doing stock buybacks.

Intel in Israel & New Mexico

Intel has just announced that in addition to the $20B it is spending on two new fabs in Arizona, it is spending $3.5B in New Mexico on packaging technology & capacity.

Intel is also spending $200M on a campus in Haifa, $400M for Mobileye in Israel and $10B to expand its 10NM fab in Kiryat Gat, Israel. It’s interesting to note that the spend in Israel is not mentioned on Intel’s newsroom website, as it likely doesn’t fit the “move technology & jobs back to the US” narrative that Gelsinger espoused on 60 Minutes.

Between spending on production at TSMC, fixing Intel, building foundries, New Mexico, Mobileye, Israel (likely Ireland as well)…Intel is going to be raining down money all over.

Mark Liu on 60 minutes

Mark Liu was also interviewed, as the clear leader in technology and capacity in the chip industry. We think that Liu was very accurate and straightforward when he said that TSMC was surprised that Intel had fumbled.

He also clearly is on the side of the industry that downplays the shortages and thinks they will be short lived.

As to the “repatriation” of the chip industry to the US, as expected he sees no reason for it.

He also stayed away from commentary about the “Silicon Shield” provided to Taiwan by its leadership in chips.

TSMC is clearly in the driver’s seat and that is not likely to change any time soon.

The Stocks

Given the spending and gargantuan task ahead we have suggested avoiding Intel’s stock, as it’s going to both take longer and cost more than anyone suggests, and the odds of success aren’t great.

Gelsinger is on a world tour sprinkling fairy dust around; he will need all the luck he can get as we go forward.

We would not be surprised if the government does indeed write Intel a check as they are the US’s only and last hope of getting back in the semiconductor game which is so critical to our future, not to mention our short term needs.

All this spend will do zero to help the shortage, but the shortage did at least bring these issues (many of which we have been talking about for years) to the forefront of people’s minds.

We do continue to think that the semi equipment industry will likely benefit big time especially ASML as they have a lock on EUV.

We also think equipment companies can make a few bucks on their old 6″ and 8″ tools if they can resurrect manufacturing as those are the fabs in shortest supply.

Also Read:

KLAC- Great QTR & Guide- Foundry/logic focus driver- Confirms $75B capex in 2021

Lam Research performing like a Lion – Chip equip on steroids

ASML early signs of an order Tsunami – Managing the ramp


How to Spend $100 Billion Dollars in Three Years

by Scotten Jones on 04-25-2021 at 6:00 am


TSMC recently announced plans to spend $100 billion over three years on capital. For 2021 they announced $30B in total capital spending, with 80% on advanced nodes (7nm and smaller), 10% on packaging and masks, and 10% on “specialty”.

If we take a guess at the capital for each year, we can project something like $30B for 2021 (announced), $33.5B for 2022 and $36.5B for 2023. $30B + $33.5B + $36.5B = $100B. The exact breakout by year for 2022 and 2023 may be different than this but overall, the numbers work. If we further assume that the 80% spending on advanced node ratio will be maintained over the three years, we get: $24B for 2021, $26.8B for 2022 and $29.2B for 2023 ($80B total).
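
A quick sketch of that arithmetic (only the $30B 2021 figure and the 80% advanced-node share come from TSMC; the 2022/2023 splits are our guesses):

```python
# TSMC announced $30B for 2021 with ~80% on advanced nodes; the 2022/2023 split is our guess.
capex_by_year = {2021: 30.0, 2022: 33.5, 2023: 36.5}      # $B
advanced_share = 0.80

print(sum(capex_by_year.values()))                          # 100.0
advanced = {yr: round(capex * advanced_share, 1) for yr, capex in capex_by_year.items()}
print(advanced)            # {2021: 24.0, 2022: 26.8, 2023: 29.2} -> ~$80B on advanced nodes
```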

What kind of advanced capabilities can you buy for $80B over 3 years?

Figure 1 illustrates our view of TSMC’s advanced node plans.

Figure 1. TSMC Advanced Node Plans.

To begin 2021, TSMC had record 7nm revenue in Q1 and we believe they needed to add 25K wafers per month (wpm) of capacity to do that; whether that spending was in 2021 or late 2020 is subject to debate. 5nm was in production beginning in the second half of 2020 and we believe a further ramp of 60k wpm will take place in 2021, reaching 120k wpm by year end. Also, late 2021 will see 3nm risk starts, requiring the completion of one cleanroom phase and an estimated 15k wpm of 3nm capacity.

2022 will see the ramp up of 3nm with an additional 60K wpm of capacity.

2023 will see the build out of 5nm capacity at the Arizona fab, and an additional 45k wpm of 3nm capacity. Finally, we expect 2nm risk starts in 2023, requiring a cleanroom build out and 15k wpm. Where 5nm and 3nm are being produced in 3 cleanroom phases each, TSMC has announced that 2nm will be built in four cleanroom phases and we have planned on two phases in 2023.

Figure 2 illustrates our view of TSMC’s capital spending by node for 7nm, 5nm, 3nm and 2nm.

Figure 2. TSMC Capital Spending on Advanced Nodes.

In 2021 we have $4.6B for 7nm capacity, $15.2B for additional 5nm capacity and $6.4B for the initial 3nm cleanroom and risk start capability. The total of $26.2B is more than the calculated $24B, so some of the 7nm capacity may be in 2020 or some of the 3nm spending may be in 2022.

In 2022 we have $23.2B for additional 3nm capacity; this is less than the $26.8B expected for 2022. Because 2023 is expected to have spending in Arizona, more 3nm capacity and the initial 2nm build out, it is possible 2022 may see less capital spending than we initially assumed and 2023 more.

For 2023 we have the first 5nm phase built out in Arizona for $5.7B, additional 3nm capacity for $15.4B and the initial build out of 2nm for $9.3B. The total for 2023 is $30.4B, more than the estimated $29.2B.

If we add up our forecast over three years, we get $79.8B versus the $80B estimate assuming 80% of the announced $100B is spent on advanced nodes. We should also keep in mind that the $100B is a three-year estimate subject to changing market conditions.
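
Rolling up the per-node scenario above gives the same picture; the dollar figures are the estimates from the preceding paragraphs, not TSMC disclosures:

```python
# Estimated advanced-node capital spending by year and node ($B), per the scenario above.
spend = {
    2021: {"7nm": 4.6, "5nm": 15.2, "3nm": 6.4},          # 3nm = initial cleanroom + risk starts
    2022: {"3nm": 23.2},                                    # 3nm ramp
    2023: {"5nm (AZ)": 5.7, "3nm": 15.4, "2nm": 9.3},       # Arizona 5nm, more 3nm, initial 2nm
}

yearly = {yr: round(sum(nodes.values()), 1) for yr, nodes in spend.items()}
print(yearly)                                # {2021: 26.2, 2022: 23.2, 2023: 30.4}
print(round(sum(yearly.values()), 1))        # 79.8 -- versus the ~$80B advanced-node target
```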

In this scenario, in 2023 TSMC will have 140k wpm of 5nm production capacity, 120k wpm of 3nm production capacity and 15k wpm of 2nm risk start capacity.

Also Read:

SPIE 2021 – Applied Materials – DRAM Scaling

Kioxia and Western Digital and the current Kioxia IPO/Sale rumors

Intel Node Names


TSMC Ups CAPEX Again!

by Daniel Nenni on 04-16-2021 at 6:00 am


We were all pleasantly surprised when TSMC increased 2021 CAPEX to a record $28 billion. To me this validated the talk inside the ecosystem that Intel would be coming to TSMC at 3nm. We were again surprised when TSMC announced a $100B investment over the next three years, which dwarfed Intel’s announcement that they would spend $20B on two new fabs in Arizona.

It wasn’t clear what the TSMC investment included but we now know (via the Q1 2021 Investor Call) that it’s predominantly CAPEX starting with $30B in 2021 and the rest over 2022 and 2023. Personally, I think TSMC CAPEX will end up being more than $100B because TSMC tends to be conservative with their numbers, absolutely.

Let’s take a look at CC Wei’s opening statement on yesterday’s investor call:

CC Wei First, let me talk about the capacity shortage and demand outlook. Our customers are currently facing challenges from the industry-wide semiconductor capacity shortage, which is driven by both a structural increase in long-term demand as well as short-term imbalance in the supply chain. We are witnessing a structural increase in underlying semiconductor demand as a multi-year megatrend of 5G and HPC-related applications are expected to fuel strong demand for our advanced technologies in the next several years. COVID-19 has also fundamentally accelerate the digital transformation, making semiconductors more pervasive and essential in people’s life.

D.A.N. The short term imbalance is of course the drop in utilization last year due to the uncertainty brought by the pandemic, and now the hockey stick shaped rebound, which includes some panic buying. The bottom line is that we have enough capacity today and more than enough capacity coming tomorrow, so no worries here.

CC Wei: To address the structural increase in the long-term demand profile, we are working closely with our customers and investing to support their demand. We have acquired land and equipment and started the construction of new facilities. We are hiring thousands of employees and expanding our capacity at multiple sites. TSMC expects to invest about USD 100 billion through the next 3 years to increase capacity, to support the manufacturing and R&D of leading-edge and specialty technologies. Increased capacity is expected to improve supply certainty for our customers and help strengthen confidence in global supply chains that rely on semiconductors.

D.A.N. Based on what we have seen on the SemiWiki job board TSMC is indeed hiring thousands of employees, and the TSMC job posts are getting 2x more views than average. And yes, TSMC is already spending that $100B: $8.8B was consumed in Q1 2021.

CC Wei:  Our capital investment decisions are based on 4 disciplines: technology leadership, flexible and responsive manufacturing, retaining customers’ trust and earning the proper return. At the same time, we face manufacturing cost challenges due to increasing process complexity at leading node, new investment in mature nodes and rising material costs. Therefore, we will continue to work closely with customers to sell our value. Our value includes the value of our technology, the value of our service and the value of our capacity support to customers. We will look to firm up our wafer pricing to a reasonable level.

D.A.N. Translation: there will be pricing adjustments to compensate for the added capacity.

CC Wei:  Next, let me talk about the automotive supply update. The automotive market has been soft since 2018. Entering 2020, COVID-19 further impact the automotive market. The automotive supply chain was affected throughout the year, and our customers continued to reduce their demand throughout the third quarter of 2020. We only began to see sudden recovery in the fourth quarter of 2020.

However, the automotive supply chain is long and complex with its own inventory management practices. From chip production to car production, it takes at least 6 months with several tiers of suppliers in between. TSMC is doing its part to address the chip supply challenges for our customers.

D.A.N. Some car companies have shortages and some don’t; it all depends on inventory and who cut orders in 2020. Toyota, I’m told, has the best managed inventory and is still making cars. Other car companies, not so much.

CC Wei: Finally, I will talk about the N5 and N3 status. TSMC’s N5 is the foundry industry’s most advanced solution with the best PPA. N5 is already in its second year of volume production with yield better than our original plan. N5 demand continue to be strong, driven by smartphone and HPC applications, and we expect N5 to contribute around 20% of our wafer revenue in 2021.

D.A.N. I was told by a gaming chip leaker that there is panic buying in crypto and gaming, which may explain TSMC’s big HPC numbers. Also, the word inside the ecosystem is that Samsung is having problems, so there is a burst of 5N and 3N design activity. In fact, 80% of the 2021 CAPEX is being spent on 5N and 3N (which are pretty much identical fabs using different process recipes).

CC Wei: N3 will be another full node stride from our N5 and will use FinFET transistor structure to deliver the best technology maturity, performance, and cost for our customers. Our N3 technology development is on track with good progress. We continue to see a much higher level of customer engagement for both HPC and smartphone applications at N3 as compared with N5 and N3 at a similar stage.

D.A.N. This is due to Samsung’s failure at 3nm. Scotten Jones did a nice blog on this earlier this year:

ISS 2021 – Scotten W. Jones – Logic Leadership in the PPAC era

CC Wei: Risk production is scheduled in 2021. The volume production is targeted in second half of 2022. Our 3-nanometer technology will be the most advanced foundry technology in both PPA and transistor technology. Thus, we are confident that both our 5-nanometer and 3-nanometer will be large and long-lasting nodes for TSMC.

D.A.N. Apple iProducts will be 3N next year which means HVM in 2H 2022. The IDM foundries (Intel and Samsung) do initial product introductions and spend a year or two ramping up to HVM so it is hard to compare new process introduction dates.

You can join a more detailed discussion here in the experts forum: TSMC Q1 2021 Earnings Conference Call


Foundry Fantasy- Deja Vu or IDM 2?

by Robert Maire on 03-26-2021 at 8:00 am


– Intel announced 2 new fabs & New Foundry Services
– Not only do they want to catch TSMC they want to beat them
– It’s a very, very tall order for a company that hasn’t executed
– It will require more than a makeover to get to IDM 2.0

Intel not only wants to catch TSMC but beat them at their own game

Intel announced that it was going to spend $20B on two new fabs in Arizona and establish Intel Foundry Services as part of re-imagining Intel into “IDM 2.0”. The stated goal would be to provide foundry services to customers much as TSMC does so well today.

This will not be easy. A lot of companies have died on that hill or been wounded. GlobalFoundries famously gave up. Samsung still spends oodles of money trying to keep within some sort of distance of TSMC. UMC, SMIC and many others just don’t hold a candle to TSMC’s capabilities and track record.

This all obviously creates a very strange dynamic where Intel is highly dependent upon TSMC’s production for the next several years but then thinks it can not only wean itself off of TSMC’s warm embrace but produce enough for itself as well as other customers to be a real foundry player.

If Pat Gelsinger can pull this off he deserves a billion dollar bonus

This goes beyond doubling down on Intel’s manufacturing and well into a Hail Mary type of play. This may turn out to be an aspirational type of goal in which everyone would be overjoyed if they just caught back up to TSMC.

Like Yogi Berra said “It’s Deja Vu all over again”- Foundry Services 2.0

Lest anyone conveniently forget, Intel tried this Foundry thing before and failed, badly. It just didn’t work. They were not at all competitive.

It could be that we are just past the point of remembering that it was a mistake and have forgotten long enough to try again.

We would admit that Intel’s prior attempt at being a foundry services provider seemed half-hearted at best. We sometimes thought that many long-time Intel insiders snickered at being a foundry, as they somehow thought it beneath them.

Trying to “ride the wave” of chip shortage fever?

It could also be that Intel is trying to take advantage of the huge media buzz about the current chips shortage by playing into that theme, and claiming to have the solution.

We would remind investors that the current chip shortage that has everyone freaked out will be long over, done and fixed and a distant memory before the first brick is even laid for the two new fabs Intel announced today. But it does make for good timing and PR.

Could Intel be looking for a chunk of “Chips for America” money?

Although Intel said on the call that government funding had nothing to do with whether or not they did the project, we are certain that Intel will have its hand out and lobby big time to be the leader of Chips for America.

We would remind investors that the prior management of Intel was lobbying the prior White House administration hard to be put in charge of the “Chips for America” while at the exact same time negotiating to send more product (& jobs) to TSMC.

This is also obviously well timed, given the current shortage. Taken together, the idea of Intel providing foundry services makes some sense, on the surface at least.

Intel needs to start with a completely clean slate with funding

We think it may be best for Intel to start as if it had never tried being a foundry before. Don’t keep any of the prior participants, as it didn’t work before.

Randhir Thakur has been tasked with running Intel Foundry Services. We would hope that enough resources are aimed at the foundry undertaking to make it successful. It needs to stand alone and apart.

Intel needs different “DNA” in foundry- two different companies in one

The DNA of a foundry provider is completely different from that of an IDM. They both make chips but the similarity stops there.

The customer and customer mindset is completely different. Even the technology is significantly different from the design of the chips, to the process flows in the fabs to package and test. The design tools are different, the manufacturing tools are different and so is packaging and test equipment.

While there is a lot of synergy between being a foundry and an IDM, it would be best to run this as two different companies under one corporate roof. It’s going to be very difficult to share: Who gets priority? Whose needs come first? One of the reasons Intel’s foundry previously failed was that the main Intel business seemed to take priority over foundry, and customers will not like the obvious conflict, which has to be managed.

Maybe Intel should hire a bunch of TSMC people

Much as SMIC hired a bunch of TSMC people when it first started out, maybe Intel would be well served to hire some people from TSMC to get a jump start on how to properly become a real foundry. It would be poetic justice for a US company to copy an Asian company that made its bones copying US companies in the chip business.

We have heard rumors that TSMC is offering employees double pay to move from Taiwan to Arizona to start up its new fab there. Perhaps Intel should offer triple pay to TSMC employees to move and jump ship. It would be worth their while. Intel desperately needs the help.

Pat Gelsinger is bringing back a lot of old hands from prior years at Intel as well as others in the industry (including a recent hire from AMAT), but Intel needs people experienced in running a foundry and dealing with foundry customers. Intel has to hire a lot of new and experienced people: it not only needs people to catch up its internal capacity, which is not easy, it also needs more people to become a foundry company, and the skillsets, like the technology, are completely different. This is not going to be either cheap or easy.

I don’t get the IBM “Partnership”

IBM hasn’t been a significant, real player in semiconductors in a very, very long time. It may have a bunch of old patents but it has no significant current process technology that is of true value. It certainly doesn’t build current leading edge or anything close, nor does it bring anything to the foundry party.

It’s not like IBM helped GloFo a lot. They brought nothing to the table. GloFo still failed in the Moore’s Law race. In our view IBM could be a net negative, as Intel has to “think different” to be two companies in one; it needs to re-invent itself.

The IBM “partnership” is just more PR “fluff”, just like the plug from Microsoft and the quotes from tech leaders in the industry that accompanied the press release. It’s nonsense.

Don’t go out and buy semi equipment stocks based on Intel’s announcements

Investors need to stop and think about how long it’s going to be before Intel starts ordering equipment for the two $10B fabs announced. It’s going to be years and years away.

The buildings have to be designed, then built, before equipment can even be ordered. Maybe if we are lucky the first shovel goes in the ground at the end of 2021 and equipment starts to roll in in 2023…maybe beginning production at reasonable scale by 2025 if lucky.

Zero impact on current shortage – even though Intel uses the current shortage as an excuse to restart foundry

The announcement has zero, none, nada impact on the current shortage for two significant reasons:

First, as we have just indicated it will be years before these fabs come on line let alone are impactful in terms of capacity. The shortages will be made up for by TSMC, Samsung, SMIC, GloFo and others in the near term. The shortages will be ancient history by the time Intel gets the fabs on line.

Second, as we have previously reported, the vast majority of the shortages are at middle of the road or trailing edge capacity made in 10-20 year old fabs on old 8 inch equipment. You don’t make 25 cent microcontrollers for anti-lock brakes in bleeding edge 7NM $10B fabs; the math doesn’t work. So the excuse of getting into the foundry business because of the current shortage just doesn’t fly, even though management pointed to it on the call.

Could Intel get Apple back?

As we have said before, if we were Tim Apple, a supply chain expert, and the entire being of our company was based on Taiwan and China we might be a little nervous. We also might push our BFF TSMC to build a gigafab in the US to secure capacity. The next best thing might be for someone else like Intel or Samsung to build a gigafab foundry in the US that I could use and go back to two foundry suppliers fighting for my business with diverse locations.

The real reason Intel needs to be a foundry is the demise of X86

Intel has rightly figured out that the X86 architecture is on a downward spiral. Everybody wants their own custom ARM, AI, ML, RISC, Tensor, or whatever silicon chip. No one wants to buy off the rack anymore; they all want their own bespoke silicon design to differentiate the Amazons from the Facebooks from the Googles.

Pat has rightly figured out that it’s all about manufacturing. Just like it always was at Intel, and something TSMC never stopped believing. Yes, design does still matter, and everybody can design their own chip these days, but almost no one, except TSMC, can build them all.

Either Intel will have to start printing money or profits will suffer near term

We have been saying that Intel is going to be in a tight financial squeeze as they were going to have reduced gross margins by increasing outsourcing to TSMC while at the same time re-building their manufacturing, essentially having a period of almost double costs (or at least very elevated costs).

The problem just got even worse as Intel is now stuck with “triple spending”: spending (or gross margin loss) on TSMC, re-building their own fabs, and now a third cost of building additional foundry capacity for outside customers.

We don’t see how Intel avoids a financial hit.

It’s not even certain that Intel can spend enough to catch up, let alone build foundry capacity, even if it has the cash

We would point out that TSMC has the EUV ASML scanner market virtually tied up for itself. They have more EUV scanners than the rest of the world put together.

Intel has been a distant third after Samsung in EUV efforts. If Intel wants to get cranking on 7NM and 5NM and beyond, it has a lot of EUV to buy. It can’t multi-pattern its way out of it. Add on top of that a lot of EUV buying to become a foundry player, as the PDKs for a foundry process rely a lot less on the tricks that Intel can pull in its own in-house design and process to avoid EUV. TSMC and foundry flows are a lot more EUV friendly.

As we have previously pointed out, the supply of EUV scanners can’t be turned on like a light switch; they are like a 15 year old single malt, and it takes a very long time to ramp up capacity, especially lenses, which are a critical component.

I don’t know if Intel has done the math or called their friends at ASML to see if enough tools are available. ASML will likely have to start building now to be ready to handle Intel’s needs a few years from now, if Intel is serious.

Being a foundry is even harder now

Intel was asked on the call “what’s different this time” in terms of why foundry will work now when it didn’t years ago and their answer was that foundry is a lot different now.

We would certainly agree, and suggest that being a leading edge foundry is even more difficult now. It’s far beyond just spending money and understanding technology. It’s mindset and process. It’s not making mistakes. To underscore both TSMC and Pat Gelsinger, it’s “execution, execution & execution.” We couldn’t agree more. Pat certainly “gets it”; the question is can he execute?

The tough road just became a lot tougher

Intel had a pretty tough road in front of it just to catch the TSMC juggernaut. The road just got a lot more difficult: to both catch them and beat them at their own game is twice as hard.

However we think that Pat Gelsinger has the right idea. Intel can’t just go back to being the technology leader it was 10 or 20 years ago, it has to re-invent itself as a foundry because that is what the market wants today (Apple told them so).

It’s not just fixing the technology, it’s fixing the business model as well, to fit the new market reality.

It’s going to be very, very tough and challenging but we think that Intel is up for it. They have the strategy right and that is a great and important start.

All they have to do is execute….

Related:

Intel Will Again Compete With TSMC by Daniel Nenni 

Intel’s IDM 2.0 by Scotten Jones 

Intel Takes Another Shot at the Enticing Foundry Market by Terry Daly


Resistive RAM (ReRAM) Computing-in-Memory IP Macro for Machine Learning

by Tom Dillinger on 03-18-2021 at 6:00 am


The term von Neumann bottleneck is used to denote the issue with the efficiency of the architecture that separates computational resources from data memory.   The transfer of data from memory to the CPU contributes substantially to the latency, and dissipates a significant percentage of the overall energy associated with the computation.

This energy inefficiency is especially acute for the implementation of machine learning algorithms using neural networks.  There is a significant research emphasis on in-memory computing, where hardware is added to the memory array in support of repetitive, vector-based data computations, reducing the latency and dissipation of data transfer to/from the memory.

In-memory computing is well-suited for machine learning inference applications.  After the neural network is trained, the weights associated with the multiply-accumulate (MAC) operations at each network node are stored in the memory, and can be used directly as multiplication operands.

At the recent International Solid-State Circuits Conference (ISSCC), researchers from the National Tsing Hua University and TSMC presented several novel design implementation approaches toward in-memory computing, using resistive RAM (ReRAM). [1]  Their techniques will likely help pave the way toward more efficient AI implementations, especially at the edge where latency and power dissipation are key criteria.

Background

An example of a fully-connected neural network is shown in the figure below.

A set of input data (from each sample) is presented to the network – the input layer.  A series of computations is performed at each subsequent layer.  In the fully-connected network illustrated above, the output computation from each node is presented to all nodes in the next layer.  The final layer of the trained network is often associated with determining a classification match to the input data, from a fixed set of labeled candidates (“supervised learning”).

The typical computation performed at each node is depicted below.  Each data value is multiplied by its related (trained) weight constant, then summed – a multiply-accumulate (MAC) calculation.  A final (trained) bias value may be added.  The output of a numeric activation function is used to provide the node output to the next layer.
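
As a concrete reference point, here is a minimal sketch of that node computation (illustrative only; ReLU is used as a stand-in for the activation function):

```python
def node_output(data, weights, bias):
    """Multiply-accumulate for one neural-network node, then a ReLU activation."""
    acc = sum(d * w for d, w in zip(data, weights))   # MAC: sum of (data * weight)
    return max(0.0, acc + bias)                        # add bias, apply activation

print(node_output([1.0, 0.5, -2.0], [0.2, 0.4, 0.1], 0.05))  # 0.25
```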

The efficiency of the node computation depends strongly on the MAC operation.  In-memory computing architectures attempt to eliminate the delay and power dissipation of transferring weight values for the MAC computation.

The figures above illustrate how the multiplication of (data * weight) could be implemented using the value stored in a one-transistor, one-resistor (1T1R) ReRAM bitcell. [2]

ReRAM technology offers a unique method for non-volatile storage in a memory array.  A write cycle to the bitcell may change the property of the ReRAM material, between a high-resistance (HR) and low-resistance (LR) state.  Subsequent to the write cycle, a bitline current-sense read cycle differentiates between the resistance values to determine the stored bit.

Again referring to the figure above, with the assumption that HR = ‘0’ and LR = ‘1’, the ReRAM cell implements the (data * weight) product in the following manner:

  • if the data = ‘0’, the word line to the bitcell is inactive and little bitline current flows
  • if the data = ‘1’ (word line active), the bitcell current will either be iHR or iLR

If the bitline current sense circuitry distinguishes between iHR (small) and iLR (large), only the product (data = ‘1’) * (weight = ‘1’) = ‘1’ results in significant bitline current.

The summation of the (data * weight) product for multiple data values into the fully-connected network node is illustrated in the figure above.  Unlike a conventional memory array where only one decoded address word line is active, the in-memory computing MAC will have an active word line for each node input where (data = ‘1’).  The total bitline current will be the sum of the parallel ‘dotted’ bitcell currents where the individual word lines are active, either iLR or iHR for each.  The multiply-accumulate operation for all (data * weights) is readily represented as the total bitline current.

At the start of the MAC operation, assume a capacitor connected to the bitline is set to a reference voltage (say, either fully pre-charged or discharged).  The clocked duration of the MAC computation will convert the specific bitline current in that clock cycle into a voltage difference on that capacitor:

delta_V = (I_bitline) * (delta_T) / Creference

That voltage can be read by an analog-to-digital converter (ADC), to provide the digital equivalent of the MAC summation.
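
A minimal behavioral sketch of this 1-bit (data * weight) bitline summation and the current-to-voltage conversion; the iLR/iHR currents, integration window and reference capacitance below are made-up values for illustration, not figures from the paper:

```python
# Behavioral model of a 1-bit ReRAM in-memory MAC column; all device values are illustrative.
i_LR, i_HR = 5e-6, 0.25e-6     # bitcell read current for low/high resistance states (A)
delta_T    = 1e-9              # MAC integration window (s)
C_ref      = 100e-15           # reference capacitor (F)

def bitline_mac(data_bits, weight_bits):
    """Sum the dotted bitcell currents: only rows with data = 1 drive the bitline,
    and each contributes iLR (weight = 1) or iHR (weight = 0)."""
    i_bl = sum((i_LR if w else i_HR) for d, w in zip(data_bits, weight_bits) if d)
    return i_bl * delta_T / C_ref          # delta_V developed on the reference capacitor

dv = bitline_mac([1, 1, 0, 1], [1, 0, 1, 1])   # two (1 * 1) products dominate the current
print(f"delta_V = {dv * 1e3:.1f} mV")          # ~102.5 mV with these values
```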

In-Memory Computing ReRAM Innovations

The ISSCC presentation from researchers at National Tsing Hua University and TSMC introduced several unique innovations to the challenges of ReRAM-based in-memory computing.

Data and Weight Vector Widths

The simple examples in the figures above used a one-bit data input and a one-bit weight. A real edge AI implementation will have data vector and weight vector widths as input to the MAC operation. For example, consider the case of 8-bit data and 8-bit weights for each multiplication product in the MAC operation. (Parenthetically, the vector width of the weights after network training need not be the same as the input data vector width. Further, the numeric value of the weight vector could be any of a number of representations – e.g., signed or unsigned integer, twos complement.) For the example, at each network node, the in-memory computation architecture needs to compute multiple products of two 8-bit vectors and accumulate the sum.

While the ReRAM array macro computes the MAC for the network node, circuitry outside the array would be used to add the bias, and apply the activation function.  This function would also normalize the width of the node output result to the input data vector width for the next network layer.

The researchers implemented a novel approach toward the MAC calculation, expanding upon the 1-bit ReRAM example shown above.

The description above indicated that the duration of the bitline current defines the output voltage on the reference capacitor.

The researchers reviewed several previous proposals for converting the data vector input into a word line duration, as illustrated below.

The input data value could be decoded into a corresponding number of individual word line pulses, as illustrated below.

Alternatively, the data value could be decoded into a word line pulse of different durations.  The multiplication of the data input vector times each bit of the weight could be represented by different durations of the active word line to the ReRAM bit cell, resulting in different cumulative values of bitline current during the read cycle.  The figure below illustrates the concept, for four 3-bit data inputs applied as word lines to a weight vector bitline, shown over two clock cycles.

For a data value of ‘000’, the word line would remain off;  for a data value of ‘111’, the maximum word line decode pulse duration would be applied.  The data input arcs to the network node would be dotted together as multiple active cells on the column bitline, as before.

Each column in the ReRAM array corresponds to one bit of the weight vector – the resulting voltage on the reference capacitor is the sum of all node data inputs times one bit of the weight.

Outside of the ReRAM array itself, support circuitry is provided to complete the binary vector (data*weight) multiplication and accumulation operation:

  • an ADC on each bitline column converts the voltage value to a binary vector
  • shifting the individual binary values according to their bit position (MSB to LSB) in the weight vector
  • generating the final MAC summation of the shifted weight bits (a sketch of this shift-and-add step follows this list)
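
A behavioral sketch of that shift-and-add step, assuming each column’s ADC has already returned the accumulated sum of (data * weight-bit) products for one bit position of the weight (the values are illustrative):

```python
def mac_from_bit_columns(column_sums_msb_first):
    """Combine per-weight-bit column ADC results into one MAC value:
    each column's accumulated (data * weight_bit) sum is shifted by its
    bit position (MSB first here) and added."""
    total = 0
    for s in column_sums_msb_first:
        total = (total << 1) + s          # shift the running sum, add the next column
    return total

# e.g. a 4-bit weight: the columns for weight bits 3..0 returned these accumulated sums
print(mac_from_bit_columns([3, 0, 2, 1]))   # 3*8 + 0*4 + 2*2 + 1*1 = 29
```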

The researchers noted that these two approaches do not scale well to larger data vector widths:

  • the throughput is reduced, as longer durations are needed
  • for the long pulse approach, PVT variations will result in jitter in the active word line duration, impacting the accuracy

The researchers chose to implement a novel, segmented duration approach.  For example, an 8-bit data input vector is divided into 3 separate ReRAM operations, of 2-3-3 bits each.  The cumulative duration of these three phases is less than the full data decode approach, improving the computation throughput.

Scaling the Bitline Current

With the segmented approach, the researchers described two implementation options:

  • at the end of each phase, the reference capacitor voltage is sensed by the ADC, then reset for the next phase;  the ADC output provides the data times weight bit product for the segmented data vector slice
  • the reference capacitor voltage could be held between phases, without a sample-and-reset sequence

In this second case, when transitioning from one data vector segment to the next, it is necessary to scale the capacitor current correspondingly. If the remaining data vector width for the next segment phase is n bits, the capacitor current needs to be scaled by 1/(2**n). The figure below provides a simplified view of how the researchers translated the bitline current in each phase into a scaled reference capacitor current.

A pFET current mirror circuit is used to generate a current into the reference capacitor; the unique nature of a current mirror is that, by adjusting device sizes in the mirror branch, scaled values of the bitline current are generated. Between the data vector segment phases, the capacitor voltage is held, and a different scaled mirror current branch is enabled.

For the in-memory ReRAM computing testsite, the researchers chose to use the full reference capacitor reset phase for the most significant bits segment, to provide the optimum accuracy, as required for the MSBs of the data input.  For the remaining LSBs of the data, the subsequent phases used the switched current mirror approach.
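
A behavioral sketch of the segmented accumulation, assuming an 8-bit data value split into MSB-first 2-3-3 bit phases. In the digital-equivalent view below, each segment is weighted by the bit positions it covers; the analog implementation achieves the same relative weighting by scaling later-phase capacitor currents down by 1/(2**n):

```python
def segmented_mac_digital_equivalent(data_value, segments=(2, 3, 3)):
    """Split an 8-bit data value into MSB-first segments and recombine them,
    weighting each segment by the bit positions it covers. The analog scheme
    achieves the same relative weighting by scaling the reference-capacitor
    current of later phases down by 1/(2**n)."""
    bits = format(data_value, "08b")
    total, remaining = 0, 8
    for width in segments:
        seg = int(bits[8 - remaining: 8 - remaining + width], 2)  # this phase's slice
        remaining -= width
        total += seg << remaining          # weight by the 2**remaining LSBs still to come
    return total

print(segmented_mac_digital_equivalent(0b10110101))   # 181: the segments recombine exactly
```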

Process Variations

The researchers acknowledged that there are significant tolerances in the high and low resistance values of each ReRAM bitcell.  When using ReRAM as a simple memory array, there is sufficient margin between lowR and highR to adequately sense a stored ‘1’ and ‘0’.

However, as the in-memory computing requirements rely on accumulation of specific (dotted) bitcell currents, these variations are a greater issue.  The researchers chose to use an “averaging” approach – each stored weight bit value is copied across multiple ReRAM bitcells (e.g., # of copies = 4).  Although the figures above depict each data input vector as one ReRAM word line, multiple word lines connected to each weight bit are used.

Testsite and FOM

TSMC fabricated an ReRAM testsite using this segmented data vector technique.  The specs are shown in the figure above.  The testsite provided programmability for different data vector widths and weight vector widths – e.g., 8b-8b-14b represents an eight bit data input, an eight bit weight, and a full MAC summation supporting a fourteen bit result at the network node.

The researchers defined a figure-of-merit for MAC calculations using in-memory computing:

        FOM = (energy_efficiency * data_vector_width * weight_vector_width * output_vector_width) / latency

(Energy efficiency is measured in TOPS/Watt;  the output vector width from the ReRAM array and support circuitry is prior to bias addition and activation/normalization.)

Summary

Edge AI implementations are hampered by the power and latency inefficiencies associated with the von Neumann bottleneck, which has sparked great interest in the field of in-memory computing approaches.  Read access to a ReRAM array storing weight values offers a unique opportunity to implement a binary product of data and weights.  Researchers at TSMC and National Tsing Hua University have implemented several novel approaches toward the use of ReRAM for the MAC computation at each neural network node, addressing how to efficiently work with wide data vectors, and manage ReRAM process variation.  I would encourage you to read their recent technical update provided at ISSCC.

-chipguy

References

[1]   Xue, Cheng-Xin, et al., “A 22nm 4Mb 8b-Precision ReRAM Computing-in-Memory Macro”, ISSCC 2021, paper 16.1.

[2]  Mao, M., et al., “Optimizing Latency, Energy, and Reliability of 1T1R ReRAM Through Cross-Layer Techniques”, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2016, p. 352-363.

 


All-Digital In-Memory Computing

by Tom Dillinger on 03-15-2021 at 6:00 am


Research pursuing in-memory computing architectures is extremely active.  At the recent International Solid-State Circuits Conference (ISSCC 2021), multiple technical sessions were dedicated to novel memory array technologies to support the computational demand of machine learning algorithms.

The inefficiencies associated with moving data and weight values from memory to a processing unit, then storing intermediate results back to memory are great.  The information transfer not only adds to the computational latency, but the associated power dissipation is a major issue.  The “no value add” data movement is a significant percentage of the dissipated energy, potentially even greater than for the “value add” computation, as illustrated below. [1]  Note that the actual computational energy dissipation is a small fraction of the energy associated with data and weight transfer to the computation unit.  The goal of in-memory computing is to reduce these inefficiencies, especially critical for the implementation of machine learning inference systems at the edge.

The primary focus of in-memory computing for machine learning applications is to optimize the vector multiply-accumulate (MAC) operation associated with each neural network node.  The figure below illustrates the calculation for the (trained) network – the product of each data input times weight value is summed, then provided to a bias and activation function.

For a general network, the data and weights are typically multi-bit quantities.  The weight vector (for a trained, edge AI network) could use a signed, unsigned, or twos complement integer bit representation.  For in-memory computing, the final MAC output is realized by the addition of partial multiplication products.  The bit width of each (data * weight) arc into the node is well-defined – e.g., the product of 2 n-bit unsigned integers is covered by a 2n-bit vector.  Yet, the accumulation of (data * weight) products for all arcs into a highly-connected network could require significantly more bits to accurately represent the MAC result.

One area of emphasis of the in-memory computing research has been to implement a bitline current-sense measurement using resistive RAM (ReRAM) bitcells.  The product of the data input (as the active memory row wordline) and weight value stored in the ReRAM cell generates a distinguishable bitline current applied to charge a reference capacitance.  A subsequent analog-to-digital converter (ADC) translates this capacitor voltage into the equivalent binary value for subsequent MAC shift-add accumulation.  Although the ReRAM-based implementation of the (data * weight) product is area-efficient, it also has its drawbacks:

  • the accuracy of the analog bitline current sense and ADC is limited, due to limited voltage range, noise, and PVT variations
  • the write cycle time for the ReRAM array is long
  • the endurance of the ReRAM array severely limits the applicability as a general memory storage array

These issues all lead to the same conclusion.  For a relatively small inference neural network, where all the weights can be loaded in the memory array, and the data vector representation is limited – e.g., 8 bits or less – a ReRAM-based implementation will offer area benefits.

However, for a machine learning application requiring a network larger than can be stored in the array, and/or a workload requiring reconfigurability, the need to update weight values frequently precludes the use of a ReRAM current sense approach.  The same issue applies where the data precision requirements are high, necessitating a larger input vector.

An alternative for an in-memory computing architecture is to utilize an enhanced SRAM array to support (data * weight) computation, rather than a novel memory technology.  This allows a much richer set of machine learning networks to be supported.  If the number of layers is large, the input and weight values can be loaded into the SRAM array for node computation, output values saved, and subsequent layer values retrieved.  The energy dissipation associated with the data and weight transfers is reduced over a general-purpose computing solution, and the issue with ReRAM endurance is eliminated.

In-Memory Computing using an Extended SRAM Design

At the recent ISSCC, researchers from TSMC presented a modified digital-based SRAM design for in-memory computing, supporting larger neural networks.[2]

The figure above illustrates the extended SRAM array configuration used by TSMC for their test vehicle – a slice of the array is circled.  Each slice has 256 data inputs, which connect to the ‘X’ logic (more on this logic shortly).  Consecutive bits of the data input vector are provided in successive clock cycles to the ‘X’ gate.  Each slice stores 256 4-bit weight segments, one weight nibble per data input;  these weight bits use conventional SRAM cells, as they could be updated frequently.  The value stored in each weight bit connects to the other input of the ‘X’ logic.

The figure below illustrates how this logic is integrated into the SRAM.

The ‘X’ is a 2-input NOR gate, with a data input and a weight bit as inputs.  (The multiplicative product of two one-bit values is realized by an AND gate;  by using inverted signal values and DeMorgan’s Theorem, the 2-input NOR gate is both area- and power-efficient.)  Between each slice, an adder tree plus partial sum accumulator logic is integrated, as illustrated below.

Note that the weight bit storage in the figure above uses a conventional SRAM topology – the weight bit word lines and bit lines are connected as usual, for a 6T bitcell.  The stored value at each cell fans out to one input of the NOR gate.

The output of each slice represents a partial product and sum for a nibble of each weight vector.  Additional logic outside the extended array provides shift-and-add computations, to enable wider weight value representations.  For example, a (signed or unsigned integer) 16-bit weight would combine the accumulator results from four slices.
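
A minimal sketch of the slice arithmetic described above: the 1-bit product built from a NOR of inverted inputs (the DeMorgan trick), a slice accumulating data bits against 4-bit weight nibbles, and the shift-and-add of four slice results into a 16-bit-weight MAC. All values, and the reduction to a handful of inputs, are illustrative rather than the TSMC macro’s internals.

```python
def bit_and_via_nor(d, w):
    """1-bit product realized as NOR of inverted inputs: NOR(~d, ~w) == d AND w (DeMorgan)."""
    return int(not ((not d) or (not w)))

def slice_mac(data_bits, weight_nibbles):
    """One slice: accumulate data_bit * 4-bit weight nibble over all inputs,
    built from 1-bit NOR products plus an adder tree (modeled here as a sum)."""
    total = 0
    for d, w in zip(data_bits, weight_nibbles):
        total += sum(bit_and_via_nor(d, (w >> b) & 1) << b for b in range(4))
    return total

def combine_slices(slice_results):
    """Shift-and-add four nibble-slice results into a 16-bit-weight MAC (LSB nibble first)."""
    return sum(r << (4 * i) for i, r in enumerate(slice_results))

print(slice_mac([1, 0, 1], [5, 7, 3]))       # 5 + 0 + 3 = 8
print(combine_slices([8, 1, 0, 2]))          # 8 + 1*16 + 0*256 + 2*4096 = 8216
```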

Testsite results

A micrograph of the TSMC all-digital SRAM-based test vehicle is shown below, highlighting the 256-input, 16 slice (4-bit weight nibble) macro design.

Note that one of the key specifications for the SRAM-based Compute-in-Memory macro is the efficiency with which new weights can be updated in the array.

The measured performance (TOPS) and power efficiency (TOPS/W) versus supply voltage are illustrated below.   Note that the use of a digital logic-based MAC provides functionality over a wide range of supply voltage.

(Parenthetically, the TOPS/W figure-of-merit commonly used to describe the power efficiency of a neural network implementation can be a misleading measure – it is strongly dependent upon the “density” of the weights in the array, and the toggle rate of the data inputs.  There is also a figure below that illustrates how this measure depends upon the input toggle rate, assuming a 50% ratio of ‘1’ values in the weight vectors.)

Although this in-memory computing testsite was fabricated in an older 22nm process, the TSMC researchers provided preliminary area and power efficiency estimates when extending this design to the 5nm node.

Summary

There is a great deal of research activity underway to support in-memory computing for machine learning, to reduce the inefficiencies of data transfer in von Neumann architectures.  One facet of the research is seeking to use new memory storage technology, such as ReRAM.  The limited endurance of ReRAM limits the scope of this approach to applications where weight values will not be updated frequently.  The limited accuracy of bitline current sense also constrains the data input vector width.

TSMC has demonstrated how a conventional SRAM array could be extended to support in-memory computing, for large and/or reconfigurable networks, with frequent writes of weight values.  The insertion of 2-input NOR gates and adder tree logic among the SRAM rows and columns provides an area- and power-efficient approach.

-chipguy

 

References

[1]  https://energyestimation.mit.edu

[2]  Chih, Yu-Der, et al., “An 89TOPS/W and 16.3TOPS/mm**2 All-Digital SRAM-Based Full-Precision Compute-in-Memory Macro in 22nm for Machine-Learning Applications”, ISSCC 2021, paper 16.4.

 


Register File Design at the 5nm Node

by Tom Dillinger on 03-10-2021 at 2:00 pm


“What are the tradeoffs when designing a register file?”  Engineering graduates pursuing a career in microelectronics might expect to be asked this question during a job interview.  (I was.)

On the surface, one might reply, “Well, a register file is just like any other memory array – address inputs, data inputs and outputs, read/write operation cycles.  Maybe some bit masking functionality to write a subset of the data inputs.  I’ll just use the SRAM compiler for the foundry technology.”  Alas, that answer will likely not receive any kudos from the interviewer.

At the recent International Solid State Circuits Conference (ISSCC 2021), TSMC provided an insightful technical presentation into their unique approach to register file implementation for the 5nm process node. [1]

The rest of this article provides some of the highlights of their decision and implementation tradeoffs.  I would encourage SemiWiki readers to obtain a copy of their paper and delve more deeply into this topic (particularly before a job interview).

Register File Bitcell Implementation Options

There are three general alternatives for selecting the register file bit cell design:

  • an array of standard-cell flip-flops, with standard cell logic circuitry for row decode and column mux selection

The figure above illustrates n registers built from flip-flops, with standard logic to control the write and read cycles (shown separately above) – one write port and two read ports are shown.

  • a conventional 6T SRAM bitcell

The figure above illustrates an SRAM embedded within a stdcell logic block, where the supply voltage domains are likely separate.  Additional area around the SRAM is required, to accommodate the difference between the conventional cell layout rules and the “pushed” rules for (large) SRAM arrays.

  • a unique bitcell design, optimized for register file operation

For the 5nm register file compiler, TSMC chose the third option, using the bitcell illustrated above, based on the considerations described below.  Note that the 16-transistor cell includes additional support for masked bit-level writes, using the additional CL/CLB inputs.  The TSMC team highlighted that this specific bit-write cell design reduces the concern with cell stability for adjacent bitcells on the active wordline that are not being written – the “half-select” failure issue (wordline selected, bit column not selected).

Bitcell Layout

The foundry SRAM compiler bitcell typically uses unique (aggressive) layout design rules, optimized for array density.  Yet, there are specific layout spacing and dummy shape transition rules between designated SRAM macros and adjacent standard cell logic – given the large number of register files typically present in an SoC architecture, this required transition area is inefficient.

Flip-flops use the conventional standard cell design layout rules, with fewer adjacency restrictions to adjacent logic.

For the TSMC 5nm register file bitcell, standard cell digital layout rules were also used.

Peripheral Circuitry

A major design tradeoff for optimal register file PPA is the required peripheral circuitry around the bitcell array.  There are several facets to this tradeoff:

  • complexity of the read/write access cycle

The flip-flop implementation shown above is perhaps the simplest.  All flip-flop outputs are separate signals, routed to multiplexing logic to select “column” outputs for a read cycle.  Yet, the wiring demand/congestion and peripheral logic depth grows quickly with the number of register file rows.

The SRAM uses dotted bitcell inputs and outputs along the bitline column;  only the bitcell in the decoded row actively drives the bitline.  A single peripheral write driver and differential read sense circuit supports the entire column.

The TSMC register file bitcell also adopts a dotted connection for the column, but separates the write and read bit lines.  The additional transistors comprising the read driver in the cell (P6, N6, P7, and N7 in the bitcell figure above) offer specific advantages:

  • the read output is full-swing, and static (while the pass gate N7/P7 is enabled)

No SRAM differential bitline precharge/discharge read access cycle is needed, saving power.  The read operation does not disturb the internal, cross-coupled nodes of the bitcell.

  • the read and write operations are independent

The use of separate WWL and RWL controls allows a concurrent write operation and read operation to the same row (“write-through”) or to different rows.
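Continuing the behavioral RegisterFile sketch from above, the snippet below illustrates only the externally visible “write-through” outcome. In the hardware, the separate WWL/RWL controls provide true concurrency; here the write is simply applied before the read is sampled.

```python
# Write-through illustration using the RegisterFile sketch above (behavioral only):
# a write and a read issued to the same row, with the read returning the new data.
rf = RegisterFile(rows=32, cols=8)
new_data = [1, 1, 0, 0, 1, 0, 1, 0]
rf.write(row=3, data_bits=new_data, bit_mask=[1] * 8)   # write port (WWL) targets row 3
assert rf.read(3) == new_data                            # read port (RWL) sees the new data
```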

Although based on digital standard cell design rules, note that the peripheral circuitry for the TSMC register file design needs some special consideration.  The read output transfer gate circuit presents a diffusion node at the bitcell boundary, with multiple dotted bitcell rows.  This node is extremely sensitive to switching noise, and requires detailed analysis.

Vt Selection

The choice of standard cell design rules also allows greater flexibility for the TSMC register file bitcell.  For example, low Vt devices could be selectively used in the read buffer for improved performance, with a minor impact on bitcell leakage current, as illustrated below.

VDD Operation

Perhaps the greatest register file implementation tradeoff pertains to the potential range of operating supply voltages available to foundry customers.  At advanced process nodes, the range of supply voltages needed for different target markets has increased.  Specifically, very low power applications require aggressive reductions in VDDmin – e.g., for the 5nm process node, logic functionality down to ~0.4-0.5V (from the nominal VDD=0.75V) is being pursued.

The use of standard cell design rules enables the register file implementation to scale the supply voltage with the logic library – indeed, the embedded register file can be readily integrated with other logic in the block in a single power domain.

Conversely, the traditional SRAM cell design at advanced nodes increasingly requires a “boost” during the write operation, to ensure sufficient design margin across a large number of memory bitcells, using aggressive design rules.  This write assist cycle enables a reduction in the static SRAM supply voltage, reducing the SRAM leakage current.  Yet, it also introduces considerable complexity to the access cycle with the charge-pump boost precursor (possibly even requiring a read-after-write operation to confirm the written data).

Write Power

Another comparison to a conventional SRAM bitcell worth mentioning is that the feedback loop in the TSMC register file bitcell is broken during the write operation.  (Most flip-flop circuits also use this technique.)  In contrast, the write current overdrive needed to flip the state of a conventional SRAM bitcell against its cross-coupled inverters dissipates greater power during the write cycle.

Testsite and Measurement Data

The first figure below shows the 5nm register file testsite photomicrograph, with two array configurations highlighted.  The second figure illustrates the measured performance data for 4kb and 8kb register file macros, across VDD and temperature ranges.  Note that the selection of a digital standard-cell design enables functional operation down to a very low VDDmin.

(Astute observers will note the nature of temperature inversion in the figure – operation at 0°C is more limited than at 100°C.)

The testsite macros also included DFT and BIST support circuitry – the test strategy (and circuit overhead) is definitely part of the register file implementation tradeoff decision.

Summary:  The Final Tradeoff

Like all tradeoffs, there is a range of applicability which must be taken into account.  For the case of register file implementation using either flip-flops, conventional SRAM bitcells, or a unique bitcell as developed by TSMC for the 5nm node, the considerations are:

  • area:  dense 6T SRAM cells with complex peripheral circuitry versus larger area cells (using digital design rules)
  • VDDmin support (power) and VDDmax capabilities (performance, reliability)
  • masked bit-write requirements
  • test methodology (e.g., BIST versus a simple scan chain through flip-flops)
  • and, last but certainly not least, the number of register file access ports (including concurrent read/write operation requirements)

The TSMC focus for their ISSCC presentation was on a 1W, 1R port architecture.  If more register file ports are needed, the other tradeoff assessments listed above change considerably.

The figure below illustrates the area tradeoff between an SRAM bitcell and the 5nm bitcell, indicating a “cross-over” point at ~40 rows (for 256 columns).  The 4kb (32×128) and 8kb (32×256) register file macros shown earlier fall within the preferred window for the fully digital bitcell design.
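A back-of-the-envelope model of that crossover is sketched below. All of the cell and periphery areas are invented placeholders, chosen only so the crossover lands near the reported ~40 rows at 256 columns; the point is the shape of the tradeoff: the denser 6T cell must amortize a much larger fixed periphery and transition overhead, so below a few tens of rows the larger digital-rule bitcell wins.

```python
# Illustrative macro-area model for the SRAM-vs-digital-bitcell crossover.
# All numbers are hypothetical placeholders, not TSMC data.

def macro_area(rows, cols, bitcell_um2, overhead_um2):
    """Array area plus a fixed periphery/transition-area overhead."""
    return rows * cols * bitcell_um2 + overhead_um2

COLS = 256
SRAM_CELL, SRAM_OVERHEAD = 0.021, 300.0   # dense pushed-rule cell, large fixed overhead
RF_CELL, RF_OVERHEAD     = 0.050, 5.0     # larger 16T digital-rule cell, small overhead

for rows in (8, 16, 32, 40, 64, 128):
    sram = macro_area(rows, COLS, SRAM_CELL, SRAM_OVERHEAD)
    rf   = macro_area(rows, COLS, RF_CELL, RF_OVERHEAD)
    winner = "6T SRAM" if sram < rf else "digital bitcell"
    print(f"{rows:4d} rows x {COLS} cols: SRAM {sram:7.1f} um^2, RF {rf:7.1f} um^2 -> {winner}")
```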

For reference, TSMC also shared this tradeoff for their previous 7nm register file design, as shown below (1W1R ports). [2]  Note that this figure also includes the lower range, where a flip-flop-based implementation is attractive.

Yet, as current SoC architectures demand larger on-die local storage, the unique 5nm bitcell design supporting optimal 4kb and 8kb macros hits the sweet spot.

Hopefully, this article will help you nail the register file design job interview question.   🙂

I would encourage you to read the TSMC papers describing their design approach and tradeoff assessments on 5nm (and 7nm) register file implementations.

-chipguy

References

[1]  Fujiwara, H., et al., “A 5nm 5.7GHz@1.0V and 1.3GHz@0.5V 4kb Standard-Cell-Based Two-Port Register File with a 16T Bitcell with No Half-Selection Issue”, ISSCC 2021, paper 24.4.

[2]  Sinangil, M., et al., “A 290mV Ultra-Low Voltage One-Port SRAM Compiler Design Using a 12T Write Contention and Read Upset Free Bitcell in 7nm FinFET Technology”, VLSI Symposium 2018.


TSMC Plans Six Wafer Fabs in Arizona

TSMC Plans Six Wafer Fabs in Arizona
by Scotten Jones on 03-10-2021 at 10:00 am

TSMC Fab 18 Rendering

There are reports in the media that TSMC is now planning six fabs in Arizona (the image above is Fab 18 in Taiwan). The original post I saw referred to a Megafab and claimed six fabs with 100,000 wafers per month (wpm) of capacity for $35 billion. The report further claimed it would be larger than TSMC’s fabs in Taiwan.

This report struck me as unreliable given that TSMC refers to their large fab clusters as Gigafabs, not Megafabs, and that TSMC’s Fab 12, Fab 14, and Fab 15 each have capacity of around 300,000 wpm, with Fab 18 ramping to over 200,000 wpm.

Now similar reports are being repeated in more reputable sources, notably today I saw a report in EE News Europe that stated:

  • The site would be a Gigafab (correct terminology).
  • Filings with the city of Phoenix describe three phases of building.
  • TSMC has reportedly offered to double employee salaries to move to the US.

I am still not sure about the six-fab part; the Phoenix documents are reported to describe three phases, although I suppose each phase could be two fabs. The other issue I have is that 100,000 wpm across six fabs is just under 17,000 wpm per fab; those are smaller fabs than TSMC typically builds and would be suboptimal from a cost perspective.

What I think is more likely is three fabs of just over 30,000 wpm each for a total of roughly 100,000 wpm. Perhaps they will build three fabs initially for 100,000 wpm and then have the option to build three more fabs later for an additional 100,000 wpm. Fab 18 in Taiwan has three fabs, P1, P2 and P3, running 5nm with an original capacity of just under 30,000 wpm each, although they are now being expanded to 40,000 wpm each, 120,000 wpm total. There are also P4, P5, and P6 under construction for 3nm that will likely be around 30,000 wpm each initially, bringing the site to around 200,000 wpm.

The $35 billion price tag is high for 100,000 wpm of 5nm but would make sense if it also included some preparation for additional phases or 3nm capability. I should also point out that the initial budget number for a fab is often an estimate and can increase or decrease as the fab is built, depending on final capacity and how many fab phases are included in the initial amount. I believe TSMC has spent more money on phases 1, 2 and 3 of Fab 18 for 5nm than originally announced and will also spend more money on phases 4, 5 and 6 for 3nm than originally announced.

My best guess as of today is that the fab will have three phases initially producing 100,000 wpm total, with the option to add three more phases in the future to reach 200,000 wpm; that would be more consistent with TSMC Fab 18 in Taiwan.

However the specifics work out, it does appear that TSMC is now looking at building a full-scale Gigafab in the US instead of the small fab originally planned. I see this as good news for the global semiconductor supply due to the high risk presented by having so much of the world’s leading edge logic capacity concentrated in Taiwan. This is especially concerning with Taiwan being located on an active fault line, the view in China that Taiwan is a rogue province that must be brought back under Chinese control, and the resource limits of a small island.

 


Chip Channel Check- Semi Shortage Spreading- Beyond autos-Will impact earnings

Chip Channel Check- Semi Shortage Spreading- Beyond autos-Will impact earnings
by Robert Maire on 03-07-2021 at 10:00 am

Robert Maire 2

– Semiconductor shortage is like toilet paper shortage in early Covid
– Panic buying, hoarding, double ordering will cause spike
– Could cause a year+ of dislocation in chip makers before ending
– Investors, Govt & Mgmt will get a wake up call from earnings hit

The auto industry is just a prominent tip of the chip crunch iceberg. We believe the chip shortage is spreading across other industries.

The automotive industry is just a very prominent, in-your-face example of the semiconductor industry problem, as it involves the highest financial impact ratio: a 25 cent chip can stop the revenue associated with a $50,000 car.

Wait till we get the earnings report from Ford for Q1 and they have a significant revenue and earnings shortfall, due to the production halts, that they blame on those tech guys in California’s Silicon Valley.

From an investment perspective we think we will see similar revenue and earnings impact across a number of industries…not just tech related.

In the past we have seen delays in laptops and servers, which were relatively common. Last year I ordered a laptop that was delayed two months due to “production problems” (AKA chip shortage).

We would expect chip shortages to hit telecommunications equipment makers; everything from 5G to routers etc. Video cards have always been in short supply due to chip shortages. It could roll downhill to consumer goods from TVs to washers (don’t laugh, large appliances have already been in short supply). We would bet that earnings season will see a whole bunch of diverse companies missing numbers due to component shortages. It’s just hard to predict who because everything has a chip in it.

Being a Big BFF with a long history helps

In this type of situation it pays to be a long time, big, close customer to the chip makers, like Apple. They are so tight with TSMC there is no light between them. You can rest assured that Apple will get all the chips it needs, both expensive and cheap from TSMC and they will always be first in line. Apple is TSMC’s number one customer so it will be no other way.

On the other end of the spectrum you likely have auto makers who are notoriously tough with their suppliers buying 25 cent chips at low margins. What are the odds of their orders being sped up? Zero.

Auto makers only have themselves to blame as they cut orders early in Covid and shouldn’t be shocked when they had to get back in line, at the end of the line, to re-order. It’s called supply chain management.

Tom Caulfield, CEO of GlobalFoundries, has said that his phone is ringing off the hook with auto manufacturers asking for wafers and that he is “everybody’s new best friend”.

Broadcom’s CEO, Hock Tan, said on their call last night that Broadcom is pretty much booked up for the year and he doesn’t know when the shortage will subside. Broadcom is a big customer of TSMC and it doesn’t sound like they are getting extra wafer capacity.

Panic buying, hoarding & double ordering. The toilet paper shelves are empty.

Perhaps the biggest physical evidence of the panic Covid caused was the shortages of toilet paper in supermarkets in the early part of Covid.

Consumers probably thought they were going to be locked in their homes for months, or that paper factories would be shut down for months, because it seemed like a year’s worth of TP was sold in days.

As we have seen in the past we think we are also seeing evidence of panic buying of chips, double ordering and stocking up.

We think there has already been hoarding by Chinese customers for well over a year who were concerned, rightfully so, about being cut off. Now add to that, hoarding by more customers currently experiencing supply problems. If I were in the auto industry supply chain I would be double and triple ordering and stocking up lest I lose my job.

Coming down off the “sugar high” may be problematic- Is this the high point in the cycle?

Right now chip makers are everyone’s best friends and popular on speed dial, but the hangover from the current party could create a headache. As we know from a very long history, the chip industry is cyclical and goes through cycles based on supply and demand and therefore pricing. Right now supply is short and demand is high…maybe artificially high due to hoarding and double ordering…and maybe supply is tight in the short term due to the Texas power problem and other issues…it seems a bit like a “perfect storm”.

A year or two from now chip makers could be swiped left and ghosted by those currently in desperate need of a chip fix. Poetic justice would be for chip equipment to suffer shortages. Not likely.

It would be very funny cosmic karma if chip equipment companies were impacted by the current chip shortages. After all, semiconductor equipment does happen to have a lot of semiconductors in it, and the supply chain goes directly through China. The equipment controllers are basically souped-up PCs, and deposition and etch tools have a myriad of sub-system suppliers: robots, RF, gas boxes, etc. An EUV lithography tool is such a Rube Goldberg machine it likely has hundreds of chips.

We don’t expect a problem from chip equipment makers, but it could happen. In general, most everybody in the chip industry understands and is on guard for supply issues….obviously unlike the auto industry.

Channel Checks say its not just chips

From what we can tell the shortage issues seem to go beyond chips. Other components and discrete semiconductors are also short in some cases. However, this is likely due to panic buying and ordering from nervous customers and not systemic supply issues as in the mainstream chip industry.

Is the Panic worse than the Problem?

Much as with toilet paper, the problem is likely less severe than the issues caused by the surrounding panic. The semiconductor industry making the news is far from normal. If I made a $50 consumer good with chips in it, I might get freaked out when I hear Ford has to shut down factories because they can’t get chips.

The only good thing that has come out of this is that this long term issue has finally risen to the level where it has hit the White House and they are talking about the industry and doing something about it (which we have never seen before…)

Could the chip shortage hit economic growth and Covid recovery?

The dislocation in the chip industry does not come at a good time as we are looking at climbing out of the hole that Covid has put us in. Having car factories shut down and revenue and earnings hits at some companies certainly will not help the recovery.

It just creates more friction and resistance to the recovery. We think we could very easily see two to three quarters of direct impact on companies with some residual impact even further out. What remains to be seen is whether the lessons learned will actually be adopted or forgotten once it leaves our immediate memory, a year down the road.

The stocks
Chip companies in general are obviously doing very well due to near term demand. Equipment companies are also doing very well as capital spending is high and will remain high while chip companies’ business is so good.

After a strong run of a year or more, it has been feeling like the semiconductor stocks want to roll over. We have had some days of stumbles. Valuation multiples are at all-time highs. Some suggest a “re-pricing”, but we had a similar re-pricing at the last cyclical peak only to pull back.

2021 is shaping up to be a very good year as momentum seems strong for business with little probability of a downturn. But the stocks don’t always follow earnings step for step and the semi stocks have always turned before business turned.

The chip shortage will eventually end and the real question is what happens after?

Also Read:

Semiconductor Shortage – No Quick Fix – Years of neglect & financial hills to climb

“For Want of a Chip, the Auto Industry was Lost”

Will EUV take a Breather in 2021?