
Why In-Memory Computing Will Disrupt Your AI SoC Development

by Ron Lowman on 03-22-2021 at 6:00 am


Artificial intelligence (AI) algorithms thirsting for higher performance per watt have driven the development of specific hardware design techniques, including in-memory computing, for system-on-chip (SoC) designs. In-memory computing has predominantly been publicly seen in semiconductor startups looking to disrupt the industry, but many industry leaders are also applying in-memory computing techniques under the hood.

Innovative designs using in-memory computing intend to disrupt the landscape of AI SoCs. First, let's take a look at the status quo that these startups aim to disrupt. AI hardware has taken a huge leap forward since 2015, when companies and VCs started investing heavily in new SoCs specifically for AI. Investment has only accelerated over the past five years, leading to many improvements in AI hardware design from industry leaders. Intel's x86 processors have added new instructions and even a separate NPU engine. Nvidia has added dedicated Tensor Cores and forsaken GDDR in favor of HBM technologies to increase memory bandwidth. Google has developed dedicated ASIC TPUs, or Tensor Processing Units, for AI algorithms (Figure 1). But even as these architectures continue to improve, investors are looking to startups to develop the next disruption in AI technology.

Figure 1: Intel, Nvidia and Google are introducing new hardware architectures to improve performance per watt for AI applications

Why are Disruptions for AI Compute so Interesting?

The three key reasons for heavy investment into AI hardware are: 1) the amount of data generated is growing exponentially, and AI is the critical technology for addressing that complexity; 2) the power and time costs of running AI algorithms on existing architectures are still too high, especially at the edge; 3) the parallelization of AI compute engines is reaching die-size limits, driving these systems to scale across multiple chips, which is only practical in cloud or edge-cloud data centers. Together, these challenges are driving designers to explore new, innovative hardware architectures. In-memory compute is seen as one of the most promising hardware innovations because it may deliver improvements of multiple orders of magnitude.

Paths for AI Compute Disruption

Startups and leading semiconductor providers are looking at potential paths for AI compute acceleration.

  • New types of AI models: New neural networks are being introduced quite often. For example, Google’s huge research team dedicated to releasing models has produced EfficientNet, Advanced Brain Research has released the LMU, and Lightelligence has partnered with MIT to run Efficient Unitary Neural Networks (EUNNs).
  • Integrated photonics is being explored by several startups as another method for disruption.
  • Compression, pruning and other techniques are being developed to enable specific AI functions to operate on small, efficient processors such as a DesignWare® ARC® EM Processor IP running under 100MHz.
  • Scaling compute systems by packaging multiple die, multiple boards, or multiple systems is already in full production from the industry leaders. This solution is used to solve the most complex, costly challenges with AI.

All of these methods to increase performance are being pursued or are already realized. In-memory computing designs can build on them, driving multi-fold efficiency gains on top of these developing technologies.

What is In-Memory Computing?

In-memory computing is the design of memories next to or within the processing elements of hardware. It leverages register files and memories within processors, or turns arrays of SRAMs or new memory technologies into register files or compute engines themselves. For semiconductors, in-memory computing will likely drive significant improvements in AI cost, compute time, and power usage.

Software and Hardware for In-Memory Compute

In-memory computing includes both hardware and software elements, which can cause some confusion. From a software perspective, in-memory computing refers to processing analytics in local storage; the software takes full advantage of the memories closest to the compute. “Memories” is a bit vague from a hardware perspective and can refer to DRAMs, SRAMs, NAND flash, and other types of memory within the local system, as opposed to data sourced over a networked software infrastructure. Optimizing software to take advantage of more localized memories offers vast opportunity for industry improvement, and engineering teams will need to keep focusing on these innovations at a system level. For hardware optimizations, however, in-memory compute offers bit-level innovations that more closely mimic the human brain, which is thousands of times more efficient than today’s compute.

In-Memory Compute, Near-Memory Compute, and Analog Compute

In-memory computing hasn’t just arrived as a magic solution for AI algorithms; it has differing implementations and is evolving from a progression of innovations. Register files and caches have been around for decades, and near-memory computing is the natural next step, appearing in new SoCs over the past several years. AI algorithms require millions, if not billions, of coefficients and multiply-accumulates (MACs). To perform all these MACs efficiently, customized local SRAMs for an array of MACs are now designed into SoCs for the sole purpose of performing AI model math, i.e., matrix/tensor math. Integrating dedicated local SRAMs for an array of MACs is the essence of near-memory compute: the local SRAMs are optimized to store the weights and activations needed by their designated MAC units.
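The near-memory pattern can be sketched in a few lines of Python (an illustrative software model only; the class and function names are invented for this example, and real implementations are hardware MAC arrays fed from dedicated SRAM banks):

```python
# Illustrative model of near-memory compute: each MAC unit keeps its
# weights in a small local buffer (standing in for a dedicated SRAM),
# so the inner loop never reaches out to a shared, distant memory.

class NearMemoryMAC:
    """One MAC unit with a local weight buffer (models a local SRAM)."""

    def __init__(self, weights):
        self.local_weights = list(weights)  # loaded once, reused many times

    def multiply_accumulate(self, activations):
        # Every weight read hits the local buffer -- the "near" memory.
        acc = 0
        for w, a in zip(self.local_weights, activations):
            acc += w * a
        return acc

# An array of MAC units computes a matrix-vector product in parallel:
# each unit owns one row of the weight matrix.
def matvec(weight_rows, activations):
    macs = [NearMemoryMAC(row) for row in weight_rows]
    return [m.multiply_accumulate(activations) for m in macs]
```

For example, `matvec([[1, 2], [3, 4]], [10, 20])` returns `[50, 110]`; the point of the structure is that each unit's weight reads never leave its own local buffer.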

The next natural progression to develop in-memory compute is analog computing. Analog computing enables additional parallelism and more closely mimics the efficiencies of a human brain. For analog systems, MACs and memories are parallelized, improving the system efficiency even further than near-memory compute alone. Traditional SRAMs can be the basis for in-memory analog computing implementations and Synopsys has delivered customizations for this very purpose.

Memory Technologies Address In-Memory Compute Challenges

New memory technologies such as MRAM, ReRAM, and others are promising because they provide higher density and non-volatility compared to traditional SRAMs. These improvements can increase the utilization of the on-chip compute and memory, and utilization is one of the most critical design challenges for AI SoC designers (Figure 2). Regardless of the technology used, SoC designers need memory subsystems designed specifically for AI data movement and compute.

Figure 2: AI SoCs have extremely intensive computation and data movement, which can impact latency, area, and performance

The key challenges for AI SoC memory-system design relate back to the number of MACs and coefficients that need to be stored. ResNet-50, for example, needs over 23 million weights, which translates into 3.5 billion MACs and 105 billion memory accesses. Not all of them run at the same time, so the size of the largest activation can become the critical bottleneck in the memory subsystem. Control engineers know that efficient systems are designed so that the bottleneck sits at the most expensive function of execution; designs therefore need to ensure that their in-memory compute architectures can handle the largest layer of activation coefficients effectively.
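To see where numbers like these come from, the MAC count of a single convolution layer follows directly from its dimensions. The sketch below uses the commonly quoted parameters of ResNet-50's first convolution (the helper name is ours, and the result is one layer's share of the multi-billion-MAC total):

```python
def conv_macs(out_h, out_w, out_ch, in_ch, k_h, k_w):
    """MACs for one convolution layer: each output element needs
    in_ch * k_h * k_w multiply-accumulates."""
    return out_h * out_w * out_ch * in_ch * k_h * k_w

# ResNet-50's first layer: 7x7 kernel, 3 input channels,
# 64 output channels, 112x112 output map.
print(conv_macs(112, 112, 64, 3, 7, 7))  # 118013952, i.e. ~118M MACs
```

Summing the same formula over all of the network's layers is what produces the billions of MACs cited above.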

Meeting these requirements demands huge amounts of on-chip memory and intensive computation across multiple layers. Unique memory-design techniques are being developed to remove latencies, shrink the coefficients, and reduce the amount of data that must be moved around the SoC.

DesignWare IP Solutions for In-Memory Compute

Synopsys provides a wide array of IP options for customers to implement in-memory computing. Optimized memory compilers, tuned for density or leakage, are used to develop the local SRAMs for near-memory implementations, where sometimes thousands of MACs are instantiated. MACs can leverage a portfolio of Synopsys Foundation Core primitive math functions that includes flexible functions such as Dot Product, a common AI operation.

In addition, Synopsys DesignWare Multi-Port Memory IP, enabling up to 8 inputs or 8 outputs, improves parallelism within compute architectures. Multi-port memories have become much more common in designs since AI became so prevalent.

Synopsys developed a patented circuit that demonstrates innovations supportive of in-memory compute. A Word All Zero function, shown in Figure 3, essentially eliminates zeros from being processed. Why move zeros to multiply? The Word All Zero function significantly reduces the compute required and can reduce power by up to 60% for data movement within the chip.
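The effect of a Word All Zero flag can be illustrated in software (a hedged sketch of the concept, not Synopsys's patented circuit; in hardware the all-zero condition is a stored flag, so zero data words are never even driven onto the bus):

```python
def dot_product_skip_zeros(words_a, words_b):
    """Dot product over data grouped into fixed-size words; a word pair
    is skipped entirely when either side is all zeros -- no data
    movement and no multiplies for that word."""
    total = 0
    for wa, wb in zip(words_a, words_b):
        if not any(wa) or not any(wb):
            continue  # the "word all zero" flag fires: skip this word
        total += sum(x * y for x, y in zip(wa, wb))
    return total

# With sparse activations, most words are skipped:
a = [[0, 0, 0, 0], [1, 2, 3, 4]]
b = [[5, 6, 7, 8], [1, 1, 1, 1]]
print(dot_product_skip_zeros(a, b))  # 10 -- the first word never moves
```

AI activations after a ReLU are often full of zeros, which is why skipping whole zero words can cut both compute and on-chip data movement so sharply.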

Figure 3: In addition to the Word All Zero function, Synopsys DesignWare Embedded Memory IP offers multiple features to address power, area, and latency challenges

Conclusion

How fast in-memory compute is adopted in the industry remains to be seen; however, the promise of the technology and conceptual implementation with new memories, innovative circuits and creative designers will surely be an exciting engineering accomplishment. The journey to the solution is sometimes as interesting as the final result.

For more information:

White paper: Neuromorphic Computing Drives the Landscape of Emerging Memories for Artificial Intelligence SoCs


Upcoming Webinar on Resistive RAM (ReRAM) Technology

by Kalar Rajendiran on 03-21-2021 at 10:00 am


On-chip memory (embedded memory) makes computing applications run faster. In the early days of the semiconductor industry, the desire to use large amounts of on-chip memory was limited by cost, manufacturing difficulties, and technology mismatches between logic and memory circuit implementations. Since then, advancements in semiconductor manufacturing have brought on-chip memory costs down. In parallel, leading-edge process nodes have posed new challenges for embedded memories. Of course, high-speed I/O interfaces have made it easier to use off-chip memories without sacrificing application speed. At the same time, new applications such as AI, machine learning, mobile, and other low-power applications have fueled demand for large amounts of embedded memory. Many existing embedded memory technologies face challenges as process nodes go below 28nm, due to additional material layers and masks, supply voltages, speed, read/write granularity, and area.

It is in this context that eMemory Technology Inc. will host a webinar that will be very informative and useful for chip designers and semiconductor companies. The webinar, titled “eMemory’s Embedded ReRAM Solution on Nanometer Technologies,” is scheduled for March 24th, 2021. I got an opportunity to preview the webinar content; the following are just a few of the salient points I’d like to share in this blog. Please register for the webinar to learn the full and intricate details.

The webinar will focus on a very promising technology called Resistive RAM (ReRAM) that will be available in production very soon. ReRAM is specifically designed to work at 40nm and finer-geometry process nodes. In contrast, many other memory types, such as Split-Gate Flash, logic-process MTP, and logic-process EEPROM, face challenges below 28nm.

Because ReRAM is simple to manufacture, it can be integrated into the Back End of Line (BEOL) with only a few extra masks and steps. ReRAM technology enables high-speed, low-power write operations and increased storage density, all critical for applications such as AI computing-in-memory.

Attendees will gain insights into ReRAM cell structure, switching methodology, and the suitability of ReRAM to various prospective applications. eMemory Technology will also share measurement results of their 40nm ULP and 22nm ULL ReRAM reliability data at 85C and 125C operation and 10-year retention data after 10K cycles.

Anyone who is looking into designing chip solutions in advanced process nodes for applications that could benefit from embedded memories would learn a lot from attending this webinar. Register here for the “eMemory’s Embedded ReRAM Solution on Nanometer Technologies” webinar.


RIP Jim Hogan – An Industry Icon

by Bernard Murphy on 03-21-2021 at 8:00 am


An unavoidable consequence of getting older is that more frequently our friends and colleagues unexpectedly leave us for their final venture. Jim Hogan, widely known and loved in the semiconductor industry, has passed on. He will leave a substantial hole in the hearts of many. Always ready with seasoned advice, a sympathetic ear and a boundless stock of entertaining stories. I for one will never forget his patient and encouraging support. For now, I must make do by remembering the man who helped and inspired me in so many ways. My thanks also to Peter Calverley and Scott Becker of Tela Innovations for filling in some of the blanks. RIP Jim Hogan, a dear friend to many of us.

The early days

I first met Jim in the late 80’s at National Semiconductor. He was a big wheel in computer integrated manufacturing, and I was a lowly CAD manager in the ASIC group. He left to join Cadence and I independently left for Cadence not long after. Our orbits didn’t overlap too much during that period, but I remember a friendly easy-going recognition at those times our paths did cross.

Jim stayed at Cadence for a while, running a division and later Japan Operations, before moving on to Artisan Components as the head of Business Development, which culminated in Artisan’s acquisition by ARM. Jim then switched to what would become his true love: investing in and guiding early-stage ventures. If you were a Jim Hogan watcher at all, you’ll know he was involved with many successful exits. However, he was a modest guy. He told me that there were many more not-so-successful investments. He would often laugh about Theranos as one painful example.

Investing and guiding

Jim invested first through Cadence’s Telos Venture Partners. Later and together with Scott Becker, a close friend he first met at Artisan, he formed his own venture fund Vista Ventures.  At the same time Jim helped Scott form Tela Innovations and served on the board for over fifteen years.  Vista Ventures was the vehicle through which he invested in many of the companies we know he helped. Most recently Jim complemented his investment activity by joining the board of Silicon Catalyst.

Nothing could get Jim more excited than new technologies and new ideas. In my closing days at Atrenta, I got into blogging, particularly on harebrained ideas – which Jim enthusiastically encouraged. I’m not sure which of us was crazier. One blog was on how we could exploit biological security parallels (antibodies and so on) in system security. He wanted to turn it into a Ted talk. The guy was infectiously excited by any new tech idea.

He guided me in my early freelancing, helping set up assignments and introducing me to key executives looking for content marketing help or strategic marketing guidance. I was lucky to work together with Jim on some of these projects, for example the work we have done together with Paul Cunningham at Cadence in the “Innovation in Verification” series. Paul and I are the techie enthusiasts. Jim always grounded us with his investment insight. He also provided me with the content for chapter 4 of my recent book (The Tell-Tale Entrepreneur). That chapter offers a fascinating view into investment through the eyes of an investor.

The person

I wasn’t lucky enough to meet Jim’s family, but I know we shared common interests outside technology. We were always debating how to manage fire clearance, tractors and attachments, drilling new wells and building versus buying a new home. He talked often and affectionately about Lisa and even more often about Jake and his adventures, most recently his fascination with chain saws (I can relate).

My abiding impression of Jim is that for all his accomplishment and renown in the industry, he topped it by being one of the most genuinely nice human beings you could ever hope to meet and count as a friend. We all want to succeed in fame and fortune. Jim had those but more important he left a lasting impression as the kind of person we all hope to be when our time finally comes. Rest in peace Jim. We won’t find your like again.

If you would like to express your appreciation of Jim, please submit your entry to nominate him for the Phil Kaufman Hall of Fame.

 

Podcast EP3: Tomorrow’s Semiconductors with Jim Hogan


Micron- Optane runs out of Octane- Bye Bye Lehi- US chip effort takes a hit

by Robert Maire on 03-21-2021 at 6:00 am


– Micron shuts down once promising XPoint
– Lehi Utah fab to be sold off- Had been a $400M drain
– Unique memory couldn’t follow flash down cost/yield curve
– Savings help Micron, but it’s now just another memory maker

XPoint “Coulda been a contender”

XPoint should have amounted to more than a footnote in semiconductor history. It promised speed between NAND and DRAM, closer to DRAM at costs approaching NAND with the benefit of being non-volatile.
But it wasn’t meant to be.

Intel pulled out of the partnership a while ago, not wanting to throw more money down a hole. It now looks like Micron was cleaning things up to get ready to shut it down.

We are certainly very disappointed that in the end it didn’t work out, as it had clear promise and a shot at being the first new mainstream memory technology since NAND was invented back in 1980.

Couldn’t get on the Moore’s Law cost/yield curve

The problem appears to be that the technology was never able to get on, let alone stay on, the Moore’s Law cost curve that keeps driving memory prices ever lower on a per-bit basis.

You need two basic ingredients to make it work: yield and shrinks. First, yield (the percentage of working chips on a wafer) has to reach the point where the number of working chips per wafer, divided into the per-wafer cost, yields a competitive market price. Second, the technology has to support reliably shrinking the chip design’s dimensions on a regular cadence, continually increasing the number of bits per square inch to keep up with the market.
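A back-of-the-envelope cost model shows why both ingredients matter (all numbers below are invented for illustration, and the function name is ours):

```python
def cost_per_gigabit(wafer_cost, dies_per_wafer, yield_frac, gbit_per_die):
    """Cost per gigabit: the wafer cost is spread over the working bits only."""
    good_dies = dies_per_wafer * yield_frac
    return wafer_cost / (good_dies * gbit_per_die)

# Same wafer cost, but a shrink doubles dies per wafer and bits per die,
# and a maturing process lifts yield -- cost per bit collapses.
early = cost_per_gigabit(5000, 500, 0.5, 1)    # $20.00 per Gbit
mature = cost_per_gigabit(5000, 1000, 0.9, 2)  # about $2.78 per Gbit
print(early, mature)
```

A technology that cannot ride this curve (stuck yield, no regular shrink) watches competing memories' cost per bit fall away from it, which is exactly the squeeze described here.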

Was it the failure of XPoint or the success of NAND that caused XPoint’s demise?

Maybe it was both. One could argue that moving NAND to a 3D architecture accelerated it too far ahead for XPoint to ever keep up. Others could argue that XPoint never met its intended goals of price, performance, and yield.

At this point a post mortem is almost pointless, as it’s dead anyway.
However, it does underscore just how incredibly difficult the semiconductor industry is: even with very deep pockets and both Intel and Micron behind it, they still couldn’t get it to work well enough to make the cut.

Good and bad for Micron

The good news is that Micron will shed a $400M/year cash drain; the bad news is that Micron will be just another memory competitor up against the likes of Samsung and a more determined SK Hynix.

XPoint, had it worked, could have been a great differentiator that no other memory company had, and it would have put Micron in a unique position. Now they are relegated to slugging it out with Samsung and trying to find small niches where they have a unique advantage.

Don’t get me wrong….Micron has proven very good at weaving and dodging among the big boys and just outmaneuvered them by keeping a step or two ahead in certain areas. But XPoint could have been a different type of lifeline.

Not good for MRAM, RRAM & PRAM

There are a number of other memory technologies being developed as competitors to today’s DRAM/NAND duopoly, all offering attractive alternative characteristics to DRAM or NAND. In our view, XPoint was likely the best-funded memory alternative: it had the best supporters in Intel and Micron, a dedicated fab, and commercial installations in end-customer products, and it still failed.

It’s going to be very difficult to do what XPoint couldn’t, even with all the attributes it had going for it.

Bye Bye Lehi- Its sale won’t help current shortage

Micron is selling off the associated fab in Lehi. The positive here is that, given current demand, they will likely get a reasonable price compared to the scrap value that old fabs usually sell for.

It will cost a lot of time and money to re-configure the fab for logic as it is likely not big enough for anything other than specialty memory.

We would guess it could take a couple of years to re-configure, so it isn’t going to be any help at all with today’s chip shortage.

We also wouldn’t be too sure that it will stay a fab at all. It may be more financially attractive for Micron to sell off the tools to be shipped to Asia and installed into fabs there, as we have seen happen with other US fabs.

Micron itself, which is the king of getting fabs on the cheap, might even part out the bits and pieces of the fab to its own fabs, where they could add incremental capacity.

It’s not clear whether it’s worth more as parts or as a whole.

Doesn’t bode well for US chip efforts

This clearly flies directly in the face of current discussions about helping the US chip industry. Here we are with a US company, headquartered in the heartland in Boise, Idaho, shutting down a US fab while continuing to expand its overseas operations.

The US government could put its money and effort where its mouth is and keep the fab in the US and in US hands. Perhaps it could be the first poster child and spearhead of the effort to boost the US semiconductor industry and save it from itself. Or not. Wake up! This is an opportunity!

The stocks

Investors will clearly view this as a positive for Micron as it cuts the cash drain and may supply some short term cash before the end of the year. Longer term it makes Micron less competitive but investors don’t generally care about the longer term.

It’s likely neutral to slightly positive for equipment companies, as Micron will have more money to spend but won’t be spending it on Lehi (not that there were any plans to spend anyway). Shutting down XPoint was somewhat expected, so it’s not a huge surprise, more of a relief.

Micron’s stock is not that expensive as many investors do not believe forward earnings given the volatility of the memory industry. This may help them make numbers.

R.I.P. – XPoint/Optane & Lehi

Also Read:

Chip Channel Check- Semi Shortage Spreading- Beyond autos-Will impact earnings

Semiconductor Shortage – No Quick Fix – Years of neglect & financial hills to climb

“For Want of a Chip, the Auto Industry was Lost”


Podcast EP12: A Close Look at Intel with Stacy Rasgon

by Daniel Nenni on 03-19-2021 at 10:00 am

Dan takes an in-depth look at Intel with Stacy Rasgon, Managing Director and Senior Analyst, U.S. Semiconductors at Bernstein Research. Stacy is an unusual  semiconductor analyst as he holds a Ph.D. in chemical engineering from MIT. His substantial technical knowledge allows for a deep dive on Intel that you will find refreshing and quite interesting.

We cover several CEO regimes at Intel with a frank assessment of the latest talent infusion. What will Intel do next? Stacy offers some interesting perspectives during our discussion.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.

Intel on SemiWiki

 


Electronics Back Strongly in 2021

by Bill Jewell on 03-19-2021 at 8:00 am


Electronics production has recovered strongly from slowdowns due to the COVID-19 pandemic. Most major Asian electronic producers reported double-digit increases in early 2021. The chart below shows three-month-average change versus a year ago for electronics production. The data is from each country’s official statistics and is in local currency.

China, by far the largest producer of electronics, was growing at about a 10% average rate in 2019. As many factories were shut in early 2020 due to the COVID-19 pandemic, electronics production change versus a year ago turned negative. Production growth was back to the 10% range in May 2020. Production came back robustly in early 2021 compared to the weakness of a year ago. In February 2021, China’s electronics production was up 36%.

Vietnam was one of the most successful countries in fighting the COVID-19 outbreak. According to the World Health Organization (WHO), Vietnam had only 2,567 cases and 35 deaths out of a population of over 94 million. The country did shut down for three weeks in April 2020, which resulted in a slowdown in electronics production. Production bounced back to double-digit growth in October 2020. Vietnam’s electronics production growth in February 2021 was 23% versus a year ago.

South Korea was relatively successful in fighting COVID-19, with WHO reporting 97,294 cases and 1,688 deaths out of a population of over 50 million. South Korea never shut down businesses, which enabled growth in electronics production throughout 2020. Growth was over 20% in January through March of 2020, with South Korean manufacturing benefiting from slowdowns in other countries. South Korea’s electronic production growth was 12% in January 2021.

Taiwan experienced electronics production growth averaging over 20% through 2019, benefiting from production shifts from China during the U.S.-China trade disputes. Growth slowed to single digits in 2020, but never went negative. Taiwan was also very effective in fighting COVID-19, with only 990 cases and 10 deaths out of a population of over 24 million according to Johns Hopkins University of Medicine.

China’s production growth is reflected in the unit production of two key electronics products – PCs and mobile phones. China produces over 80% of the world’s supply of each of these products. Unit production change versus a year ago of both PCs and mobile phones turned negative in early 2020. PCs returned to growth in April 2020, but mobile phones remained negative throughout 2020. In February 2021, PCs were up 73% and mobile phones were up 19%. These growth rates are impressive but are being measured against a very weak early 2020.

The United States and the United Kingdom (UK) were two of the countries hardest hit by COVID-19, with death rates of over 160 per 100,000 people according to WHO. Several major European Union (EU) countries were also severely impacted, especially Italy, Spain, and France. Lockdowns in the U.S. varied by state, but in general did not have much effect on manufacturing. U.S. electronics production slowed from over 5% year-to-year growth in the first five months of 2019 to less than 1% in December 2019, prior to any COVID-19 related slowdowns. U.S. growth remained below 1% until July 2020 and picked up to over 8% in November 2020. January 2021 growth was 8.7%.

EU electronics production change versus a year ago in 2020 was similar to 2019: single-digit declines in most months. COVID-19 related lockdowns did not seem to significantly affect electronics production. Growth picked up to 11% in December 2020 and 23% in January 2021.

The UK officially left the EU on January 31, 2020, a process known as Brexit. A transition period lasted until December 31, 2020, with a final trade agreement reached at the last minute. The UK electronics production change was similar to the EU’s, except for a double-digit decline beginning in April 2020, as UK shutdowns impacted manufacturing for several months in mid-2020. UK growth in January 2021 was 0.7%; the UK has not seen the powerful recent growth the EU has experienced. The Financial Times reported over one-third of UK manufacturers have lost revenue since the UK left the EU, primarily due to delays in importing from and exporting to the EU. The long-term effect of Brexit on UK electronics production remains to be seen. Brexit critics foresee companies shifting production from the UK to EU countries to reach EU markets more easily; Brexit supporters predict that, with the UK free of the EU, it will be able to increase production for export to the world outside the EU.

The substantial early 2021 growth in electronics production is reflected in semiconductor shipments, according to World Semiconductor Trade Statistics (WSTS) data. The semiconductor market in 2020 was on a recovery path from a 12% decline in 2019, with three-month-average change rebounding from a 16% decline in June 2019 to a 6.9% increase in March 2020. Growth plateaued in the 5% to 7% range until reaching 9.2% in November 2020. Revenues versus a year ago were up 13.2% in January 2021, the largest increase since October 2018. January 2021 monthly semiconductor shipments were $3,997 million, up 0.1% from $3,992 million in December 2020. The normal seasonal trend is a significant decline in January from December, ranging from -5% to -15% over the last ten years. January 2021 marks the first semiconductor revenue increase from December to January in the history of WSTS data going back to 1984.
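For readers who want to reproduce figures like these, the three-month-average change used throughout is simple to compute from a monthly series: average the latest three months, average the same three months a year earlier, and take the percentage change. A sketch with invented numbers (the function name is ours):

```python
def three_month_avg_change(monthly, i):
    """Year-over-year change (%) of the 3-month average ending at month i.
    `monthly` is a chronological list of monthly values; requires i >= 14."""
    recent = sum(monthly[i - 2 : i + 1]) / 3
    year_ago = sum(monthly[i - 14 : i - 11]) / 3
    return 100.0 * (recent - year_ago) / year_ago

# Two invented years of shipments: flat 100/month, then flat 110/month.
shipments = [100] * 12 + [110] * 12
print(three_month_avg_change(shipments, 23))  # 10.0
```

Averaging over three months smooths out single-month noise, which is why the series reads as a steady recovery path rather than a jagged line.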

Although the global economy has yet to fully recover from the COVID-19 pandemic, the electronics and semiconductor industries seem past the recovery phase and back to healthy growth.

Semiconductor Intelligence is a consulting firm providing market analysis, market insights and company analysis for anyone involved in the semiconductor industry – manufacturers, designers, foundries, suppliers, users or investors. Please contact me if you would like further information.

Also Read:

Semiconductors up 6.5% in 2020, >10% in 2021?

Semiconductor Boom in 2021

China Mobile and Computer Update 2020


Executive Interview: Casper van Oosten of Intermolecular, Inc.

by Daniel Nenni on 03-19-2021 at 6:00 am


Casper van Oosten is the Business Field Head and Managing Director for Intermolecular, Inc., acquired by Merck KGaA, Darmstadt, Germany in 2019. Prior to this role, Casper worked in various roles on Eyrise™ Dynamic Liquid Crystal Window in Veldhoven, the Netherlands, at an affiliate of Merck KGaA, Darmstadt, Germany. Casper is one of the founders of the company Peer+ BV and the Eyrise™ technology, where he filled the role of CTO and, after acquisition by Merck KGaA, Darmstadt, Germany in 2014, Managing Director and Head of CellTech Operations. In this function, he was responsible for designing, building and operating a first-of-its kind factory for switchable windows.

Casper’s former experience also includes a consultant role at Willems & van den Wildenberg New Business Consulting. Casper holds a PhD in Polymer Technology from Eindhoven University of Technology and an MSc in Mechanical Engineering from Delft University of Technology.

What brought you to semiconductors?
I earned my Master’s in Mechanical Engineering at Delft University in the Netherlands and spent the last year optimizing the motion control system of a wafer stepper. ASML had loaned this tool to the university, and basically the whole setup was mine to play with. The precision we could achieve with that tool – accuracy well below 100nm with a tool from the 1990s – was intriguing, and I was gripped by the phenomenal physics going on at these small dimensions. However, the increasing power needed in these tools didn’t seem sustainable to me, so I ventured into self-assembling molecules – liquid crystals – for almost two decades before returning to the semiconductor world. Now, the semiconductor world is even more exciting as we are seeing the first applications where bottom-up self-assembly and top-down structuring, for example in the lithography space, are starting to see high-volume applications. For me personally, it feels like I have closed the loop, and I am excited to work on these projects where the difference of one or two atoms really makes an impact on the overall device performance.

What is the backstory of Intermolecular?
Intermolecular was founded on the concept of high-throughput experimentation for materials discovery, or combinatorial chemistry, as found in life science labs. In this high-throughput experimentation, a large number of experiments are executed in parallel, thereby mapping out an experimental space to quickly find optimal materials combinations and process conditions. The Intermolecular team adopted these tools for semiconductor applications, first for wet processes, later for PVD and ALD processes. Using these tools combined with our customers' or our own proprietary test vehicles, we can very quickly build and analyze test material stacks for their electrical performance. The Intermolecular business went through various business models, but since 2015 we have been using these tools to support our customers in accelerating their device development by scouting out material combinations that unlock their next node of devices in a fully exclusive and dedicated fashion. This means that our customers get full ownership of the results of our work, and this has allowed us to work with all major semiconductor companies and many of the tool makers. We operate as an extension of our customers' teams to drive impactful outcomes and product prototypes while creating first-to-market opportunities.

Since 2019, we have been a subsidiary of Merck KGaA, Darmstadt, Germany, part of the Semiconductor Solutions business. In this setup, we continue to work for external customers, working with any commercially available materials, but we also closely collaborate with our colleagues in the materials business so that our customers benefit from a seamless materials innovation process.

What customer challenges are you addressing?
We see that complexity is increasing across all application areas of semiconductors to meet the demands of advanced computing and to produce smaller, more powerful and power-efficient connected devices. With this, the number of materials used in a single device is increasing. The number of options to consider and combinations to test for development of these devices is also exploding. We take that burden from our customers, working to resolve these materials challenges based on data and real experiments, so that our customers can focus on the other integration challenges at hand. While we do this for the large semiconductor companies, we have lately also seen an increasing number of startups finding their way to us. These teams typically come to us after their initial funding rounds, when they have to start building their first functional devices. While they will look to use known technology as much as possible to reduce risks, their key invention often requires processes and materials that are not known and not fully developed. Traditional foundries will therefore shy away from these challenges, but this is where Intermolecular's team and toolset excel. Our cleanroom is set up in a flexible way so we can host these processes and produce these first working prototypes, up to small series, to generate yield and reliability data. Some of our customers will then reinsert wafers back into other processes, or in some cases we even host their tools in our cleanroom.

What are the products Intermolecular has to offer?
Any program with Intermolecular aims to resolve the customer's materials challenge rapidly, using a data-driven exploration of the potential solutions in materials and process settings. The exact contents of the program vary, as they are always fully customized. We typically start by establishing tool correlations – meaning that we make sure that the solutions we find in our labs can translate to the customer's lab. Once that is done, we take the development through several optimization iterations, working through the list of key parameters to hit, until we have optimized the solution to within the project's targets. These collaborations are very intensive – a typical experimental cycle covering experiment definition, device fabrication, testing, evaluation and analysis is completed in two weeks, and a program will run from a couple of months to over a year. Once the optimization is done, we support the transfer to the customer's tools using wafers processed in our labs and our analytical toolsets.

Application areas where we have been active include many memory technologies such as DRAM, phase change memory and 3D NAND. Furthermore, our materials exploration approach has been very productive in emerging technologies such as neuromorphic computing and quantum computing. We also have worked on the interface of optics and electronics, for example in displays, optical sensors and functional coatings.

What is your competitive positioning?
Intermolecular is one of the few labs that allow customers to work on material stack development in a fully dedicated way with full IP ownership. Creating such a protected environment is key for us, as it allows our customers to share their full challenges with us, which is essential to help them find the best solution. On top of that, our proprietary toolset not only allows us to quickly scan through a variety of material options and test them electrically, but it also allows us to be very flexible in adjusting to the process needs of the customer. As we are now part of Merck KGaA, Darmstadt, Germany, our customers have the option to tap into the materials base of our parent company when desired, providing clear visibility on the supply chain from scale-up through to high-volume manufacturing (HVM). We believe that this puts us in a unique position in the value chain, and we see that many of our customers recognize this combination brings a lot of benefit to them.

We recently announced (March 4) a closer collaboration with the Merck KGaA, Darmstadt, Germany Silicon Valley Innovation Hub, which will now be located in our 150,000 square foot facility in San Jose, providing companies a unique space for innovation and collaboration at the intersection of life science, healthcare and electronic materials. This will allow us to branch out into new areas such as bioelectronics.

What kind of year was 2020 for Intermolecular?
2020 was in many aspects a transformational year for Intermolecular. Our programs involve very intensive interactions with our clients, and we normally have our customers in our building on a daily basis. COVID of course changed this aspect completely; interestingly, the acceptance of videoconferencing in the industry means that it is easier to join a meeting, and we see that we are talking with larger teams at our customers. In my view, this has elevated the impact we make with our customers, though I still believe creative discussions with face-to-face interactions work best. Next to that, 2020 was the first full year as part of Merck KGaA, Darmstadt, Germany, and in that sense it has been a discovery of new possibilities – both for our customers as well as for the Intermolecular team. With the combined forces, we are now taking on challenges we could never have mastered alone.

What does 2021 have in store?
The semiconductor industry is going through a strong demand phase right now. So while the large players in the industry are working hard to meet demand and maximize the output from their operations, we are seeing a lot of start-up activity working on solutions that take us beyond today's problems. It is a great challenge to work both on helping to meet today's demands and on preparing for the next challenge for the industry. One challenge that is particularly close to my heart is the enormous energy consumption that all our data use is triggering. Continuing at this rate, by 2030 20% of the world's energy will be used for handling data, something that I find unacceptable. There are some great ideas out there to come up with radically different ways of computing – such as neuromorphic computing – to resolve this. While the industry is hot this year and everyone will be challenged to deliver on today's demand, I am optimistic that we can make progress on this very fundamental threat to our industry.

WEBINAR: Fundamentals of ferroelectric hafnium oxide for better devices

Also Read:

CEO Interview: R.K. Patil of Vayavya Labs

CEO Interview: Dr. Shafy Eltoukhy of OpenFive 

CEO interview: Graham Curren of Sondrel


IP and Software Speeds up TWS Earbud SoC Development

by Tom Simon on 03-18-2021 at 10:00 am


The global market for earphones and headphones in 2020 is estimated to have been $34B and is expanding at a compound rate of over 20% per year. Of this, almost 50% is said to be earphones, which are shifting rapidly to True Wireless Stereo (TWS). We have seen the sales of TWS devices grow from 1M units in 2016 to 109M units in 2019, though it should be pointed out that TWS can also be used for stereo Bluetooth speakers. TWS aims to deliver long battery life, left/right synchronization, an improved user interface and high-quality audio. TWS devices can also be used for new and novel functionalities, such as health and sports tracking.

Each of the features mentioned above and the new innovations coming down the pike create major challenges for product developers. CEVA has announced a turnkey hardware and software platform for TWS and Bluetooth SoCs that will help designers quickly solve these challenges. In a nutshell, CEVA's Bluebud leverages several of CEVA's proven IPs, such as the RivieraWaves Bluetooth IP, the CEVA-BX1 DSP and the noise reduction, voice and motion processing algorithms in ClearVox, WhisPro and MotionEngine. It offers best-in-class TWS Bluetooth link stability and audio quality.

CEVA Bluebud TWS Platform

Bluebud uses a single-core architecture that combines Bluetooth software and audio & sensor processing on a single BX1 DSP. Despite being labeled a DSP, it is highly efficient at executing control code. This approach yields lower power, less data movement, lower latency and better left-right jitter control.

CEVA offers all the IP necessary for implementing a TWS system. Their Bluetooth 5.2 dual mode controller is a low-power packet engine with optimized traffic scheduling for Bluetooth Classic and LE. The integrated CEVA-BX1 processor can efficiently execute the protocol stack and handle the DSP processing. It supports a complete memory subsystem, with SRAM, ROM and FLASH, that includes efficient program and data caches. Naturally this comes with a full set of standard interfaces, like GPIOs, timers, PMU, etc.

The CEVA-BX1 processor is a hybrid DSP and controller with 4-way VLIW SIMD. It also comes with a quad 16×16-bit MAC and a dual 32×32-bit MAC. It offers best-in-class code density and a CoreMark/MHz score of 4.41. The CEVA-BX1 supports user-defined ISA extensions, an optional floating-point unit and advanced dynamic branch prediction.

Developers using the Bluebud platform also get a comprehensive software platform to make their job easier, built around the SenslinQ framework. There is a Bluetooth library with an LE stack (LE Audio, Sport & Fitness, etc.) combined with a classic stack (A2DP, HFP, etc). The audio software has a wide range of codecs, including the new LC3 codec. The audio software handles TWS streaming and audio sync. CEVA also has special purpose SenslinQ plug-ins for noise cancellation, voice commands and motion detection, or customers can supply or create their own. One interesting plug-in is the TinyML plug-in for TensorFlow Lite Micro.

The Bluebud platform includes an application library that supports many features that are needed in TWS. CEVA supplies libraries for easy and fast pairing, audio & call control, TWS optimized relay (including role switch to balance battery life), multi-source connections, OTA firmware update and sound effects, to name a few.

One of the key capabilities of TWS is handling two separate synchronized receivers for left and right streams. Bluetooth Classic does not offer native support for two receivers. There are several solutions on the market for working with Bluetooth classic devices. Basic forwarding is used where one receiver receives both signals and retransmits to the other side. However, it uses a lot of power, does not have high fidelity and can be unstable. Dual stream approaches, where the two channels are independently transmitted to each side, are often proprietary and require both the transmitter and receiver to be from the same supplier.

Sniffing is where both receivers listen to the transmitter but designate one side as the controller. This approach is compatible with all transmitters; however, it is complicated by technical and patent issues. CEVA offers a sniffing solution with similar performance but lower complexity.

CEVA Bluebud is a complete, high-performance and feature-rich solution for companies developing TWS SoCs. With the fast market growth, time to market is a huge consideration. It seems that CEVA with their Bluebud platform is offering a rapid path to developing high-quality, efficient and differentiated silicon. The CEVA website has presentation material, a written product brief and a whiteboard talk on Bluebud to help designers come up to speed on their solution.

Also Read:

Expanding Role of Sensors Drives Sensor Fusion

Sensor Fusion Brings Earbuds into the Modern Age

Sensor Fusion in Hearables. A powerful complement


Resistive RAM (ReRAM) Computing-in-Memory IP Macro for Machine Learning

by Tom Dillinger on 03-18-2021 at 6:00 am


The term von Neumann bottleneck is used to denote the inefficiency of an architecture that separates computational resources from data memory.  The transfer of data from memory to the CPU contributes substantially to the latency, and dissipates a significant percentage of the overall energy associated with the computation.

This energy inefficiency is especially acute for the implementation of machine learning algorithms using neural networks.  There is a significant research emphasis on in-memory computing, where hardware is added to the memory array in support of repetitive, vector-based data computations, reducing the latency and dissipation of data transfer to/from the memory.

In-memory computing is well-suited for machine learning inference applications.  After the neural network is trained, the weights associated with the multiply-accumulate (MAC) operations at each network node are stored in the memory, and can be used directly as multiplication operands.

At the recent International Solid-State Circuits Conference (ISSCC), researchers from the National Tsing Hua University and TSMC presented several novel design implementation approaches toward in-memory computing, using resistive RAM (ReRAM). [1]  Their techniques will likely help pave the way toward more efficient AI implementations, especially at the edge where latency and power dissipation are key criteria.

Background

An example of a fully-connected neural network is shown in the figure below.

A set of input data (from each sample) is presented to the network – the input layer.  A series of computations is performed at each subsequent layer.  In the fully-connected network illustrated above, the output computation from each node is presented to all nodes in the next layer.  The final layer of the trained network is often associated with determining a classification match to the input data, from a fixed set of labeled candidates (“supervised learning”).

The typical computation performed at each node is depicted below.  Each data value is multiplied by its related (trained) weight constant, then summed – a multiply-accumulate (MAC) calculation.  A final (trained) bias value may be added.  The output of a numeric activation function is used to provide the node output to the next layer.
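To make the node computation concrete, here is a minimal Python sketch of the MAC-plus-activation step described above; the numeric values and the choice of ReLU as the activation function are illustrative, not from the paper:

```python
def node_output(data, weights, bias):
    """One network node: multiply-accumulate (MAC) of the inputs
    against the trained weights, plus a bias, followed by an
    activation function (ReLU used here as an illustrative choice)."""
    mac = sum(d * w for d, w in zip(data, weights))  # sum of data[i] * weights[i]
    return max(0.0, mac + bias)                      # ReLU activation

# Example: a node with 4 inputs
out = node_output([1.0, 0.0, 1.0, 1.0],
                  [0.5, -0.2, 0.3, 0.1],
                  bias=0.05)
```

The in-memory computing architectures below map the `sum(d * w ...)` line onto the memory array itself.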

The efficiency of the node computation depends strongly on the MAC operation.  In-memory computing architectures attempt to eliminate the delay and power dissipation of transferring weight values for the MAC computation.

The figures above illustrate how the multiplication of (data * weight) could be implemented using the value stored in a one-transistor, one-resistor (1T1R) ReRAM bitcell. [2]

ReRAM technology offers a unique method for non-volatile storage in a memory array.  A write cycle to the bitcell may change the property of the ReRAM material, between a high-resistance (HR) and low-resistance (LR) state.  Subsequent to the write cycle, a bitline current-sense read cycle differentiates between the resistance values to determine the stored bit.

Again referring to the figure above, with the assumption that HR = ‘0’ and LR = ‘1’, the ReRAM cell implements the (data * weight) product in the following manner:

  • if the data = ‘0’, the word line to the bitcell is inactive and little bitline current flows
  • if the data = ‘1’ (word line active), the bitcell current will either be iHR or iLR

If the bitline current sense circuitry distinguishes between iHR (small) and iLR (large), only the product (data = ‘1’) * (weight = ‘1’) = ‘1’ results in significant bitline current.

The summation of the (data * weight) product for multiple data values into the fully-connected network node is illustrated in the figure above.  Unlike a conventional memory array where only one decoded address word line is active, the in-memory computing MAC will have an active word line for each node input where (data = ‘1’).  The total bitline current will be the sum of the parallel ‘dotted’ bitcell currents where the individual word lines are active, either iLR or iHR for each.  The multiply-accumulate operation for all (data * weights) is readily represented as the total bitline current.
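The dotted-bitline summation can be sketched behaviorally; the iLR/iHR current values below are made up for illustration, not device data:

```python
def bitline_current(data_bits, weight_bits, i_lr=10e-6, i_hr=0.5e-6):
    """Total bitline current from the 'dotted' 1T1R cells: every
    active word line (data bit = 1) adds its cell current, i_LR for
    a stored '1' (low resistance) or the small i_HR for a stored
    '0'.  Inactive word lines contribute essentially no current.
    The current values are illustrative, not device data."""
    total = 0.0
    for d, w in zip(data_bits, weight_bits):
        if d:                              # word line active
            total += i_lr if w else i_hr
    return total

# data = [1, 1, 0, 1], weights = [1, 0, 1, 1] -> 2*i_LR + 1*i_HR
i_bl = bitline_current([1, 1, 0, 1], [1, 0, 1, 1])   # 20.5 uA
```

Only the (data = 1, weight = LR) cells contribute the large current, so the total current encodes the MAC result.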

At the start of the MAC operation, assume a capacitor connected to the bitline is set to a reference voltage (say, either fully pre-charged or discharged).  The clocked duration of the MAC computation will convert the specific bitline current in that clock cycle into a voltage difference on that capacitor:

delta_V = (I_bitline) * (delta_T) / Creference

That voltage can be read by an analog-to-digital converter (ADC), to provide the digital equivalent of the MAC summation.
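As a quick numerical check of this current-to-voltage conversion (the component values are illustrative, not from the testsite):

```python
def bitline_delta_v(i_bitline, delta_t, c_ref):
    """delta_V = I_bitline * delta_T / C_reference: the voltage
    change on the reference capacitor after integrating the bitline
    current for the clocked MAC duration."""
    return i_bitline * delta_t / c_ref

# Illustrative values: 10 uA integrated for 10 ns onto 1 pF -> 0.1 V
dv = bitline_delta_v(10e-6, 10e-9, 1e-12)
```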

In-Memory Computing ReRAM Innovations

The ISSCC presentation from researchers at National Tsing Hua University and TSMC introduced several unique innovations to the challenges of ReRAM-based in-memory computing.

Data and Weight Vector Widths

The simple examples in the figures above used a one-bit data input and a one-bit weight.  A real edge AI implementation will have data vector and weight vector widths as input to the MAC operation.  For example, consider the case of 8-bit data and 8-bit weights for each multiplication product in the MAC operation.  (Parenthetically, the vector width of the weights after network training need not be the same as the input data vector width.  Further, the numeric value of the weight vector could use any of a number of representations – e.g., signed or unsigned integer, twos complement.)  For the example, at each network node, the in-memory computation architecture needs to compute multiple products of two 8-bit vectors and accumulate the sum.

While the ReRAM array macro computes the MAC for the network node, circuitry outside the array would be used to add the bias, and apply the activation function.  This function would also normalize the width of the node output result to the input data vector width for the next network layer.

The researchers implemented a novel approach toward the MAC calculation, expanding upon the 1-bit ReRAM example shown above.

The description above indicated that the duration of the bitline current defines the output voltage on the reference capacitor.

The researchers reviewed several previous proposals for generating the data vector input-to-word line duration conversion, as illustrated below.

The input data value could be decoded into a corresponding number of individual word line pulses, as illustrated below.

Alternatively, the data value could be decoded into a word line pulse of different durations.  The multiplication of the data input vector times each bit of the weight could be represented by different durations of the active word line to the ReRAM bit cell, resulting in different cumulative values of bitline current during the read cycle.  The figure below illustrates the concept, for four 3-bit data inputs applied as word lines to a weight vector bitline, shown over two clock cycles.

For a data value of ‘000’, the word line would remain off;  for a data value of ‘111’, the maximum word line decode pulse duration would be applied.  The data input arcs to the network node would be dotted together as multiple active cells on the column bitline, as before.

Each column in the ReRAM array corresponds to one bit of the weight vector – the resulting voltage on the reference capacitor is the sum of all node data inputs times one bit of the weight.

Outside of the ReRAM array itself, support circuitry is provided to complete the binary vector (data*weight) multiplication and accumulation operation:

  •  an ADC on each bitline column converts the voltage value to a binary vector
  • shifting the individual binary values for the MSB to LSB of the weight vector
  • generating the final MAC summation of the shifted weight bits
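The shift-and-add recombination in the last two bullets can be sketched as follows, assuming each ADC output is the summed (data × weight-bit) count for one column (a behavioral sketch, not the circuit):

```python
def mac_from_columns(adc_outputs):
    """Combine per-bitline ADC results into the full MAC value.
    adc_outputs[0] is the MSB column of the weight vector and
    adc_outputs[-1] the LSB column; each column sum is shifted by
    its bit position and accumulated."""
    n = len(adc_outputs)
    total = 0
    for bit_pos, column_sum in enumerate(adc_outputs):
        total += column_sum << (n - 1 - bit_pos)
    return total

# 3-bit weights, columns ordered [MSB, mid, LSB]:
result = mac_from_columns([2, 0, 3])   # 2*4 + 0*2 + 3*1 = 11
```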

The researchers noted that these two approaches do not scale well to larger data vector widths:

  • the throughput is reduced, as longer durations are needed
  • for the long pulse approach, PVT variations will result in jitter in the active word line duration, impacting the accuracy

The researchers chose to implement a novel, segmented duration approach.  For example, an 8-bit data input vector is divided into 3 separate ReRAM operations of 2, 3, and 3 bits each.  The cumulative duration of these three phases is less than that of the full data decode approach, improving the computation throughput.
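The 2-3-3 segmentation can be illustrated with plain bit arithmetic (the helper below is hypothetical, for illustration only):

```python
def split_data_vector(value, widths=(2, 3, 3)):
    """Split an 8-bit data value into MSB-first segments for the
    2-3-3 phase scheme; each segment drives one of the three ReRAM
    operations."""
    assert 0 <= value < (1 << sum(widths))
    segments = []
    remaining = sum(widths)
    for w in widths:
        remaining -= w
        # take the next w bits, starting from the MSB side
        segments.append((value >> remaining) & ((1 << w) - 1))
    return segments

# 0b10_110_011 -> MSB segment 0b10, then 0b110, then 0b011
segs = split_data_vector(0b10110011)   # [2, 6, 3]
```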

Scaling the Bitline Current

With the segmented approach, the researchers described two implementation options:

  • at the end of each phase, the reference capacitor voltage is sensed by the ADC, then reset for the next phase;  the ADC output provides the data times weight bit product for the segmented data vector slice
  • the reference capacitor voltage could be held between phases, without a sample-and-reset sequence

In this second case, when transitioning from one data vector segment to the next, it is necessary to scale the capacitor current correspondingly.  If the remaining data vector width for the next segment phase is n bits, the capacitor current needs to be scaled by 1/2^n.  The figure below provides a simplified view of how the researchers translated the bitline current in each phase into a scaled reference capacitor current.

A pFET current mirror circuit is used to generate a current into the reference capacitor;  a useful property of a current mirror is that, by adjusting device sizes in the mirror branch, scaled copies of the bitline current can be generated.  Between the data vector segment phases, the capacitor voltage is held, and a different scaled mirror current branch is enabled.
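Behaviorally, holding the capacitor voltage while scaling each later phase by 1/2^n amounts to a weighted recombination of the segment results.  A sketch of the arithmetic (not the circuit), assuming MSB-first segments:

```python
def held_capacitor_mac(segment_results, widths=(2, 3, 3)):
    """Behavioral sketch of the held-capacitor recombination: each
    later segment phase is scaled down by 1/2**n relative to the
    phases before it, where n counts the data bits still to come.
    Equivalently, segment k carries a weight of 2**bits_after_k."""
    total = 0
    bits_after = sum(widths)
    for seg, w in zip(segment_results, widths):
        bits_after -= w
        total += seg * (2 ** bits_after)
    return total

# MSB-first segments [2, 6, 3] of an 8-bit value: 2*64 + 6*8 + 3
value = held_capacitor_mac([2, 6, 3])   # 179
```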

For the in-memory ReRAM computing testsite, the researchers chose to use the full reference capacitor reset phase for the most significant bits segment, to provide the optimum accuracy, as required for the MSBs of the data input.  For the remaining LSBs of the data, the subsequent phases used the switched current mirror approach.

Process Variations

The researchers acknowledged that there are significant tolerances in the high and low resistance values of each ReRAM bitcell.  When using ReRAM as a simple memory array, there is sufficient margin between the low-resistance and high-resistance values to adequately sense a stored ‘1’ and ‘0’.

However, as the in-memory computing requirements rely on accumulation of specific (dotted) bitcell currents, these variations are a greater issue.  The researchers chose to use an “averaging” approach – each stored weight bit value is copied across multiple ReRAM bitcells (e.g., # of copies = 4).  Although the figures above depict each data input vector as one ReRAM word line, multiple word lines connected to each weight bit are used.
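A quick numerical experiment (with made-up resistance spreads, not measured ReRAM data) shows why averaging copies tightens the effective current distribution:

```python
import random
import statistics

def cell_current(weight_bit, sigma=0.15):
    """Illustrative 1T1R read current: nominal 1.0 (arbitrary units)
    for a low-resistance '1', 0.1 for a high-resistance '0', with a
    made-up Gaussian relative spread (sigma is not a measured ReRAM
    figure)."""
    nominal = 1.0 if weight_bit else 0.1
    return nominal * (1.0 + random.gauss(0.0, sigma))

def averaged_current(weight_bit, copies=4):
    """Average the currents of several bitcell copies storing the
    same weight bit; the spread of the mean shrinks roughly as
    1/sqrt(copies)."""
    return sum(cell_current(weight_bit) for _ in range(copies)) / copies

random.seed(0)
single_sd = statistics.pstdev([cell_current(1) for _ in range(1000)])
avg_sd = statistics.pstdev([averaged_current(1) for _ in range(1000)])
# with 4 copies, avg_sd comes out roughly half of single_sd
```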

Testsite and FOM

TSMC fabricated a ReRAM testsite using this segmented data vector technique.  The specs are shown in the figure above.  The testsite provided programmability for different data vector widths and weight vector widths – e.g., 8b-8b-14b represents an eight bit data input, an eight bit weight, and a full MAC summation supporting a fourteen bit result at the network node.

The researchers defined a figure-of-merit for MAC calculations using in-memory computing:

        FOM = (energy_efficiency * data_vector_width * weight_vector_width * output_vector_width) / latency

(Energy efficiency is measured in TOPS/Watt;  the output vector width from the ReRAM array and support circuitry is prior to bias addition and activation/normalization.)
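Plugging illustrative numbers (hypothetical, not the paper's measured values) into the FOM definition:

```python
def in_memory_fom(tops_per_watt, data_bits, weight_bits, out_bits, latency):
    """FOM = (energy_efficiency * data_vector_width *
    weight_vector_width * output_vector_width) / latency,
    with energy efficiency in TOPS/W."""
    return tops_per_watt * data_bits * weight_bits * out_bits / latency

# Hypothetical macro: 20 TOPS/W, 8b data, 8b weights, 14b output,
# latency normalized to 10 (arbitrary units)
fom = in_memory_fom(20.0, 8, 8, 14, 10.0)   # 1792.0
```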

Summary

Edge AI implementations are hampered by the power and latency inefficiencies associated with the von Neumann bottleneck, which has sparked great interest in the field of in-memory computing approaches.  Read access to a ReRAM array storing weight values offers a unique opportunity to implement a binary product of data and weights.  Researchers at TSMC and National Tsing Hua University have implemented several novel approaches toward the use of ReRAM for the MAC computation at each neural network node, addressing how to efficiently work with wide data vectors, and manage ReRAM process variation.  I would encourage you to read their recent technical update provided at ISSCC.

-chipguy

References

[1]   Xue, Cheng-Xin, et al., “A 22nm 4Mb 8b-Precision ReRAM Computing-in-Memory Macro”, ISSCC 2021, paper 16.1.

[2]  Mao, M., et al., “Optimizing Latency, Energy, and Reliability of 1T1R ReRAM Through Cross-Layer Techniques”, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2016, p. 352-363.

 


SPIE 2021 – ASML DUV and EUV Updates

by Scotten Jones on 03-17-2021 at 10:00 am


At the SPIE Advanced Lithography Conference held in February, ASML presented the latest information on their Deep Ultraviolet (DUV) and Extreme Ultraviolet (EUV) exposure systems. I recently got to interview Mike Lercel of ASML to discuss the presentations.

DUV

Despite all the attention EUV is getting, most layers are still exposed with DUV systems and this will likely continue to be true for the foreseeable future.

ASML has had two DUV platforms in production, the XT platform for dry exposure tools and the NXT platform for immersion. The NXT is the faster and more sophisticated platform.

For leading edge immersion, ASML has introduced the NXT:2050i on the fourth generation NXT platform for ArF immersion (ArFi). The new system has a new wafer handler, wafer stage, reticle stage, projection lens, laser pulse stretcher and immersion hood. This results in faster wafer to wafer sequencing, faster measurements, pellicle deflection correction and improved speckle with improved overlay. Throughput on the new system is 295 wafers per hour (wph). Longer term there are plans for a 330 wph system (see figure 1).

ASML is now taking the NXT platform and porting dry lenses onto it, with the first system being the NXT:1470 for ArF dry, offering 300 wph (slightly faster than the NXT:2050i because it does not have the immersion overhead). The 300 wph for the NXT:1470 is up from approximately 200 wph for the XT:1460K. In the future the NXT:1470 will have further throughput improvements to 330 wph (see figure 1).

There are also plans to port a KrF dry lens to the NXT platform with 330 wph planned (see figure 1).

Figure 1. NXT Roadmap.

EUV (0.33NA)

With the standard 0.33 numerical aperture (NA) systems in use at Samsung and TSMC for 7nm and 5nm logic production and at Samsung for 1z DRAM, the number of wafers exposed with EUV is growing rapidly (see figure 2).

Figure 2. 3400x EUV Systems in the Field and Wafers Exposed.

The NXE:3400C system has been shipping since late 2019 and the new NXE:3600D should start shipping later this year. Each new system provides improved throughput and overlay.

Figure 3 presents a summary of both 0.33 NA and High-NA 0.55NA systems to be discussed in the next section.

Figure 3. EUV Systems Summary.

  1. The first column lists past, current, and future systems beginning with the NXE3400B systems that were the first production systems.
  2. The second column provides the introduction dates for each system. Notably the new NXE:3600D should ship later this year with improved performance and the first high-NA systems should ship late 2022.
  3. The third column presents the numerical aperture of the system with 0.33NA representing the current system and 0.55NA the high-NA system in development.
  4. The next two columns present the throughput for 20mJ/cm2 and 30mJ/cm2 doses as demonstrated by ASML. These throughputs are based on 96 fields per wafer more typical of a DRAM application.
  5. The systems shipped is IC Knowledge’s estimate of the number of NXE:3400B and NXE:3400C systems shipped by type through Q4-2020; ASML does not provide this breakout.
  6. The next column is the current availability of approximately 85% for the NXE:3400B and approximately 90% for the NXE:3400C. The 3400C has the new modular vessel that reduces downtime. Long term ASML has a goal to reach the 95% availability typical of DUV systems.
  7. The final column presents some comments on the systems and usage. We believe that 7nm logic production has primarily been on the 3400B and 5nm on the 3400C. We expect the 3nm processes due to enter production over the next one to two years to be primarily produced on the 3600D systems.

A key enabler for EUV exposure of dense patterns is the availability of a pellicle, and a usable pellicle is now available. Pellicle use reduces throughput, and whether a pellicle is used depends on the pattern density being printed. Figure 4 presents the state of EUV pellicle transmission.

Figure 4. EUV Pellicle Transmission.

There is currently some pellicle use in production.

High-NA EUV (0.55NA)

High-NA has now progressed from PowerPoint slides through engineering design to building modules and frames. The first High-NA tools (0.55NA) are expected to ship in late 2022. These EXE:5000 systems will likely be used for research and development with the EXE:5200 systems due in 2025/2026 being the first high-NA production systems (see figure 3).

The current 0.33NA systems can print down to an approximately 30nm pitch with a single exposure. There is work being done now to demonstrate 28nm and eventually 26nm lines and spaces with a single exposure. TSMC’s 5nm process currently in production has a 28nm M0 pitch, and we believe this one layer may be double-patterned EUV in current production, while the rest of the layers that use EUV are single patterned. For TSMC’s 3nm process, due to begin risk starts later this year, we expect several EUV double-patterned metal layers. With the current timing for 0.55NA systems to enter production estimated to be in the 2025/2026 time-frame, we may see foundry 2nm and Intel 5nm processes in production before then with extensive EUV double patterning. 0.55NA EUV would likely first appear in production for foundry 1.5nm processes and Intel 3nm, eliminating EUV double patterning and reducing costs.

Figure 5 presents the technical value of High-NA EUV.

Figure 5. High-NA Technical Values.

One other value to 0.55NA EUV is that the higher contrast can print dense features with a much lower dose than 0.33NA EUV improving throughput (figure 3 is throughput for specific doses and does not consider dose reduction). Figure 6 illustrates the 0.55NA advantage.

Figure 6. High-NA Throughput Advantage for Dense Patterns.

There is also work being done on improved EUV mask absorber layers to improve contrast and resolution (see figure 7), and on improved photoresists (see figure 8).

Figure 7. Improved Mask Absorber Layers.

 

Figure 8. Improved Photoresist.

Currently modules and frames for high-NA tools are being fabricated.

Conclusion

ASML continues to drive throughput and resolution across their entire portfolio of DUV and EUV systems. With High-NA system manufacturing underway, a path to 1.5nm logic and beyond is taking shape.

Related Lithography Posts