
Podcast EP87: How Axiomise Addresses the Verification Challenge

by Daniel Nenni on 06-16-2022 at 10:00 am

Dan is joined by GD Bansal, COO at Axiomise. Dan explores Axiomise's business model of providing training and consulting services for formal verification. They discuss the benefits and challenges of using formal verification on complex designs, along with the benefits of the Axiomise vendor-neutral approach to deploying state-of-the-art tools.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


HLS in a Stanford Edge ML Accelerator Design

by Bernard Murphy on 06-16-2022 at 6:00 am


I wrote recently about Siemens EDA’s philosophy of designing quality in from the outset, rather than trying to verify it in. The first step is moving up the level of abstraction for design. They mentioned the advantages of HLS in this respect, and I refined that to “for DSP-centric applications”. A Stanford group recently presented at a Siemens EDA-hosted webinar, extending this range to building ML accelerators for the edge. Their architecture is built around several innovations, along with an enthusiastic endorsement of the value of HLS in designing the accelerator core.

Key Innovations

Karthik Prabhu, a doctoral candidate in EE at Stanford, presented their Chimera SoC, with a goal to support training at the edge with excellent performance yet still at edge-like low power. For this purpose their design uses resistive RAM (RRAM) for weight storage, eliminating the need to go off-chip for this data. The SoC architecture supports scale-out to multiple chips, something they call an Illusion system, with chip-to-chip interfacing (protocol not mentioned). I would imagine this might be even more effective in a multi-chiplet implementation, but as a proof of concept I’m sure the multi-chip version is enough.

For ResNet-18 with ImageNet they measured energy at 8.1 mJ/image, latency at 60 ms/image, average power at 136 mW, and efficiency at 2.2 TOPS/W. Given that the intent is to support on-chip training, they do note RRAM’s drawbacks: high write energy and relatively low write endurance. The tests they apply seem to converge on training within the endurance bound; however, they didn’t mention how they overcome the write-energy issue in training.

Architecting the accelerator core

This section could have been taken directly from an earlier Siemens EDA tutorial. The team started with a convolution algorithm (6 nested loops in this case) over input activations, weights, and output activations. Their goal was to map that to a systolic array of processors, considering many possible variables in the architecture. How many PEs would they need in the array? How many levels in the memory hierarchy, and how should the buffers in that hierarchy be sized? For dataflow optimization, should they prefer a weight-stationary, output-stationary, or row-stationary scheme?
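The six-loop nest the team started from looks roughly like the sketch below. The sizes, loop order, and names here are illustrative only, not Chimera’s actual mapping or tiling:

```python
import random

# Hypothetical small sizes: K out-channels, C in-channels, OYxOX outputs, FYxFX kernel
K, C, OY, OX, FY, FX = 2, 3, 4, 4, 3, 3
random.seed(0)
ifmap = [[[random.random() for _ in range(OX + FX - 1)]
          for _ in range(OY + FY - 1)] for _ in range(C)]
weights = [[[[random.random() for _ in range(FX)] for _ in range(FY)]
            for _ in range(C)] for _ in range(K)]
ofmap = [[[0.0] * OX for _ in range(OY)] for _ in range(K)]

for k in range(K):                  # output channels
    for c in range(C):              # input channels
        for oy in range(OY):        # output rows
            for ox in range(OX):    # output columns
                for fy in range(FY):        # kernel rows
                    for fx in range(FX):    # kernel columns
                        ofmap[k][oy][ox] += ifmap[c][oy + fy][ox + fx] * weights[k][c][fy][fx]
```

The architectural question is which of these loops to unroll across the PE array, which to keep in buffers, and which to stream, which is exactly the choice the stationary schemes encode.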

They used Interstellar to optimize the architecture. This is an open-source tool for design space exploration of CNN accelerator architectures, also from Stanford. I think this is pretty cool. They input a basic neural net spec (layers in the network and tensor sizes) and a range of memory sizes to explore, along with cost info for a MAC, a register file, and a memory. Based on this input, Interstellar told them they should use a 16×16 systolic array with a 9-wide vector inside each PE. They needed a 16KB input buffer, no weight buffer, and a 32KB accumulation buffer. And many more details!
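The flavor of that kind of design-space exploration can be sketched as a toy enumeration. To be clear, the cost model, parameters, and numbers below are made up for illustration; they are not Interstellar’s actual model:

```python
from itertools import product

def explore(pe_options, buf_options_kb, area_budget):
    """Brute-force DSE: pick the (PE count, buffer size) pair that minimizes
    an assumed energy estimate while staying within an assumed area budget."""
    best = None
    for pes, buf in product(pe_options, buf_options_kb):
        area = pes * 1.0 + buf * 0.5            # assumed per-PE and per-KB area costs
        if area > area_budget:
            continue
        # Assumed model: more PEs and bigger buffers mean fewer expensive DRAM trips
        energy = 1000.0 / pes + 100.0 / buf
        if best is None or energy < best[0]:
            best = (energy, pes, buf)
    return best

best = explore(pe_options=[64, 128, 256], buf_options_kb=[16, 32, 64], area_budget=300)
```

A real tool like Interstellar searches a much richer space (loop blocking, dataflow, memory hierarchy levels) with calibrated cost numbers, but the structure is the same: enumerate, cost, keep the best.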

Implementation

The Chimera team used Catapult to implement the accelerator, which they were able to accomplish in 2-3 months, a timeframe they reasonably argued would not have been possible had they been implementing in RTL. They also stressed another advantage: they made heavy use of C++ templates to parametrize much of the implementation, simplifying adjustments to implementation details, from buffer sizes to how weights were distributed to reduce wiring congestion. This level of parametrization also made it easy to reuse the implementation for follow-on designs.

There’s a nice description of the verification flow. All the test development was at the C++ level, allowing for fast testing: a 10-second simulation in C++ versus a 1-hour parallelized simulation in RTL. (Catapult also generated the infrastructure to map this to RTL testing.) They caught almost all bugs at the C++ level and could experiment with design tweaks given the fast turnaround. This also allowed them to verify training, which requires running many samples through the design; C++-based simulation made this possible.

An interesting bottom line to this work is that they implemented Chimera in 40nm (I’m guessing for the RRAM support?). A comparison SoC, implemented in 16nm, shows higher core energy and about the same energy and latency per image. Not bad! All in all, a useful validation from an obviously credible academic research source. You can watch the session HERE.


Seeing 1/f noise more accurately

by Don Dingee on 06-15-2022 at 10:00 am

Decimation chain speeds up measurements for 1/f noise

Electronics noise is often described as “white,” spread evenly across a band, typical on older semiconductor processes where thermal and shot noise dominate. As transistors shrink, “pink” 1/f noise takes over at low frequencies – becoming stronger in advanced processes and quantum computing technology. But it’s not an easy thing to characterize. Measurement time is bound by slow sampling at low frequencies, while other noise sources factor in across wider device bandwidths. Now, there’s a new approach to seeing 1/f noise more accurately.

The shape of noise depends on its source

In frequency domain over a narrow bandwidth, white noise may look flat – thus the term noise floor. But stretch the bandwidth out from near-DC to high frequencies, and noise takes on a shape, with different contributions from different types of noise. Contributors include:

  • Thermal noise comes from the Brownian motion of electrons through resistance, showing up as white noise extending across the analog bandwidth.
  • Shot noise happens when electrons flow discontinuously between semiconductor P-N junctions, adding to the white noise profile.
  • Random telegraph noise (RTN), or burst noise, comes from small voltage or current transitions due to charge trapping. RTN shows up with 1/f^2 power spectral density at near-DC frequencies, sometimes called “red” noise.
  • 1/f noise is also caused by charge trapping and is usually more prominent than other sources if manufacturing process quality is high. While strongest at near-DC frequencies, 1/f noise can add noise energy up to a corner frequency.

Additionally, two external noise sources can affect 1/f noise measurements. The first is chamber noise, usually optimized by probe station manufacturers. The second is environmental noise, from sources such as AC power line noise, ground loops in cabling, and nearby equipment – all mitigatable through electromechanical best practices.

Decimation chain speeds up 1/f noise measurements

An OEM’s system noise floor profile affects metrics like signal-to-noise ratio (SNR), receiver sensitivity, and error vector magnitude (EVM). For a low system noise floor, low 1/f noise components are a must. Both passive and active device manufacturers can turn to a low frequency noise analyzer for characterizing parts, improving their performance, and extracting device models for customer use in system simulations.

1/f noise measurement has been a different process from wideband noise measurement. One reason is the sheer amount of time a 1/f measurement takes at low frequencies. A good measurement requires a lot of samples and averaging, and as analysis frequencies drop, sample times stretch into minutes or longer. This pushed many manufacturers into two different noise measurement instruments: one for low frequencies, one for a wider analog bandwidth. But this approach doubles the test setups and yields disparate test data from instruments with different noise floors, settings, and algorithms.
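To see why the low-frequency points dominate test time: resolving a frequency bin of width df requires a record roughly 1/df seconds long, and averaging N records multiplies that. A back-of-envelope sketch with illustrative numbers (not instrument specs):

```python
def capture_time_s(f_resolution_hz, n_averages):
    """Approximate capture time: each record lasts ~1/df seconds,
    and N averaged records take ~N/df seconds (ignoring overlap tricks)."""
    return n_averages / f_resolution_hz

t_low = capture_time_s(0.1, 100)     # resolving 0.1 Hz with 100 averages: ~1000 s
t_high = capture_time_s(1000, 100)   # same averaging at 1 kHz resolution: ~0.1 s
```

The four-orders-of-magnitude gap between those two numbers is the reason naive near-DC sweeps take minutes to hours.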

One instrument, one test setup, and complete noise results from near-DC to the maximum analog bandwidth sounds great. The question is, how? Keysight looked at the problem differently and came up with the idea of a decimation chain. In short, it takes one set of samples and decimates down to the lowest frequency band. Instead of resampling data for higher frequency bands, it reuses the same samples and runs decimation, FFTs, and averaging on bands in parallel. The result is a solid 1/f noise measurement with major time savings and no reduction in quality.
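The decimation-chain idea can be sketched minimally as follows. The boxcar low-pass and decimate-by-4 factor here are assumptions for illustration; Keysight’s actual filters, decimation factors, and band plan are not disclosed in this article:

```python
def decimate(x, r=4):
    """Crude anti-alias low-pass (boxcar average) then keep every r-th sample."""
    lp = [sum(x[i:i + r]) / r for i in range(0, len(x) - r + 1)]
    return lp[::r]

# One captured record is reused for every band instead of re-measuring
samples = [float(i % 8) for i in range(4096)]  # stand-in for the captured data
bands = [samples]
while len(bands[-1]) >= 64:
    bands.append(decimate(bands[-1]))
# Each entry in `bands` covers a frequency band a factor of 4 lower than the
# previous one; in the analyzer, an FFT plus averaging runs on each band in parallel.
```

The key point is in the loop: every band is derived from the same capture, which is where the time savings come from.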

Turnkey full wafer-level characterization

One core feature of the Keysight E4727B A-LFNA is its ability to see very low noise in devices, such as MOSFET linear region noise. With a system noise floor of 1e-28 A^2/Hz and a 1/f noise corner frequency of 15 Hz, plus a maximum analog bandwidth of 100 MHz, the E4727B improves noise characterization. Combining a low noise floor with a wider analysis bandwidth and faster measurement speed makes the E4727B a world-class solution.

When paired with the Keysight WaferPro Express measurement platform and a wafer probing solution like the CM300xi-ULN from FormFactor, the E4727B A-LFNA provides turnkey measurement of DC characteristics, 1/f noise and RTN, and data analysis. Applications include mass characterization of noise specifications for devices on-wafer before packaging, and manufacturing statistical process control.

Another important application is developing process design kits (PDKs). Many firms designing low noise semiconductors are fabless, turning to foundries for wafer fabrication services. The transistors in those foundry processes aren’t under designer control – they’re offered as part of libraries chip designers can choose from. Understanding their characteristics is vital to design success. Adding the Keysight PathWave Model Builder (MBP) or PathWave Device Modeling (IC-CAP) to the suite automates 1/f noise model parameter extraction and model library generation. A foundry can characterize transistors and other primitives in their processes and make 1/f noise data and models available for customers during their evaluation and design workflow.

Seeing 1/f noise more accurately takes on greater importance as transistors get smaller and faster on advanced process nodes. The idea that wafer-level characterization can screen large numbers of devices quickly is powerful. Plus, savings in measurement speed and test setups from using a Keysight E4727B A-LFNA translate to more wafer throughput.


Truechip’s Network-on-Chip (NoC) Silicon IP

by Kalar Rajendiran on 06-14-2022 at 10:00 am

Truechip NoC Silicon IP Block Diagram

Driven by the need to rapidly move data across a chip, NoC IP is already a very common structure for moving data within an SoC. Various implementations of NoC IP are available in the market depending on the end system requirements. Over the last few years, the RISC-V architecture and the TileLink interface specification have been gaining broad adoption. While the TileLink specification was originally developed to work with the RISC-V architecture, it actually supports other instruction set architectures (ISAs) too. The conjunction of these trends has created a need for a NoC IP that works with the TileLink protocol.

A recent SemiWiki post discussed the DisplayPort VIP solution from Truechip, an IP company that has been serving customers for more than a decade. While Truechip has established itself as a global provider of verification IP (VIP) solutions, they are always on the lookout for strategic IP needs from their customer base. Truechip has seized the above strategic NoC IP opportunity to develop a design IP targeting RISC-V based chips supporting the TileLink interface specification. Since its introduction to the market last year, this IP has been gaining a lot of adoption within Truechip’s customer base. While this is their first design IP addition to their product offering, we can expect to see more strategic additions in the future.

Truechip’s NoC Silicon IP

Truechip’s NoC silicon IP targets RISC-V based chip system implementations leveraging the TileLink specification. The IP provides chip architects and designers with an efficient way to connect multiple TileLink based master and slave devices for reduced latency, power, and area. And of course, it helps reduce physical interconnect routing and use of resources inside an SoC. The solution is offered in native Verilog. Truechip’s unique RTL coding technique has yielded a high-quality IP that offers low latency and high throughput and takes very little silicon area. While the current version supports the TileLink Uncached Lightweight (TL-UL) and TileLink Uncached Heavyweight (TL-UH) conformance levels, the next version will add support for the TL-C (cache coherency) conformance level.

Some Salient Features

  • Supports N master and M slave ports per customer requirements
  • Supports a wide range of memory maps
  • Supports both little-endian and big-endian modes
  • Supports both the TL-UL and TL-UH conformance levels
  • Supports all TileLink networks that follow a directed acyclic graph (DAG)
  • Supports configurable data and address bus widths
  • Supports all types of operations per conformance levels
    • Access
    • Hint
    • Transfer
  • Can work as any node of a graph tree
    • Nothing
    • Trunk
    • Tip (with no branches)
    • Tip (with branches)
    • Branch
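As the feature list notes, supported TileLink networks must form a directed acyclic graph. A minimal sketch of how such an acyclicity check might look, using Kahn’s algorithm (the node names are made-up examples, not Truechip’s API):

```python
from collections import deque

def is_dag(nodes, edges):
    """Kahn's algorithm: True iff the directed graph (nodes, edges) has no cycle."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        indeg[b] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while ready:
        n = ready.popleft()
        seen += 1
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return seen == len(nodes)  # every node ordered means no cycle remained

# Hypothetical topology: two masters feeding one NoC node, which feeds one slave
ok = is_dag(["m0", "m1", "noc", "s0"],
            [("m0", "noc"), ("m1", "noc"), ("noc", "s0")])
```

A cycle anywhere in the fabric would make `is_dag` return False, which is the topology TileLink rules out.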

Deliverables

  • NoC Silicon IP in RTL form
  • Testbench and sanity tests
  • User manual and integration guide
  • Quick start guide
  • TruEye tool for debug (optional)
  • Full verification IP for TileLink (optional)

About Truechip

Truechip, the Verification IP specialist, is a leading provider of design and verification solutions and has been serving customers for more than a decade. Its solutions help accelerate the design cycle, lower the cost of development, and reduce the risks associated with the development of ASICs, FPGAs, and SoCs. The company has a global footprint with sales coverage across North America, Europe, and Asia. Truechip provides the industry’s first 24×7 support model with specialization in VIP integration, customization, and SoC verification.

For more information, refer to the Truechip website.

Also Read:

LIDAR-based SLAM, What’s New in Autonomous Navigation

Die-to-Die IP enabling the path to the future of Chiplets Ecosystem

Very Short Reach (VSR) Connectivity for Optical Modules


A Different Perspective: Ansys’ View on the Central Issues Driving EDA Today

by John Lee on 06-14-2022 at 6:00 am

RedHawk SC uses Ansys SeaScape Big Data Platform Designed for EDA Applications

For the past few decades, System-on-Chip (SoC) has been the gold standard for optimizing the performance and cost of electronic systems. Pulling together practically all of a smartphone’s digital and analog capabilities into a monolithic chip, the mobile application processor serves as a near-perfect example of an SoC. But today’s leading integrated circuits (ICs) are pushing up against the upper limit of a chip’s physical size, which is limited by the manufacturing equipment’s optical reticle size. This has proven difficult to increase and has grown only slowly over the years. Yet market pressure continues unabated for bigger, more capable electronic systems with more integrated memory, more digital logic, and more analog/mixed signal circuitry. This tension is driving some significant business and technology trends in EDA that will reshape the market in the coming years.

The Four Engines Driving Semiconductor Design
The road forward has plenty of challenges and we are seeing design companies making significant efforts to adapt and come to grips with the following four technology and market drivers:

  • The requirement for concurrent multiphysics analysis to ensure reliable and efficient electronic systems
  • The blurring of the lines between chip, package, and system
  • The need for open, extensible, and inclusive platforms that interoperate with the full range of tools required to solve today’s multiphysics designs
  • Bespoke silicon as the major driver for EDA at hyperscalers and system companies

Blurring of Silicon and System Design
The advent of 3D-IC opens up new horizons for solutions that can be implemented in silicon. But it also forces a closer integration between three distinct technology markets that have co-existed symbiotically for many decades: IC design, package design, and printed circuit board (PCB) design. These markets use different tools, different data formats, different manufacturing back-ends, operate at different computational and geometric scales, and focus on different physical concerns. Yet, emerging 2.5D/3D-IC technology combines many aspects of all three: It features heterogeneous silicon die but also board-like substrates and interposers that stitch the chips together. The collapse of all this expertise into a single project is requiring companies to re-imagine their design capabilities and flows, as well as their organizational structure.

Open, Extensible, Multiphysics Platforms
The siloed isolation of chip design from PCB design and package design means that each of these markets has developed insular data structures that are ill-suited to deal with the breadth of multiphysics analysis for 3D-IC design. Many different physical disciplines – including computational fluid dynamics, mechanical stress, and electromagnetic radiation – are all needed to solve the multiphysics challenge. No one company offers the entire range of required tools, so we see the need for open multiphysics platforms that allow easy data exchange and tool integration. A crucial factor for advanced users is the ability to customize their design flow around these platforms with popular extension languages like Python. And, finally, there is the issue of tool capacity to handle the enormous size of modern silicon systems. EDA platforms must embrace the modern cloud compute paradigm that enables realistic analysis in a relevant timeframe.

Bespoke Chips
Today’s market-leading companies are heavily dependent on technology for their continued success and market differentiation. Silicon systems are now so powerful and central that their performance can shift the needle for entire business divisions. Everybody from online retailers to telecommunications firms to social networking companies and hyperscalers is moving away from off-the-shelf solutions and turning to custom-built silicon for an edge. Many of these companies are seeking to gain market share by leveraging proprietary AI/ML algorithms trained on their extensive troves of market data – but this requires yet greater amounts of compute power and specialized chips. Access to high-quality silicon solutions is vital in today’s world, and there is strong demand for continually more complex and powerful electronics.

3D-IC an Inflection Point in Electronic Design
3D-IC design is recognized as an inflection point in electronic design and presents major challenges that are realigning the electronic design industry around this new reality.

The key technology breakthrough of 3D-IC is that it makes it possible to spread a system out over multiple chips – moving the industry away from the traditional monolithic SoC approach. By abandoning the need to integrate an entire system on a single SoC and instead allowing it to be disaggregated over multiple chips, 3D-IC enables Moore’s Law to break through the reticle size barrier, improves yield by shrinking the size of individual chips, and makes it possible to mix different process technologies optimized for each function.

Summary
The four trends outlined above are deeply interconnected and mutually reinforcing. We believe they give a perspective for EDA innovation over the coming years and show a path forward for all stakeholders in the electronic design market to align their development priorities and take advantage of the incredible technical opportunities available to us.

About John Lee
John Lee is general manager and vice president of the Ansys Electronics and Semiconductor Business Unit. Lee co-founded and served as CEO of Gear Design Solutions (now Ansys), developer of the first purpose-built big data platform for integrated circuit design. He cofounded two other startups (Mojave Design and Performance Signal Integrity), which successfully exited into companies now part of Synopsys. He holds undergraduate and graduate degrees from Carnegie Mellon University.

Also Read:

Unlock first-time-right complex photonic integrated circuits

Take a Leap of Certainty at DAC 2022

Bespoke Silicon is Coming, Absolutely!


Podcast EP86: Negative Outlook for the Semiconductor Industry with Malcolm Penn

by Daniel Nenni on 06-13-2022 at 10:00 am

Dan is joined by Malcolm Penn, founder and CEO of Future Horizons, a firm that provides industry analysis and consulting services on the global semiconductor industry.

Dan and Malcolm discuss the current and future state of the semiconductor industry. What has driven the cyclic nature of the business and are we doomed to repeat these cycles? Will the industry shrink or grow over the next few years, and what are the factors that will shape these outcomes?

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Intel 4 Deep Dive

by Scotten Jones on 06-13-2022 at 6:00 am


As I previously wrote about here, Intel is presenting their Intel 4 process at the VLSI Technology conference. Last Wednesday Bernhard Sell (Ben) from Intel gave the press a briefing on the process and provided us with early access to the paper (embargoed until Sunday 6/12).

“Intel 4 CMOS Technology Featuring Advanced FinFET Transistors optimized for High Density and High-Performance Computing.”

The first thing I want to discuss is the quality of this paper. This paper is an excellent example of a well written technical paper describing a process technology. The paper includes the critical pitches needed to judge process density, the performance data is presented on plots with real units and the discussion provides useful information on the process. I bring this up because at IEDM in 2019 TSMC published a paper on their 5nm technology that had no pitches, and all the performance plots were normalized without real units. In my view that was a marketing paper not a technical paper. At the conference press luncheon, I asked the organizing committee if they considered rejecting the paper due to the lack of content and they said they had but ultimately decided 5nm was too important.

Intel has disclosed a roadmap for the next four nodes (Intel 4, 3, 20A, and 18A) with dates, device types, and performance improvement targets, and they are now filling in more detail on Intel 4. In contrast, Samsung is in risk starts on their 3nm and has disclosed PPA (Power, Performance, and Area) targets but no other details; for 2nm they have disclosed that it will be their third-generation Gate All Around (GAA) technology due in 2025, but no performance targets. TSMC has disclosed PPA for 3nm, which is currently in risk starts; for 2nm a risk-start date has been disclosed but no information on performance or device type.

Intel 4 Use Target

Before getting into the details on Intel 4, I want to comment on the target for this process. As we went through the details it became clear this process is targeted at Intel internal use to manufacture compute tiles, it is not a general use foundry process. Intel 4 is due late this year and Intel 3 is due next year; Intel 3 is the focus for Intel Foundry Services. Specifically, Intel 4 does not have I/O fins because they are not needed on a compute tile that is going to communicate solely with other chips on a substrate and Intel 4 only offers high performance cells and does not have high density cells. Intel 3 will offer both I/O fins and high-density cells as well as more EUV use and better transistors and interconnect. Intel 3 is designed to be an easy port from Intel 4.

Density

Anyone who has read my previous articles and comparisons knows I put a lot of emphasis on density. In figure 1 of the Intel 4 paper, they disclose critical pitches for Intel 4 and compare them to Intel 7; see figure 1.

Figure 1. Intel 4 Versus 7 Pitches.

The high-performance cell height (CH) for Intel 7 is 408nm and for Intel 4 is 240nm. The Contacted Poly Pitch (CPP) for Intel 7 is 60nm and for Intel 4 is 50nm; the product of CH and CPP for Intel 7 is 24,480nm2 and for Intel 4 is 12,000nm2, providing an ~2x density improvement for high-performance cells. Intel 4 also provides a 20% performance-per-watt improvement versus Intel 7, and high-density SRAMs are scaled by 0.77x.
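The footprint arithmetic in this paragraph checks out directly (numbers as disclosed, in nm):

```python
# Cell footprint = cell height (CH) x contacted poly pitch (CPP)
intel7 = {"CH": 408, "CPP": 60}
intel4 = {"CH": 240, "CPP": 50}

area7 = intel7["CH"] * intel7["CPP"]   # 24,480 nm^2
area4 = intel4["CH"] * intel4["CPP"]   # 12,000 nm^2
scaling = area7 / area4                # 2.04x, i.e. the quoted ~2x improvement
```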

To put this density improvement in context it is useful to better understand Intel’s recent process progression. Figure 2 summarizes four generations of Intel’s 10nm process.

Figure 2. Intel 10nm Generations.

IC Knowledge has a strategic partnership with TechInsights; we believe them to be the best in the world at construction analysis of leading-edge semiconductors. TechInsights first analyzed Intel 10nm in July 2018 and refers to this as generation 1. TechInsights completed another 10nm analysis in December 2019, finding the same density but a different fin structure, leading them to refer to this as generation 2. In January 2021 TechInsights analyzed the 10nm Super Fin parts, which offer a 60nm CPP option for performance along with the original 54nm CPP (generation 3). Finally, in January 2022 TechInsights analyzed a 10nm Enhanced Super Fin part, what Intel now calls Intel 7 (10nm generation 4). One interesting thing to me about the result of the Intel 7 analysis is that TechInsights only found 60nm CPP in the logic area, no 54nm CPP, and taller cells.

My policy for characterizing process density is to base it on the densest cell available on the process. For Intel 7 a 54nm CPP cell 272nm high is “available” but not used, and the 408nm-high cell with a 60nm CPP yields a transistor density of ~65 million transistors per square millimeter (MTx/mm2) versus ~100 MTx/mm2 for earlier generations. So how do we place Intel 4 versus prior-generation processes and the forthcoming Intel 3 process? See figure 3.

Figure 3. Intel Density Comparison.

In figure 3 I have presented high-density and high-performance cell density separately. Intel 4 is ~2x the high-performance cell density of Intel 7, as Intel has disclosed. Intel 3 is supposed to have “denser” libraries versus Intel 4. If I assume the same pitches but a smaller track height for Intel 3, I get ~1.07x denser high-performance cells and ~1.4x denser high-density cells versus Intel 10/7.

Another interesting comparison is Intel 4 high-performance cell size versus TSMC high performance cell sizes for 5nm and 3nm, see figure 4.

Figure 4. Intel 4 versus TSMC N3 and N5 High-Performance Cells.

TSMC N5 has a 51nm CPP and 34nm M2P; a 9.00-track high-performance cell yields a 306nm CH and a 15,606nm2 CPP x CH. We believe TSMC N3 has a 45nm CPP and 28nm M2P; a 9.00-track high-performance cell yields a CH of 252nm and a CPP x CH of 11,340nm2. For Intel 4 the CPP is 50nm and M2P is 45nm (disclosed in the briefing although not in the paper); this yields a track height of only 5.33 for the quoted 240nm CH and a CPP x CH of 12,000nm2. These values are consistent with a 4 designation since it slots between N5 and N3 from the leading foundry company, TSMC, although it is closer to TSMC N3 than TSMC N5. We also believe Intel 4 will have performance slightly better than TSMC N3. I didn’t include Samsung in figure 4, but based on my current estimates Intel 4 is denser than Samsung GAE3. Samsung may have a small performance advantage over Intel 4 and TSMC N3, but Intel 3 should surpass both Samsung GAE3 and TSMC N3 for performance next year.
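The cell arithmetic in this comparison works out as follows (pitches in nm as quoted; where the cell height isn’t given directly, CH = M2P x track count):

```python
def cell_footprint(cpp, m2p=None, tracks=None, ch=None):
    """Return (cell height, CPP x CH footprint); derive CH from M2P x tracks if needed."""
    if ch is None:
        ch = m2p * tracks
    return ch, cpp * ch

n5_ch, n5_area = cell_footprint(cpp=51, m2p=34, tracks=9)   # 306nm CH, 15,606nm2
n3_ch, n3_area = cell_footprint(cpp=45, m2p=28, tracks=9)   # 252nm CH, 11,340nm2
i4_ch, i4_area = cell_footprint(cpp=50, ch=240)             # 240nm CH, 12,000nm2
i4_tracks = 240 / 45                                        # ~5.33 tracks at 45nm M2P
```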

I am surprised that Intel’s high-performance cell works out to just over 5-tracks in height but that is the math for the disclosed cell height and M2P.

DTCO

From a Design-Technology-Co-Optimization (DTCO) perspective Intel 4 has 3 improvements over Intel 7:

  1. Contact Over Active Gate is optimized for Intel 4.
  2. Diffusion break by dummy gate removal used to need two dummy gates (double diffusion break); Intel 7 went to one (single diffusion break).
  3. The n-to-p spacing used to be two fin pitches and is now one fin pitch. When we talk about CH in terms of M2P and tracks, it is easy to forget that the devices have to fit into that same height; figure 5 illustrates how n-to-p spacing contributes to cell height.

Figure 5. Cell Height (CH) Scaling.

During the briefing Q&A there was a question about cost per transistor and Ben said that cost per transistor went down for Intel 4 versus Intel 7.

Performance

Intel 10/7 offered 2-threshold-voltage (2 PMOS and 2 NMOS = 4 total) and 3-threshold-voltage (3 PMOS and 3 NMOS = 6 total) versions. Intel 4 provides 4 threshold voltages (4 PMOS and 4 NMOS = 8 total). This results in ~40% lower power and ~20% higher performance.

I believe the drive current values mentioned during the briefing are 2mA/μm for PMOS and 2.5mA/μm for NMOS.

EUV usage

EUV is used in both the back end and front end of the process. Intel has focused EUV use where a single EUV exposure can replace multiple immersion exposures. Even though an EUV exposure is more expensive than an immersion exposure, replacing multiple immersion exposures, with their associated deposition and etch steps, can save cost and improve cycle time and yield. In fact, Ben mentioned single EUV exposures resulted in 3-5x fewer steps in the sections that EUV replaced. From Intel 7 to Intel 4 there is a reduction in mask and step count. In the front end of line, EUV is focused on replacing complicated cuts, gate or contact. Intel didn’t explicitly disclose that EUV is used in fin patterning, but we believe for Intel 7 fin patterning involved a mandrel mask (Intel calls this a grating mask) and 3 cut masks (Intel calls these collection masks). For Intel 4 this could easily have transitioned to 4 cut masks. Without naming the layer, replacing 4 cut masks with a single EUV mask was mentioned, and we believe this could be where that happens.

In the paper Intel mentions that M0 is quadruple patterned. For Intel 10/7, Intel also disclosed quadruple patterning, and TechInsights analysis showed that 3 block masks were needed. It is possible that Intel 4 would need 4 block masks for M0, and this may be another place where EUV eliminates 4 cut/block masks.

A gridded layout was used for interconnect to improve yield and performance.

We believe there are ~12 EUV exposures used in this process, but this was not disclosed by Intel.

Interconnect

It is well known that Intel went to cobalt (Co) for M0 and M1 at 10nm. Co offers better electromigration resistance than copper (Cu) but higher resistance (author’s note: the electromigration resistance of a metal is proportional to its melting point). For Intel 4, Intel has gone to an “enhanced” Cu scheme where pure Cu is encased in Co (in the past Intel doped the Cu). A typical flow to encapsulate Cu in Co is to put down a barrier layer with a Co layer to serve as the seed for plating. Once plating is complete and planarized to form an interconnect, the Cu is capped with Co. This process results in slightly degraded electromigration resistance versus Co, but still above the 10-year lifetime goal, and the resistance of the line is reduced. In fact, even though the interconnect lines are narrower for Intel 4 versus Intel 7, the RC values are maintained.

The process has 5 enhanced copper layers, 2 giant metal layers and 11 “standard” metal layers for a total of 18 layers.

MIM caps

With the increasing importance of power delivery, Metal-Insulator-Metal (MIM) capacitors are used to reduce power swings and have undergone continuous improvement. For Intel’s 14nm process 37 fF/μm2 was achieved; this improved to 141 fF/μm2 for 10nm, 193 fF/μm2 for Intel 7, and has now been increased ~2x to 376 fF/μm2 for Intel 4. Higher densities enable MIM capacitors with more capacitance, improving power stability without taking up excess space.
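The density figures above translate directly into capacitance for a fixed decap footprint (the 100 μm2 area below is hypothetical, chosen only for illustration):

```python
# MIM capacitor densities from the article, in fF/um^2 per node.
DENSITY = {"14nm": 37, "10nm": 141, "Intel 7": 193, "Intel 4": 376}

AREA_UM2 = 100  # illustrative decap area, not a figure from the article

# Capacitance of the same footprint at each node: C = density * area.
cap_fF = {node: d * AREA_UM2 for node, d in DENSITY.items()}

gain = DENSITY["Intel 4"] / DENSITY["Intel 7"]
print(f"Intel 4 vs Intel 7 density gain: {gain:.2f}x")  # ~1.95x, the "~2x" cited
```

The same 100 μm2 of area that yielded 3.7 pF at 14nm yields 37.6 pF at Intel 4, a 10x improvement over four generations.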

Where they went wrong

During the Q&A Ben was asked where Intel went wrong in the past. He said that in the past Intel tried to do too much at once (author's note: for example, Intel 22nm to 14nm was a 2.4x density increase and then 14nm to 10nm was a 2.7x density increase, see figure 3). Intel has now adopted a modular approach where modules can be developed separately, delivering more performance, more quickly.

When asked what he was most proud of, he said achieving yield and performance along with library scaling, and that the process looks good in the factories. The process is simpler with EUV, improving yield and reducing registration issues.

Production sites

Also during the Q&A Ben was asked about production sites. He said initial production will be in Hillsboro followed by Ireland. He said they haven’t disclosed additional production plans beyond that.

In our own analysis of EUV availability, published here, we concluded that EUV exposure tools will be in short supply for the next few years. This is also consistent with Pat Gelsinger discussing tool shortages for Intel’s new fabs. We believe EUV tool availability will gate Intel’s fab ramp. Furthermore, we believe Intel has ~10 to 12 EUV tools presently and until recently they were all in Hillsboro. One of those tools has now been moved to Fab 34 in Ireland, and we believe that as Intel receives further EUV tools this year it will be able to ramp Fab 34. Late this year we expect Fab 38 in Israel to begin ramping, and our belief is that it will be the next Intel 4/3 production site. Following that, in the latter part of 2023, Fabs 52 and 62 in Arizona should start receiving EUV tools. We also believe most of this capacity will be needed for Intel’s own internal use, and Intel will have limited EUV-based foundry capacity until the 2024/2025 timeframe.

Yield and Readiness

Throughout the briefing everything we heard about yield is that it is “healthy” and “on schedule”. Meteor Lake compute tiles are up and running on the process. The process is ready for product in the second half of next year.

Conclusion

I am very impressed with this process. The more I compare it to offerings from TSMC and Samsung, the more impressed I am. Intel was the leader in logic process technology during the 2000s and early 2010s before Samsung and TSMC pulled ahead with superior execution. If Intel stays on track and releases Intel 3 next year, it will have a foundry process that is competitive on density and possibly the leader on performance. Intel has also laid out a roadmap for Intel 20A and 18A in 2024. Samsung and TSMC are both due to introduce 2nm processes in the 2024/2025 time frame, and they will need to provide significant improvement over their 3nm processes to keep pace with Intel.

Also Read:

An Update on In-Line Wafer Inspection Technology

0.55 High-NA Lithography Update

Intel and the EUV Shortage


Podcast EP85: How Expedera is Revolutionizing AI Deployment

Podcast EP85: How Expedera is Revolutionizing AI Deployment
by Daniel Nenni on 06-10-2022 at 10:00 am

Dan is joined by Sharad Chole, chief scientist & co-founder at Expedera. Sharad is an expert in AI frameworks, power-aware neural network optimizations, and programmable dataflow architectures. Previously, he was an architect at Cisco, Memoir Systems, and Microsoft.

Dan and Sharad explore Expedera’s unique AI accelerator architecture. Sharad provides a broad overview of the various challenges of AI deployment and how Expedera is changing the landscape.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


WEBINAR: 5G is moving to a new and Open Platform O-RAN or Open Radio Access Network

WEBINAR: 5G is moving to a new and Open Platform O-RAN or Open Radio Access Network
by Daniel Nenni on 06-10-2022 at 6:00 am

The demands of 5G require new designs that not only save power but also increase performance; moving to advanced power-saving nodes and using eFPGAs will help achieve these goals. This paper will introduce 5G and O-RAN, the complexity of these systems, and how flexibility can be beneficial. Then we will dive into how eFPGA can save power and cost and increase flexibility. Through examples of how eFPGA can be used for reconfigurability, we will show how it can also deliver to customers a flexible platform for carrier personalization with less power.

Watch the replay here

5G is known for a faster mobile phone experience, but it is so much more.  The changes include a 90% reduction in network energy, 1-millisecond latency, 10-year battery life for IoT devices, 100x more connected devices, 1000x more bandwidth and many others.  These changes not only impact mobile devices; many other devices are envisioned to connect to a 5G network across a large span of frequencies.  These 5G New Radios (NR) will operate from below 1 GHz to 100 GHz, supplying data to many different services.

The best-understood use case is Enhanced Mobile Broadband (eMBB), which includes enhanced data rates, reduced latency, higher user density, and more capacity and coverage for mobile devices.  In short, a better mobile phone experience (Fig. 1).

Fig 1- 5G use cases based on channel frequency used

Other applications will leverage the lower frequency channels and are referred to as Ultra-Reliable Low-Latency Communications (URLLC).   These devices require ultra-reliability, very low latency and high availability for vehicular communication, industrial control, factory automation, remote surgery, smart grids and public safety.

On the other end of the frequency spectrum, we have Massive Machine-Type Communications (mMTC). The devices taking advantage of this very high frequency will be low-cost, battery-powered devices deployed in massive numbers, such as smart metering, logistics, and field and body sensors.  These devices will be on for a very short time, burst data and then shut down, using very little power.

All these new devices and applications will need many 5G New Radios to serve them, and a lot of equipment needs to be installed and tested.  One proposal to help speed this up is to open the interfaces between the New Radio and the Distributed Unit (DU), an approach called Open Radio Access Network, or O-RAN for short (Fig. 2), where the DU is virtualized in the cloud on standard off-the-shelf servers.

This allows the possibility of having more than one provider for the RAN and mixing it with different backends.  There will also be many different networks with different Radio Units for Macro sites, Micro sites and Pico sites.  The combinations could be endless.

This transition is paved with many good intentions and uncertainty.  Although the interface is based on “enhanced CPRI,” or “eCPRI,” there are unknown sideband signals and custom commands.  Learn more about how eFPGA can help this transition, and about other 5G applications where eFPGA can save cost and power and reduce latency, by joining this webinar.

Watch the replay here

Also Read:

Why Software Rules AI Success at the Edge

High Efficiency Edge Vision Processing Based on Dynamically Reconfigurable TPU Technology

A Flexible and Efficient Edge-AI Solution Using InferX X1 and InferX SDK