Achieving Design Robustness in Signoff for Advanced Node Digital Designs
by Mike Gianfagna on 03-09-2020 at 10:00 am


I had the opportunity to preview an upcoming webinar on SemiWiki that deals with design robustness in signoff for advanced node digital designs (think single-digit nanometers). “Design robustness” is a key term – it refers to high quality, high yielding SoCs that come up quickly and reliably in the target system. We all know how difficult that can be at the cutting edge. Presented by Synopsys, the webinar explores strategies to make the process of hierarchical extraction (StarRC) and timing analysis for advanced node designs more accurate and efficient.

Hierarchical design is a key element in the “divide and conquer” approach to dealing with large amounts of complex design data. But there are some very real and challenging problems to overcome to use this technique effectively for advanced node designs. Here are just a few of them:

  • For extraction, you need to consider capacitance interactions across hierarchical boundaries. For example, near-the-block or over-the-block top level routes and block-to-block coupling
  • For process variation, you need to consider boundary nets that are impacted by nets running close to the boundary at the top level
  • Due to CMP, block instances may have unique layout environment densities that need to be accounted for
  • There may be multiple physical ports at the block level that map to one logical port. Extraction and timing flows need to correctly map physical and logical pins
  • The number of process, temperature, via resistance and other corners is exploding. You need a way to process all these cases efficiently
  • Also, regarding corner analysis, typical foundry corner data varies metal thickness in the same direction. This is not realistic in many cases, since metal thickness can vary across the chip. Not modeling this effect can miss timing violations

If you are engaged in advanced node design, I highly recommend attending this webinar. You will learn about approaches to deal with all of the items above and more. You’ll learn about new approaches to optimize the run-time of the required tools as well. There is also a very useful Q&A session that dives into a lot more detail. All of this is covered in just over 30 minutes.

The webinar presenter is Omar Shah, who has 20 years of experience working on post-layout digital and custom design flows. Sign up now to attend this webinar. The webinar will be broadcast on Tuesday, March 24, 2020 from 10:00 AM – 11:00 AM PDT.  Hand sanitizer and face mask not required.

About StarRC
StarRC™ is the EDA industry’s gold standard for parasitic extraction. A key component of Synopsys Design Platform, it provides a silicon accurate and high-performance extraction solution for SoC, custom digital, analog/mixed-signal and memory IC designs. StarRC offers modeling of physical effects for advanced process technologies, including FinFET technologies at 16 nm, 14 nm, 10 nm, 7 nm, and beyond. Its seamless integration with industry standard digital and custom implementation systems, timing, signal integrity, power, physical verification and circuit simulation flows delivers unmatched ease-of-use and productivity to speed design closure and signoff verification.

Also Read:

Navigating Memory Choices for Your Next Low-Power Design

Hybrid Verification for Deep Sequential Convergence

Edge Computing – The Critical Middle Ground



Six Automated Steps to Design Partitioning for Multi-FPGA Prototyping Boards
by Daniel Nenni on 03-09-2020 at 6:00 am


Before starting your next FPGA prototyping project, you should catch the next SemiWiki webinar – “Six Automated Steps to Design Partitioning for Multi-FPGA Prototyping Boards”, in partnership with Aldec.

A significant portion of my 30+ years in the EDA industry has revolved around design verification with some form of FPGA prototyping, and the verification challenges facing SoC developers haven’t changed much in concept.

However, in today’s world, the cost of failure is much higher and the verification complexity has skyrocketed.  Today’s SoC designers have a plethora of available verification options from formal to simulation to emulation and FPGA prototyping, and most advanced design teams employ some amount of each of these techniques to get designs right before tapeout.  When verification speed is critical you are pretty much forced to include FPGA prototyping.  Emulation is the right choice for high speed debug up to about 1 MHz, but if you need to run at 20 MHz or 100+ MHz to cover your verification space, confirm video streams, or run early hardware-dependent software verification, you should seriously look into adding FPGA prototyping to your verification hierarchy.

This SemiWiki webinar is an excellent overview of the issues facing SoC designers who need to build FPGA prototypes that must be partitioned across multiple FPGAs. Once it is decided to use multiple FPGAs, whether for a single large design, or for multiple instances of the same design talking together, the top-level challenges are well documented: Partitioning, I/O interconnect between FPGAs, and clocking.

Partitioning, or deciding which parts of your design to put in each FPGA, is straightforward in concept, but the devil is in the details.  Simultaneously organizing the FPGA partitions to optimize FPGA utilization, minimize FPGA interconnect, and achieve the target performance is similar in some respects to the “Whac-A-Mole” game: you optimize one metric, and you knock one of the other metrics out of spec.  Oh, and to make your partition challenge more interesting, there’s Rent’s Rule.  This rule says you can only put so much logic behind so many pins, so figuring out how to “cut” your design across multiple FPGAs has limits beyond your control.
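To make Rent’s Rule concrete, here is a minimal sketch (not from the webinar) of its classic form T ≈ t·G^p, where G is the gate count in a partition, t the average terminals per gate, and p the Rent exponent; the constants below are typical textbook values, not Aldec’s numbers.

```python
def rent_terminals(gates: int, t: float = 3.5, p: float = 0.65) -> int:
    """Estimate the I/O terminals a partition of `gates` cells will demand (Rent's rule)."""
    return round(t * gates ** p)

# Even a modest partition demands far more I/O than an FPGA package provides,
# which is why pin-multiplexing (and its speed penalty) becomes unavoidable.
for gates in (100_000, 500_000, 2_000_000):
    print(f"{gates:>9} gates -> ~{rent_terminals(gates):>6} terminals")
```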

Then there’s the I/O interconnect between FPGAs.  The difficulty of this task will be design dependent, but if your design is “highly interconnected”, you may not have enough physical pins to accommodate your logical pins between the FPGAs.  But don’t despair, pin-multiplexing techniques between FPGAs are available and well understood.  Ah, but pin-multiplexing comes with a performance penalty; remember the Whac-A-Mole analogy?

Lastly, system clocks and resets must be carefully managed for FPGAs, and there are physical implementation differences between SoCs and FPGAs.  Keep in mind that the FPGA prototype is not the design, but it must “behave” like the design to be an effective design verification platform.  Getting your FPGA clocks to behave like your SoC clocks without introducing design anomalies can be a challenge, and doing a good job with clocks will determine whether or not you hit your FPGA performance targets.

FPGA prototyping is not for the faint of heart, but it can save a design respin or two.  In the early days of emulation, we used to say, “Emulation is hard, but it’s damn well worth the effort”.  So, my recommendation is to do everything you can to prepare for the project.  Like watching the SemiWiki webinar on this very subject.  And, in the spirit of those familiar “don’t try this at home” warnings, absolutely get someone on your team who has done this before.

The Q&A should be especially interesting for this one. If you want to add your questions to my list, include them in the comments section.

About Aldec
Aldec Inc., headquartered in Henderson, Nevada, is an industry leader in Electronic Design Verification. Established in 1984, Aldec offers patented technology in the areas of mixed-language RTL simulation, FPGA acceleration and emulation, SoC/ASIC prototyping, design rule checking, clock domain crossing analysis, requirements traceability and functional verification for military, aerospace, avionics, automotive and industrial applications. www.aldec.com



Technology Tyranny and the End of Radio
by Roger C. Lanctot on 03-07-2020 at 10:00 am


As technology consumers we make tradeoffs.

We let Google peer into our online activity and email communications and we even accept annoying advertisements tied to our browsing activity in order to access free email and browsing. We tolerate smartphones with diminishing performance from Apple – even after Apple admits that the diminishing performance is deliberately-inflicted obsolescence to push us into our next iPhone upgrade. We accept Tesla’s privacy violations in exchange for an awe-inspiring driving experience and software updates.

Along the way we have surrendered our privacy and so much more. Now Tesla Motors may be asking us to surrender free over-the-air broadcast radio.

According to the notes describing the latest optional software update for owners of 2018 and earlier Teslas (using MCU-1), the update (which carries a $2,500 price tag but adds Netflix, Hulu, YouTube, and Twitch) removes AM/FM radio and SiriusXM. This is the often-cited downside of software updates – the potential to obtain improved system performance while sacrificing previously desirable functionality.

While Tesla’s decision only impacts older Teslas, it nevertheless highlights the strangely tortured relationship between the broadcast radio industry and Silicon Valley. The issue is a common thread traceable to Apple’s refusal to activate the FM chips built into its phones – and Google’s decision to ignore “terrestrial” radio as part of either Android Auto or Google Automotive Services.

Google, Apple, and Tesla have all turned their backs on the broadcast radio industry in spite of the wide reach of radio – a reach that exceeds that of television – and the fact that it is free, localized content ideally suited to consumption in a mobile environment. Tesla’s decision likely only affects a sliver of Tesla owners given the cost of the optional upgrade and the limited in-vehicle enhancements, but it has the ominous tinge of something more sinister.

The Tesla software update, focused as it is on adding streaming video AND a $9.99/month subscription – for owners not already on the company’s premium service tier – points to a streaming-only approach to content delivery. Just as satellite broadcaster SiriusXM felt compelled to offer an IP version of its content, Tesla appears inclined to shift all content delivery to IP reception.

The strategy makes sense for a company delivering cars on multiple continents with varying local broadcast protocols and metadata. Shifting radio reception to IP delivery vastly simplifies the in-dash configuration and, in the long run, may enable some hardware content reduction in the form of deleted tuners and antennas. This is particularly relevant in the run up to 5G adoption – a technology upgrade that will require the addition of multiple antennas.

Tesla vehicles in North America have always come with TuneIn – so, now, TuneIn becomes the preferred radio IP broadcast point of aggregation. In fact, it is quite possible that Tesla has leveraged user data from its own vehicles to determine that radio listening in its vehicles was sufficiently minimal to be worth risking some minor resistance.

More importantly, the software update removing the radio experience is optional. Maybe the offer is a test to determine customer reaction to trading broadcast radio for streaming video and improved user interface performance at $2,500. Is the offer a bit of a market research project? Anything is possible from Tesla, which has altered its pricing and discounts on multiple occasions in response to market conditions.

But the inclination to delete radio is a popular behavior pattern in Silicon Valley where Google and Apple have treated broadcasters with disdain. Is this approach sustainable? Is it tolerable? Where can an outraged consumer turn to protest?  Will there be consumer outrage? Should there be? Is it time for an in-vehicle radio mandate to ensure that emergency communications – at least – can be broadcast into cars?

I’m not going to cry wolf. And I’m not going to play Chicken Little. I will say that the radio industry offers contextually relevant and reliable content delivery with a broad reach across a wide range of devices and listening environments. Deleting radio from cars – terrestrial or satellite-based – tears at the fabric of our social connectedness.

The marginal cost of preserving terrestrial broadcast connections – particularly in the context of radio’s ongoing global digital transition and the resilience of the medium during emergencies – ought to place this particular content reception experience in a non-delete category. Tesla doesn’t appear to share this view and Tesla is not alone. Once again, Silicon Valley is asking us to surrender one thing in exchange for another. Yesterday it was our privacy. Today it is the radio. Tomorrow it will be our freedom.



A Forbidden Pitch Combination at Advanced Lithography Nodes
by Fred Chen on 03-06-2020 at 10:00 am


The current leading edge of advanced lithography nodes (e.g., “7nm” or “1Z nm”) features pitches (center-center distances between lines) in the range of 30-40 nm. Whether EUV (13.5 nm wavelength) or ArF (193 nm wavelength) lithography is used, one thing is certain: the minimum imaged pitch will be less than the wavelength divided by the numerical aperture of the corresponding tool (NA = 0.33 for EUV, 1.35 for ArF immersion).
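As a quick numerical check of this statement, here is a minimal sketch using only the wavelengths and numerical apertures quoted above; the wavelength/(2·NA) figure is the absolute two-beam pitch limit, not a claim about any specific process.

```python
# Wavelength/NA and the two-beam pitch limit wavelength/(2*NA) for the two tools
# mentioned in the text. A 30-40 nm pitch sits below wavelength/NA for EUV; for
# ArF immersion it is the per-exposure pitch after pattern splitting that does.
tools = {"EUV": (13.5, 0.33), "ArF immersion": (193.0, 1.35)}
for name, (wavelength_nm, na) in tools.items():
    print(f"{name:>14}: wavelength/NA = {wavelength_nm / na:6.1f} nm, "
          f"two-beam pitch limit = {wavelength_nm / (2 * na):6.1f} nm")
```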

1X Pitch
Under these conditions, the targeted minimum pitch, which will be labelled as “1X” here, is imaged as the interference of two beams, no more, no less. Furthermore, only certain illumination angles will allow this interference to occur; other angles will not produce any image at all, and only appear as unwanted background light.

2X Pitch
Commonly, a processor layout can also include 2X pitches, i.e., twice 1X, where lines are separated by twice the minimum distance. These naturally occur when design grids are used, with the grid spacing correlating to the minimum pitch. However, when 2X pitches are imaged with the same illumination as 1X pitches, there is not only a different image but also a different depth of focus. The reason is that 2X pitches are imaged as the interference of three beams rather than two.

Figure 1. Two-beam interference for 1X pitch and three-beam interference for 2X pitch.

1X vs. 2X Depth of Focus
The difference of optical path lengths between the middle beam and the side beam is large for the three-beam case, while for the two-beam case, there is no middle beam and the two side beams have similar optical paths at the appropriate angle. Consequently, the depth of focus (DOF) is worse for the three-beam case compared to the two-beam case. On the other hand, at a non-optimum angle, even the two-beam interference tolerates defocus poorly, for the same reason of different optical path lengths between the two beams (Figure 2).

Figure 2. Different optical paths, indicated by the gap between the vertical positions of the different wavefronts in red and blue, for three-beam interference (left) and two-beam interference with non-optimum angle (right).

However, for the 2X pitch, it is still possible to find different illumination angles that result in close to the two-beam interference. These correspond to different source points in the pupil (Figure 3). The 2X pitch points have x angular coordinates which are half those of the optimum points for the 1X pitch, and at the same time, sufficiently high y angular coordinates to ensure that only two diffracted beams are captured within the numerical aperture.

Figure 3. Different source points in the pupil give rise to the desired two-beam interference patterns, for the 1X case (left: pitch = 0.88 wavelength/NA; right: pitch = 0.98 wavelength/NA) and the 2X case (left: pitch = 1.76 wavelength/NA; right: pitch = 1.96 wavelength/NA). Moreover, for the lower k1, the best DOF is a limited subset of all possible two-beam interference source points.

The different illumination conditions indicate the mutually exclusive defocus tolerance. A single exposure cannot offer the same focus windows to both 1X and 2X pitches.
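The geometry behind this statement can be sketched in a few lines of code. The following is an illustrative order-counting exercise (not the author’s simulation): for a single source point in pupil coordinates normalized so the pupil is the unit circle, it counts how many diffracted orders of a line/space pattern land inside the pupil, with pitch expressed in units of wavelength/NA to match the Figure 3 captions.

```python
import math

def captured_orders(pitch_in_lambda_over_na: float, src_x: float, src_y: float,
                    max_order: int = 3) -> int:
    """Count diffracted orders inside the unit-radius pupil for one source point."""
    spacing = 1.0 / pitch_in_lambda_over_na   # order-to-order spacing in pupil units
    count = 0
    for m in range(-max_order, max_order + 1):
        fx = src_x + m * spacing              # x-coordinate of order m in the pupil
        if math.hypot(fx, src_y) <= 1.0:
            count += 1
    return count

# A source point tuned for two-beam imaging at the 1X pitch (0.88 wavelength/NA)
# captures three beams at the 2X pitch (1.76 wavelength/NA), illustrating why
# the two pitches want different source points.
print(captured_orders(0.88, src_x=-0.57, src_y=0.0))  # 1X pitch: 2 beams
print(captured_orders(1.76, src_x=-0.57, src_y=0.0))  # 2X pitch: 3 beams
```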

Subresolution Assist Features (SRAFs)
A widely proposed solution [1] to accommodate both 1X and 2X pitches is to use subresolution assist features (SRAFs) on the 2X pitches to make them appear more similar to 1X features (Figure 4). This essentially suppresses the middle beam of the three beams, at the cost of side lobes growing in between lines. Care must be taken, however, not to have these side lobes print. The 2X pitch isn’t changed, and the middle beam cannot be completely eliminated, so the focus window improvement will still be narrower than the 1X case. SRAFs would also be more vulnerable to stochastic effects due to their smaller sizes [2]. In situations where multiple patterning is already planned to be used to add or remove lines, the use of SRAFs to match 2X to 1X pitches would be unnecessary.

Figure 4. Subresolution assist features (SRAFs) make the 2X pitch look more like a 1X pitch. However, side lobe printing at the SRAF locations is a risk that cannot be neglected.

EUV Pupil Rotation Sensitivity
It has been reported [3,4] that the distribution of EUV illuminating source points rotates azimuthally about the optical axis at different points across the exposure slit, to a range of more than +/- 18 degrees (Figure 5); it is a natural outcome of using reflective ring-field optics [5]. As a result, illumination angles with best depth of focus are rotated to angles with inferior depth of focus. To avoid this undesired outcome, a large portion of the exposure slit would have to be excluded, resulting in effectively a much smaller exposure field width. To compensate, the scan across the wafer must stop at many more locations, reducing throughput substantially (Figure 6). Even worse, some chips are already very wide, almost taking up the entire maximum 26 mm field width. For these chips, the wide exposure width would have to be divided for multiple exposures with separately optimized illuminations. One patent from TSMC [6] even rotates different parts of the wide chip layout accordingly, in order to keep the single exposure.

Figure 5. Pupil rotation across slit (~ +/-20 deg) does not preserve depth of focus at all slit locations, as the illumination source points are displaced from optimum locations.

Figure 6. Pupil rotation with a full field would suffer from illumination rotation (left); this can be mitigated with a smaller exposure field width (right), which requires more stopping time.

How It Is Dealt With

The 1X vs 2X pitch incompatibility for depth of focus can be handled in four different ways:

(1) Design rule restrictions: Exclude the 2X pitch as forbidden. This is by far the simplest approach. But it may be too restrictive.

(2) SRAFs: This has been implemented successfully for DUV lithography, with care taken to not print the SRAFs in the process. For EUV, though, stochastic effects are aggravated, and pupil rotation is not addressed.

(3) Multiple Patterning: Splitting out 1X and 2X pitches can occur as part of multipatterning.

(4) EUV pupil rotation: The EUV pupil rotation would either limit the exposure field or require pre-rotation of parts of the layout, if multiple patterning is to be avoided.

References

[1] J. G. Garofalo et al., Proc. SPIE 2440, 302 (1995).

[2] https://www.linkedin.com/pulse/stochastic-printing-sub-resolution-assist-features-frederick-chen/

[3] R. Capelli et al., Proc. SPIE 9231, 923109 (2014).

[4] A. Garetto et al., J. Microlith/Nanolith. MEMS MOEMS 13, 043006 (2014).

[5] M. Antoni et al., Proc. SPIE 4146, 25 (2000).

[6] US Patent 9091930, assigned to TSMC.

This article first appeared in LinkedIn Pulse: A Forbidden Pitch Combination at Advanced Lithography Nodes

Related Lithography Posts



TSMC’s 5nm 0.021um2 SRAM Cell Using EUV and High Mobility Channel with Write Assist at ISSCC2020
by Don Draper on 03-06-2020 at 6:00 am


Technological leadership has long been key to TSMC’s success, and they are following up their leading 5nm development with the world’s smallest SRAM cell at 0.021um2, along with circuit design details of the write assist techniques necessary to achieve the full potential of this revolutionary technology. In addition to their groundbreaking device developments such as High Mobility Channel (HMC), they are the leading implementers of Extreme Ultra-Violet (EUV) patterning to enable higher yield and shorter cycle time at this advanced node.

Semiconductor technology evolution has been driven by the application landscape which in the current phase of High-Performance Computing (HPC), Artificial Intelligence (AI) and 5G communication requires the highest performance with limited power dissipation as illustrated in Fig. 1.

Fig. 1 Semiconductor Technology Application Evolution

This technology was described by TSMC at IEDM 2019: the 5 nm process uses more than 10 Extreme Ultra-Violet (EUV) mask patterning steps, each replacing three or more immersion mask steps, along with High Mobility Channel (HMC) technology for higher performance. This technology has been in risk production since April of 2019 and will be in full production in 1H2020.

The implementation of this technology for the development of high-performance SRAM bit cells and arrays was described by Jonathan Chang, et al at ISSCC2020.

The quantizing of FinFET transistor sizing continues to be a major challenge and forces all transistors in the high-density 6T SRAM cell to use only a single fin. The design is optimized through Design-Technology Co-Optimization (DTCO) to give high performance and density as well as high yield and reliability. SRAM bit cell scaling for 2011 to 2019 is shown in Fig. 2.

Fig. 2. SRAM bit cell scaling is shown for 2011 to 2019.

It can be noted that the cell size reduction rate from 2017 to 2018 to 2019 is much slower than the rate for preceding years 2011 to 2017, showing that SRAM cells have not been scaling at the same rate as logic in general. At IEDM 2019, the 5nm process was quoted to have 1.84x logic density improvement compared to 1.35x SRAM density improvement. Further area reduction utilizing Flying Bit Line (FBL) architecture is implemented for 5% area savings. The layout of the 5nm cell is shown in Fig. 3.

Fig. 3. Layout of the high-density 6T SRAM bit cell.

For power reduction, a key approach is lowering the minimum operating voltage Vmin of the SRAM array. The increased random threshold voltage variation in this latest technology limits Vmin, which in turn limits the opportunities for power reduction. The SRAM voltage scaling trend is shown in Fig. 4, where the blue line indicates Vmin without write assist and the red line indicates Vmin with write assist, showing the great benefit of write assist with each generation. It will be observed that Vmin from 7nm to 5nm shows very little improvement, indicating that further power reduction must come from improvements in write assist circuits. This article will describe the major write assist methods used to enable lower Vmin in operation: negative bit line (NBL) and Lower Cell VDD (LCV).

Fig. 4. SRAM cell voltage scaling trend without write assist (blue line) and with write assist (red line).

The SRAM cell schematic is shown in Fig. 5, showing the contention during write between the pull-up transistor PU and pass-gate transistor PG. A stronger PU transistor would yield higher read stability, but it degrades the write margin significantly and results in a contention write Vmin issue.

Fig. 5. SRAM cell schematic showing contention during write between the PU and pass-gate transistor PG.

The first method to improve the write Vmin is to lower the bit line voltage during write, called Negative Bit Line (NBL). This method has been employed for several years, using a MOS capacitor to generate a negative bias signal on the bit line, but this write assist circuitry results in area overhead. Furthermore, a fixed amount of MOS capacitance induces an over-boosted NBL level for short BL configurations and may lead to dynamic power overhead in short bit lines, as shown in Fig. 6.

Fig. 6. A fixed amount of MOS capacitance induces an over-boosted NBL level for short BL configurations and may lead to dynamic power overhead, which is avoided by the metal cap NBL.

The overboost and the MOS capacitor area issues can be avoided by using a metal capacitor-coupled scheme based on coupled metal tracks laid out on top of the upper metal of the SRAM array. To avoid the overboost, the metal capacitor length can be modulated with the SRAM array bit line length, saving dynamic power. Furthermore, the coupled NBL level can also be adjusted to compensate for the loss of write ability induced by BL IR drop on the far-side bit cell.
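The first-order behavior of such a coupling capacitor can be sketched as a capacitive divider. This is an illustrative model only, not the circuit from the paper, and the capacitance and supply values are made-up placeholders.

```python
# Driving one plate of a coupling capacitor c_couple from VDD to 0 pulls the
# selected bit line (total capacitance c_bitline) below ground by roughly
#   delta_V = -VDD * c_couple / (c_couple + c_bitline).
def nbl_level(vdd: float, c_couple: float, c_bitline: float) -> float:
    return -vdd * c_couple / (c_couple + c_bitline)

# Because the metal coupling capacitor is laid out over the array, it scales
# with bit line length, which keeps the coupled level roughly constant instead
# of over-boosting short bit lines the way a fixed MOS capacitor would.
for cfg, (c_couple, c_bl) in {"short BL": (4e-15, 30e-15),
                              "long BL": (12e-15, 90e-15)}.items():
    print(f"{cfg}: NBL level ~= {nbl_level(0.75, c_couple, c_bl) * 1000:.0f} mV")
```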

The NBL enable signal (NBLEN) in Fig. 7 drives one side of the metal capacitor C1 negative, which couples a negative bias signal onto the virtual ground node NVSS; this then passes through the write driver WD and column multiplexer to the selected bit line.

Fig. 7. The NBL enable signal (NBLEN) couples the configurable metal capacitor C1 to NVSS.

Fig. 8 shows the NBL coupling level with different bit line configurations: the configurable metal capacitor C1 tracks the bit line length, so the variation of the coupled NBL level with different bit line lengths can be mitigated.

Fig. 8. NBL coupling level with different bit line configurations, showing the longer 256-bit bit line (blue) having an extended NBL boosted level.

The second method of write assist is to Lower the Cell VDD (LCV). The conventional LCV techniques require a strong bias or an active divider to adjust the column-wise bit cell power supply during the write operation, but these techniques consume a large amount of active power across operating time. Pulse Pull-down (PP) and Charge Sharing (CS) techniques are two alternative solutions, but precise timing is difficult for PP, so CS is proposed using metal wire charge sharing capacitors on top of the array as shown in Fig. 9.

Fig. 9. Charge Sharing (CS) for Low Cell VDD (LCV) write assist using CS metal tracks on top of the SRAM array.

In the write operation, the LCV enable signal (LCVEN) goes high, turning off the pull-down NMOS (N1) to isolate the charge sharing capacitor C1 from ground. A column is selected by COL[n:0], turning the header P0 off and isolating the array virtual power rail CVDD[0] from the true power VDDAI. Because the metal wire capacitance scales along with the size of the bit-cell array, it also benefits the SRAM compiler design and provides a relatively constant charge sharing voltage level with varied BL configurations. The charge sharing level is determined by the metal capacitance ratio of CVDD and the charge sharing metal track. Fig. 10 shows the three implemented LCV-VDD ratios of 6%, 12% and 24%.
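The charge-sharing arithmetic behind those ratios can be sketched as follows; this is a back-of-the-envelope illustration, not the paper’s sizing.

```python
# Isolating the cell supply rail (capacitance c_cvdd, precharged to VDD) and
# letting it share charge with a discharged metal capacitor c_cs settles the
# rail at V = VDD * c_cvdd / (c_cvdd + c_cs), i.e. a fractional droop of
# c_cs / (c_cvdd + c_cs).
def lcv_droop(c_cvdd: float, c_cs: float) -> float:
    return c_cs / (c_cvdd + c_cs)

# Capacitance ratios that would produce the 6%, 12% and 24% droops quoted above.
for target in (0.06, 0.12, 0.24):
    ratio = target / (1.0 - target)   # required c_cs / c_cvdd
    print(f"{target:.0%} droop needs c_cs/c_cvdd ~= {ratio:.3f}")
```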

Fig. 10. Three LCV-VDD ratios are implemented: 6%, 12% and 24%.

With write assist turned off, Vmin is constrained by write failure. Measured results with write assist in Fig. 11 show NBL improves Vmin by 300mV and 24% LCV independently improves Vmin by over 300mV.

Fig. 11. Measured results of (a) metal capacitor-boosted Write Assist WAS-NBL scheme and (b) metal charge-sharing capacitor WAS-LCV scheme.

Performance of the 5nm process is enhanced by the High Mobility Channel with ~18% drive current gain shown in Fig. 12. This technology was described in detail at IEDM2019.

Fig. 12. High Mobility Channel (HMC) performance gain of ~18%.

This performance gain is exemplified by the high-speed SRAM array for the L1 cache application achieving 4.1 GHz at 0.85V, as shown in the shmoo plot in Fig. 13.

Fig. 13. Shmoo plot of the HD SRAM array for use as a high performance L1 cache, showing 4.1 GHz at 0.85V.

The measured results are based on the 135 Mb test chip shown in Fig. 14.

Fig. 14. 135 Mb test chip in 5 nm HK-MG FinFET with High Mobility Channel (HMC) and 0.021um2 SRAM bit cell.

In summary, the detailed circuit design techniques described here enable the product developer to get the maximum advantage from this leading technology. An important device development approach is to do Design-Technology Co-optimization (DTCO) between product/circuit designers and process developers responsible for product yield and reliability.

ALSO READ: TSMC Unveils Details of 5nm CMOS Production Technology Platform Featuring EUV and High Mobility Channel FinFETs at IEDM2019



The Story of Ultra-WideBand – Part 2: The Second Fall
by Frederic Nabki & Dominic Deslandes on 03-05-2020 at 10:00 am


Over-engineered to perfection, outmaneuvered by Wi-Fi
In Part 1 of this series, we recounted the birth of wideband radio at the turn of the 20th century, and how superheterodyne radio killed wideband radios for messaging after 1920. But RADAR kept wideband research alive through World War 2 and the Cold War. Indeed, the story of wideband radios was not over…

Continuing the story, the benefits of ultra-wideband (UWB) became more apparent as demand for wireless communications grew in the 1990’s. But commercial deployment of UWB systems required worldwide agreement on frequency allocations, harmonic and power restrictions, etc. As interest in the commercialization of UWB increased, developers of UWB systems began pressuring the Federal Communications Commission (FCC) to approve it for commercial use. In 2002 the FCC finally allowed the unlicensed use of UWB systems. The European Telecommunications Standards Institute (ETSI) followed a few years later with their own regulations, unfortunately slightly different than the FCC regulation. Other regions followed, often aligning with FCC or ETSI.

UWB systems use short-duration (i.e. picosecond to nanosecond) electromagnetic pulses for transmission and reception of information. They also have a very low duty cycle, which is defined as the ratio of the time that an impulse is present to the total transmission time. Based on the emission regulations set in the 2000s, a UWB signal is defined as a signal having a bandwidth larger than 500 MHz. Most countries have now agreed on the maximum output power for UWB, defined as -41.3 dBm/MHz.
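As a back-of-the-envelope illustration of these two numbers (the pulse width and repetition rate below are made-up examples, not values from any standard):

```python
import math

# Duty cycle of an impulse radio: a ~2 ns pulse repeated a million times per
# second is "on" only 0.2% of the time.
pulse_width_s = 2e-9
prf_hz = 1e6                      # pulse repetition frequency
print(f"duty cycle ~= {pulse_width_s * prf_hz:.2%}")

# Total power permitted by the -41.3 dBm/MHz density limit over a 500 MHz-wide
# UWB signal: roughly -14.3 dBm, i.e. a few tens of microwatts.
bandwidth_mhz = 500
total_dbm = -41.3 + 10 * math.log10(bandwidth_mhz)
print(f"total power ~= {total_dbm:.1f} dBm "
      f"({10 ** (total_dbm / 10) * 1000:.0f} microwatts)")
```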

With regulations now in place, an alliance of companies started to form in order to standardize the physical layers and media access control (MAC) layers. In 2002 the WiMedia Alliance was formed, a non-profit industry trade group that promoted the adoption, regulation, standardization and multi-vendor interoperability of UWB technologies. It was followed, in 2004, by the Wireless USB Promoter Group and the UWB Forum.

In order to understand the choices made by these alliances, we should contextualize them. In 2002, WiFi was a relatively new technology. An 802.11b router, available since 1999, had a theoretical maximum speed of 11 Mbit/s using the 2.4 GHz frequency band. The 802.11a standard, also defined in 1999 and promising a theoretical maximum speed of 54 Mbit/s in the 5 GHz band, was not getting traction in the consumer space mainly due to its higher chipset cost. In 2003, the 802.11g standard was introduced, providing a theoretical maximum speed of 54 Mbit/s in the 2.4 GHz band. Even though the 802.11g standard proved to be a great success, the data rate was still limited by the crowded 2.4 GHz band, which was the backbone of wireless LANs at the time and also home to microwave ovens and well-marketed (a.k.a. more trouble than they were worth) cordless phones!

It is with these limitations in mind that a new generation of UWB radios was proposed. With emission regulations now in place, it was hard to resist the promise of UWB-enabled high data rates. Indeed, the 7.5 GHz of bandwidth allocated between 3.1 and 10.6 GHz by the FCC was an extremely valuable resource for wireless communication engineers. This is how specifications for short-range (i.e., a few meters) file transfers at data rates of 480 Mbit/s were proposed based on UWB multi-band orthogonal frequency-division multiplexing (OFDM). After a few years of development, the first retail product started shipping in mid-2007. This was very much an overengineered wireless radio that multiplexed multiple wide-bandwidth carriers in a relatively classical way, and was not per se an impulse-based radio akin to the spark-gap radio.

Even though OFDM UWB was making a lot of noise at the time and the products were promising, its introduction to the market faced a perfect storm in the late 2000s. 2008 marked the beginning of the great recession, leading to a significant decline in retail sales of consumer electronics. In addition, while the different UWB alliances were working on novel products, the WiFi Alliance was not standing still. In 2006, after years of development and negotiations, they published the first draft of the 802.11n standard. Supporting the Multiple-Input and Multiple-Output (MIMO) concept to multiplex channels, it was developed to provide data rates of up to 600 Mb/s. Although the final version of the standard was not published before October 2009, routers supporting the draft standard started pre-emptively shipping in 2007.

The last nail in the coffin of OFDM UWB came from the technology itself. The complexity of the OFDM UWB transceiver RF architecture proposed at the time and its stringent timing requirements led to a relatively high product cost and lackluster power consumption.

This combination of events and a technologically over-engineered chipset sealed the demise of the high-speed UWB radios. The leader in UWB chipsets at the time, WiQuest with 85% of the market in early 2008, ceased operations on October 31, 2008. The UWB Forum was disbanded after failing to agree on a standard due to contrasting approaches with the WiMedia Alliance. The WiMedia Alliance ceased its operations in 2009 after transferring all their specifications and technologies to the Wireless USB Promoter Group and the Bluetooth Special Interest Group. The Bluetooth Special Interest Group, however, dropped development of UWB as part of Bluetooth 3.0 in the same year.

Unfortunately, almost exactly a century after the retirement of the first UWB systems based on spark-gap radios, this new iteration of UWB radios based on the OFDM radio architecture was falling out of favor. However, against all odds, the world would not have to wait another century before seeing a new and improved implementation of a UWB radio. Indeed, the spark-gap radio would become even more of an inspiration for this UWB resurgence, and this resilient nature of UWB will be discussed in the third part of this series.

About Frederic Nabki
Dr. Frederic Nabki is cofounder and CTO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He directs the technological innovations that SPARK Microsystems is introducing to market. He has 17 years of experience in research and development of RFICs and MEMS. He obtained his Ph.D. in Electrical Engineering from McGill University in 2010. Dr. Nabki has contributed to setting the direction of the technological roadmap for start-up companies, coordinated the development of advanced technologies and participated in product development efforts. His technical expertise includes analog, RF, and mixed-signal integrated circuits and MEMS sensors and actuators. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada. He has published several scientific publications, and he holds multiple patents on novel devices and technologies touching on microsystems and integrated circuits.

About Dominic Deslandes
Dr. Dominic Deslandes is cofounder and CSO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He leads SPARK Microsystems’s long-term technology vision. Dominic has 20 years of experience in the design of RF systems. In the course of his career, he managed several research and development projects in the field of antenna design, RF system integration and interconnections, sensor networks and UWB communication systems. He has collaborated with several companies to develop innovative solutions for microwave sub-systems. Dr. Deslandes holds a doctorate in electrical engineering and a Master of Science in electrical engineering from Ecole Polytechnique of Montreal, where his research focused on high frequency system integration. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada.



COVID-19 Collateral Chip Collision – Will Fabs & Foundries Flounder?
by Robert Maire on 03-05-2020 at 6:00 am


Corona Fab Impact –
lower production/raise prices
Chip production supply chain may break
It could temporarily fix memory oversupply
Could it risk the fall rollout of the next iPhone?

The “Two week tango” – Waiting games at fabs
When a highly specialized piece of semiconductor equipment misbehaves to the point where fab workers can’t fix it, they pick up the phone and call their friendly tool maker field service techs who show up relatively quickly and fix the issue with on site or nearby spare parts and get the tool back up very quickly.

That is until Corona……

We have heard that at least TSMC and Intel fabs have instituted a two week quarantine period for outside service people.

Which means a tech shows up to fix or install a tool and has to cool his or her heels in a local hotel for two weeks until they prove they are not infected.

What happens to the tool for those two weeks? It stays down or not installed….

When you have dozens of Dep or Etch tools, losing a few may slow things down, but lose a litho tool, especially a highly complex, problem-prone EUV tool in need of lots of preventative maintenance, and it will ruin a fab manager’s day and month for sure.

As you can imagine in a fab with literally hundreds of tools, this could and will easily “snowball” into a major league problem.

Yields will fall, throughput will suffer, lots of wafers will wind up in the trash.

If we take the cost of a $7B fab and try to calculate the hourly operating cost… it’s a lot… The lost productivity will be a big number.

Installs will suffer as well
Aside from impacting ongoing production, the Corona related delays and problems also obviously impact new tool installs, perhaps even more so than ongoing operations.

If the tool maker has to send a team to help install a new tool and the team has to cool its heels at TSMC for two weeks, it means that those same experts can’t be at another fab on time to install another tool for the next customer, and installs go to hell in a hurry.

Basically the amount of time spent waiting around in hotels and not installing or fixing tools will be a very significant productivity loss.

It’s also not like you can hire a warm body off the street and have them install an EUV scanner the next day.  You can’t ramp up the number of service people overnight.

If you think this is a good reason for remote diagnostics… think again.  There is no such thing as a semiconductor tool hooked up to the internet at TSMC.  No data connection whatsoever is allowed to the outside world lest it get hacked and secret recipes stolen or machines hijacked. The tools are in isolation, with only highly supervised, hands-on visits allowed.

Could it push out Apple’s iPhone launch?
TSMC and Apple have developed a very predictable working rhythm to get iPhones launched every fall like clockwork.

TSMC orders new semiconductor tools for the next generation of Apple processors roughly in Q4 of the prior year; they get installed in Q1, initial production starts in Q2, and full production follows in Q3 for the September or October launch in time for holiday sales.

We are currently in what is likely the peak new tool installation time at TSMC for the next gen semiconductor process.

If enough new tool installations get delayed, it could very easily push the whole schedule back.

Slow down the install of a few EUV tools today and suddenly you don’t have enough tools to do enough layers in high enough volume to make enough chips to go into enough iPhones for the fall launch dates.

The domino effect could be quite large….

AMD and Intel not immune either
You might think that Intel and AMD are not as impacted but AMD gets its parts from TSMC and Intel has the same Corona protocol as TSMC.  Intel has already been short of production and has also farmed out to TSMC.

Supply chain has lots of single points of failure
It rarely comes to light, but the semiconductor industry has a lot of single points of failure in its highly complex supply chain.

It was pointed out in painful detail how the trade spat between Korea and Japan got ugly quickly, as Japan had a monopoly on photoresist and certain chemicals, one of those single points of failure in the supply chain.

If those sole suppliers are in the wrong place at the wrong time due to Corona, it will ripple through the industry.

Though there are many such chemicals and materials, one example is TMAH (tetramethylammonium hydroxide), used in silicon etch and other applications.  A nasty substance supplied mainly by China… home to Corona.

Maybe Corona will hit memory fabs
Maybe the semiconductor industry will get lucky and the oversupply of memory chips will get fixed by the Corona slowdown hitting a few memory fabs and taking them offline, putting supply and demand back in balance… Idaho may luckily be the last place Corona shows up.  Memory prices may get a boost…

The stocks
We should start to see some pre-announcements of missed numbers over the next few weeks as the electronics food chain grinds slower and slower.  Though things will obviously recover, it may take a while for the supply chain to recover, and in some cases the time will never be recovered.

The damage may be contained within the calendar year for some, but maybe not all.  Those further down the food chain, the users of chips such as Apple, will likely see the most impact as they have the widest exposure to many, many parts and suppliers. Fabs and other complex manufacturers clearly will have issues.



Designing Next Generation Memory Interfaces: Modeling, Analysis, and Tips
by Mike Gianfagna on 03-04-2020 at 10:00 am


At DesignCon 2020, there was a presentation by Micron, Socionext and Cadence that discussed design challenges and strategies for using the new low-power DDR specification (LPDDR5). As is the case with many presentations at DesignCon, ecosystem collaboration was emphasized. Justin Butterfield (senior engineer at Micron) discussed the memory aspects and Daniel Lambalot (director of engineering at Socionext) discussed the system aspects. I was able to spend some time with one of the other authors, Zhen Mu (senior principal product engineer at Cadence) as well. Zhen provided background on the tool platform used in this program, which is completely supplied by Cadence.

The LPDDR5 spec was finalized and published last year and is the cutting edge of DDR memory interfaces. Increased speed and lower power don’t come for free. There are many challenges associated with using LPDDR5, including channel bandwidth, reduced voltage margin, the need to route multiple parallel channels, dealing with crosstalk and ensuring proper return currents, multi-drop configurations (2 DRAM loads) and limited equalization capability.

The key features of LPDDR5 can be summarized as follows:

  • Higher data rates (up to 6.4Gbps)
    • Data transfer boosted to about 1.5 times that of the previous LPDDR4 interface
  • Power-isolated LVSTL interface with:
    • VDD2H=1.05V for the DRAM core
    • VDDQ=0.5V for the I/O
  • New packaging
  • Non-targeted on-die termination (ODT)
  • New eye mask specifications
    • Change from rectangular mask in LPDDR4 to hexagonal mask in LPDDR5
    • Two timing measurements – tDIVW1/tDIVW2 and vDIVW
    • See diagram below, and the illustrative mask check after this list
  • Data bit inversion (DBI)
  • Separate Read strobe (RDQS) and Write strobe (WCK)
  • Advanced equalization technologies such as feed-forward equalization (FFE), continuous time linear equalization (CTLE), and decision feedback equalization (DFE) for the controller and the memory
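As a purely illustrative sketch of what checking data eye samples against a hexagonal mask could look like: the geometry below assumes tDIVW1 is the mask width at the vertical center and tDIVW2 the width at ±vDIVW/2, and the numeric values are placeholders, not JEDEC LPDDR5 limits.

```python
def violates_hex_mask(t_ps: float, v_mv: float,
                      tdivw1_ps: float, tdivw2_ps: float, vdivw_mv: float) -> bool:
    """True if a sampled point (offsets from the eye center) falls inside the mask."""
    half_v = vdivw_mv / 2.0
    if abs(v_mv) > half_v:
        return False  # outside the mask band vertically, so no violation
    # Taper the allowed half-width linearly from tdivw1/2 at v=0 to tdivw2/2 at |v|=half_v.
    half_w = tdivw1_ps / 2.0 + (tdivw2_ps / 2.0 - tdivw1_ps / 2.0) * (abs(v_mv) / half_v)
    return abs(t_ps) <= half_w

# Placeholder mask dimensions and two sampled eye points.
mask = dict(tdivw1_ps=55.0, tdivw2_ps=35.0, vdivw_mv=60.0)
print(violates_hex_mask(20.0, 10.0, **mask))   # True: point lands inside the mask
print(violates_hex_mask(40.0, -25.0, **mask))  # False: point clears the mask
```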

Timing is one of the most challenging aspects of LPDDR5; controller jitter must be considered. Accurate modeling is key for success. The presentation discussed the details of modeling and analysis approaches to optimize the use of LPDDR5 in actual designs. Items to be considered include device modeling, system-level design with typical topology, channel simulation for parallel bus analysis, bus characterization, modeling filtering functions implemented in LPDDR5 DRAMs and crosstalk simulation.

IBIS (I/O Buffer Information Specification) and the Algorithmic Modeling Interface (AMI) extensions are standards typically used in SerDes design and analysis. IBIS-AMI modeling can also be applied to parallel bus analysis for LPDDR5 designs. The benefits of this modeling approach include interoperability (different models work together), portability (models run in multiple simulators), accuracy (results correlate to measurements), IP protection (circuit details are not exposed) and performance (million-bit simulations are practical).

There are challenges to apply SerDes modeling to a DDR interface, including the non-symmetrical nature of LPDDR5 timing, see diagram below.

From a big picture point of view, channel simulations and circuit transient analysis, including IBIS-AMI models, were correlated using the Cadence Sigrity Explorer tool. To complete the analysis, memory models were supplied by Micron and controller models were supplied by the Cadence IP team. Socionext and Micron provided package models for the controller and memory, respectively. See the diagram below for some results.

Green: Circuit simulation; Blue: Channel simulation

For crosstalk simulation, two approaches were used: characterize each bus signal individually, as is done in SerDes channel simulations, and characterize the entire bus with practical stimulus patterns. Using the following conditions, the effect of channel length on performance can be modeled.

The analysis suggests a trace length of one inch is desirable. This presentation highlighted the modeling and simulation techniques needed to help achieve a fully functional system with LPDDR5 memories. Accurate modeling with good ecosystem participation are required for success.



LithoVision – Economics in the 3D Era
by Scotten Jones on 03-04-2020 at 6:00 am


Each year on the Sunday before the SPIE Advanced Lithography Conference, Nikon holds their LithoVision event. This year I had the privilege of being invited to speak for the third consecutive year. Unfortunately, the event had to be canceled due to concerns over the COVID-19 virus, but by the time the event was canceled I had already finished my presentation, so I thought I would share it with the SemiWiki community.

Outline
The title of my talk is “Economics in the 3D Era”. In the talk I will discuss the three main industry segments: 3D NAND, Logic and DRAM. For each segment I will discuss the current status and then get into roadmaps with technology, mask counts, density and cost projections. All the status and projections will be company specific and cover the leaders in each segment. All the data for this presentation (technology, density, mask counts, and cost projections) comes from our IC Knowledge – Strategic Cost and Price Model – 2020 – revision 00. The model is basically a detailed industry roadmap that provides simulations of cost, equipment and materials requirements.

You can read about the model here.

3D NAND
3D NAND is the most “3D” segment of the industry with a layer stacking technology that provides density improvement by adding layers in the third dimension.

Figure 1 illustrates the 3D NAND TCAT Process.

Figure 1. 3D NAND TCAT Process.

In the 3D NAND segment, the market leader is Samsung and they use the TCAT process illustrated in this slide. Number two in the market is Kioxia (formerly Toshiba Memory) and they use an essentially identical process. Micron Technology is also adopting a charge trap process we expect to be similar to this one, making this process representative of the majority of the industry. SK Hynix uses a different process that still shares many key elements with this process. The only company not using a charge-trap process is Intel-Micron, and now that Intel and Micron have split apart on 3D NAND, Intel will be the only company pursuing floating gate.

The basic process has three major sections:

  1. Fabricate the CMOS – the CMOS writes, reads and erases the bits. Initially everyone except Intel-Micron fabricated the CMOS next to the memory array with Intel-Micron fabricating some of the CMOS under the memory array. Over time other companies have migrated to CMOS under the array and we expect within a few generations that all companies will migrate to CMOS under the array because it offers better die area utilization.
  2. Fabricate the memory array – for charge trap the array fabrication takes place by depositing alternating layers of oxide and nitride. A channel hole is then etched down through the layers and refilled with an oxide-nitride-oxide (ONO) layer, a poly silicon tube (channel) and filled with oxide. A stair step is then fabricated using a mask – etch – mask shrink – etch approach. A slot is then etched down through the array and the nitride film is etched out. Blocking films and tungsten are then deposited to fill the horizontal openings where the nitride was etched out. Finally, vias are etched down to the horizontal sheets of tungsten.
  3. Interconnect – the CMOS and memory array are then interconnected. For CMOS under the array, some interconnect takes place before the memory array fabrication.

This approach is very mask efficient because many layers can be patterned with a few masks. The overall process requires a channel mask and several stair step masks, depending on the number of layers and process generation; in early generation processes a single mask could produce approximately 8 layers, but some processes today can reach 32 layers with a single mask. The slot requires a mask, sometimes there is also a second shallow slot that requires a mask, and finally the via requires a mask.
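A minimal sketch of why this is mask efficient, using only the layers-per-mask figures quoted above (the fixed handling of the other masks is a simplification):

```python
import math

def stairstep_masks(layers: int, layers_per_mask: int) -> int:
    """Stair step masks needed when each mask patterns `layers_per_mask` layers."""
    return math.ceil(layers / layers_per_mask)

for layers in (64, 128, 256):
    early = stairstep_masks(layers, 8)    # early-generation capability (~8 layers/mask)
    today = stairstep_masks(layers, 32)   # current capability quoted above (~32 layers/mask)
    print(f"{layers} layers: ~{early} stair step masks at 8/mask, ~{today} at 32/mask")
```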

The channel hole etching is a very difficult high-aspect-ratio (HAR) etch and once a certain maximum number of layers is reached the process must be broken up into “strings” in something called “string stacking”. In string stacking, basically, a set of layers is deposited, a mask is applied, and the channel is etched and filled. Then another set of layers is deposited and masked, etched and filled. In theory this can be done many times. Intel-Micron use a floating gate process that uses oxide-polysilicon layers that are much more difficult to etch than oxide-nitride layers, and they were the first to string stack. Figure 2 illustrates Intel-Micron string stacking.

Figure 2. Intel-Micron String Stacking.

Each company has their own approach to channel etching and their own limit in terms of when they string stack. Because they use oxide-poly layers, Intel-Micron produced a 64-layer device by stacking two 32-layer strings and then produced a 96-layer device by stacking two 48-layer strings. Intel has announced a 144-layer device that we expect to be three 48-layer strings. SK Hynix began string stacking at 72 layers and Kioxia at 96 layers (both charge trap processes with alternating oxide-nitride layers). Samsung is the last holdout on string stacking, having produced a 92-layer device as a single string, and they have announced a 128-layer single-string device.

Memory density can also be improved by storing multiple bits in a cell. NAND Flash has moved through a progression from 1 bit – single-level cell (SLC), to 2 bit – multi-level cell (MLC), to 3 bit – three-level cell (TLC), to 4 bit – quadruple-level cell (QLC). Companies are now preparing to introduce 5 bit – penta-level cells (PLC) and there is even discussion of 6 bit – hexa-level cells (HLC). Increasing the number of bits per cell helps with density but the benefit is decreasing: SLC to MLC is 2.00x the bits, MLC to TLC is 1.50x the bits, TLC to QLC is 1.33x the bits, QLC to PLC will be 1.25x the bits and if we get there PLC to HLC will be 1.20x the bits.
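Those diminishing returns follow directly from the arithmetic: going from n to n+1 bits per cell multiplies density by (n+1)/n, as this small sketch reproduces.

```python
# Density gain from adding one more bit per cell: (n + 1) / n.
levels = {1: "SLC", 2: "MLC", 3: "TLC", 4: "QLC", 5: "PLC", 6: "HLC"}
for n in range(1, 6):
    print(f"{levels[n]} -> {levels[n + 1]}: {(n + 1) / n:.2f}x the bits")
```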

Figure 3 presents string stacking by year and company on the left axis and maximum bits per cell on the right axis.

Figure 3. Layers, Stacking and Bits per Cell.

Figure 4 presents our analysis of the resulting mask counts by exposure type, company and year. The dotted line is the average mask count by year, which increases from 42 in 2017 to 73 in 2025; this contrasts with the layers increasing from an average of 60 in 2017 to 512 in 2025. In other words, only a 1.7x increase in masks is required to produce an 8.5x increase in layers, highlighting the mask efficiency of the 3D NAND processes.

Figure 4. Mask Count Trend.

Figure 5 presents actual and forecast bit density by company and year for both 2D NAND and 3D NAND. This is the bit density for the whole die or in other words the die bit capacity divided by the die size.

 Figure 5. NAND Bit Density.

From 2000 to 2010, 2D NAND bit densities were increasing by 1.78x per year, driven by lithographic shrinks. Around 2010 the difficulty of continuing to shrink 2D NAND led to a slowdown to 1.43x per year until around 2015, when 3D NAND became the driver and continued at a 1.43x per year rate. We are projecting a slight slowdown from 2020 to 2025 to 1.38x per year. This is an improvement in our forecast from last year because we are seeing the companies push the technology faster than we originally expected. Finally, SK Hynix has talked about 500 layers in 2025 and 800 layers in 2030, resulting in a further slowdown after 2025.
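A simple compounding sketch of those growth rates (normalized to a 2015 starting density of 1.0; the factors are the per-year multipliers quoted above, not absolute densities):

```python
# Compound the quoted per-year density factors: 1.43x/yr through 2019, then a
# projected 1.38x/yr from 2020 to 2025.
density = 1.0   # normalized 2015 density
for year in range(2015, 2026):
    print(f"{year}: ~{density:7.1f}x the 2015 density")
    factor = 1.43 if year < 2020 else 1.38
    density *= factor
```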

Figure 6 presents NAND Bit Cost Trends.

Figure 6. NAND Bit Cost Trend.

In this figure we have taken wafer costs calculated using our Strategic Cost and Price Model and combined them with the bit density from figure 5 to produce a bit cost trend. In all cases the fabs are new greenfield 75,000 wafer per month fabs because that is the average capacity of NAND fabs in 2020. The countries where the fabs are located are Singapore for Intel-Micron, China for Intel, Japan for Kioxia and South Korea for Samsung and SK Hynix. These calculations do not include packaging and test costs, do not take into account street width, and have only rough die yield assumptions in them.

The first three nodes on the chart are 2D NAND, where we see a 0.7x per node cost trend. With the transition to 3D NAND the bit cost initially increased for most companies but has now come down below 2D NAND bit costs and is following a 0.7x per node trend until around 300 to 400 layers. At 300-400 layers we project the cost per bit will level out, possibly placing an economic limit on this technology unless there are some breakthroughs in process or equipment efficiency.
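For readers who want to reproduce the flavor of Figure 6, here is an illustrative bit-cost calculation; all inputs are placeholder numbers, not values from the Strategic Cost and Price Model:

```python
# Cost per gigabyte = processed-wafer cost / gigabytes per wafer, ignoring
# yield, street width, packaging and test, as noted above.
wafer_cost_usd = 5000.0            # assumed processed-wafer cost
bit_density_gbit_per_mm2 = 5.0     # assumed die bit density (Gbit per mm^2)
usable_area_mm2 = 60000.0          # rough usable area of a 300 mm wafer

gigabytes_per_wafer = bit_density_gbit_per_mm2 * usable_area_mm2 / 8.0
print(f"~${wafer_cost_usd / gigabytes_per_wafer:.3f} per GB "
      f"({gigabytes_per_wafer:,.0f} GB per wafer)")
```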

Logic
For 3D NAND, “nodes” are easy to define based on physical layers; for DRAM, nodes are the active half-pitch; for logic, nodes are pretty much whatever the marketing guys at a company want to call them.

Some people consider the current leading edge FinFET processes to be 3D because the FinFET is a 3D structure but in the context of this discussion we consider 3D to be when device stacking allows multiple active layers to be stacked up to create stacks of devices. In this context 3D Logic will really come in once CFETs are adopted.

Figure 7 presents the nodes by year for the 3 companies pursuing the state of the art.

Figure 7. Logic Roadmap.

The node comparisons in this chart are complicated by the split between Intel and the foundries. Intel has followed the classic node names, 45nm, 32nm, 22nm, 14nm, whereas the foundries have followed the “new” node names of 40nm, 28nm, 20nm, 14nm. Furthermore, Intel has shrunk more per node, so Intel 14nm has similar density to foundry 10nm and Intel 10nm has similar density to foundry 7nm.

At the top of the figure I have outlined a consistent node name series based on alternating 0.71 and 0.70 shrinks.

In the bottom of the figure I have nodes by company and year with the transistor density for each node. The transistor density is calculated based on a weighting of NAND and Flipflop cells, as I have previously discussed. Next to each node, in parentheses, is either FF for FinFET, HNS for horizontal nanosheet, HNS/FS for horizontal nanosheet with a dielectric wall (forksheet) to improve density based on work Imec has done, or CFET for complementary stacked FETs where nFETs and pFETs are vertically stacked. CFETs are the point where logic crosses over into a layer-based scaling approach and becomes a true 3D solution; in principle CFETs can continue to scale by adding more layers.
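As a rough illustration of that weighted-cell density metric, the sketch below blends a small-cell (2-input NAND) density with a large-cell (scan flip-flop) density using the commonly cited 60/40 weighting; the cell areas, transistor counts and the resulting number are hypothetical placeholders, not the data behind figure 7:

```python
# Illustrative weighted standard-cell transistor density metric (hypothetical inputs).
def weighted_transistor_density(nand2_area_um2, ff_area_um2,
                                nand2_tx=4, ff_tx=30, w_nand=0.6, w_ff=0.4):
    nand2_density = nand2_tx / nand2_area_um2   # transistors per um^2
    ff_density = ff_tx / ff_area_um2
    return (w_nand * nand2_density + w_ff * ff_density) * 1e6  # transistors per mm^2

# Example with made-up 7nm-class cell areas:
d = weighted_transistor_density(nand2_area_um2=0.05, ff_area_um2=0.30)
print(f"~{d / 1e6:.0f} million transistors per mm^2")
```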

Bold indicates leading density or technology. In 2014 Intel takes the density lead with their 14nm process. In 2016 TSMC takes the density lead with their 10nm process and maintains the lead in 2017 with their 7nm process. TSMC and Samsung have similar densities at 7nm, but going to 5nm TSMC is producing a much larger shrink than Samsung, and in 2019 TSMC maintains the process density lead with their 5nm technology. If Samsung delivers in 2020 on the HNS technology we are calling a 3.5nm node, they may take the density lead and be the first company to manufacture HNS. In 2021 the TSMC node we are calling 3.5nm may return them to the density lead. If Intel can deliver on their two-year cadence with the kind of shrinks they typically target, we believe they could take the density lead in 2023 with their 5nm process. In 2024 we may see a first CFET implementation from Samsung taking the density lead, until 2025 when Intel may regain the lead with their first CFET process.

Figure 8 presents the logic mask counts for these processes. The introduction of EUV is mitigating mask counts; without EUV we would likely see over 100 masks on this chart. As we did with the NAND mask count figure, the dotted line is the average mask count. We have also grouped processes with “similar” densities, so for example Intel 14nm is combined with the foundry 10nm processes and Intel 10nm with the foundry 7nm processes.

Figure 8. Logic Mask Counts.

Figure 9 presents the logic density in transistors per millimeter squared based on the NAND/Flipflop weighting metric mentioned previously.

Figure 9. Logic Density Trend.

There are six types of processes plotted on this chart.

Planar transistors were the primary leading-edge logic devices until around 2014 and produced a density improvement of 1.33x per year. FinFETs then took over at the leading edge and have provided a 1.29x per year density improvement. In parallel with FinFETs we have seen the introduction of FDSOI processes; FDSOI offers a simpler process with lower design costs and better analog, RF and power characteristics, but it cannot compete with FinFETs for density or raw performance. When HNS takes over from FinFETs we expect the rate of density improvement to slow further to 1.16x per year, and when CFETs eventually take over, density should improve at 1.11x per year. We have also plotted SRAMs produced with vertical transistors, based on work by Imec, that may provide an efficient solution for cache memory chiplets.
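Another way to read those rates is as density doubling times, a minimal calculation assuming steady per-year improvement within each device era:

```python
import math

# Years needed to double transistor density at the per-year rates quoted above.
for label, rate in [("planar", 1.33), ("FinFET", 1.29), ("HNS", 1.16), ("CFET", 1.11)]:
    years = math.log(2) / math.log(rate)
    print(f"{label:>6}: ~{years:.1f} years to double density")
```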

Figure 10 illustrates the trend in logic transistor cost.

Figure 10. Logic Transistor Cost.

Figure 10 presents the cost per billion transistors, combining wafer cost estimates from our Strategic Cost and Price Model with the transistor densities in figure 9. All fabs are assumed to be new greenfield fabs with 35,000 wafers per month of capacity because that is the average size of logic fabs in 2020. The assumed countries are Germany for GLOBALFOUNDRIES, except for 14nm which is done in the United States; the United States is also assumed for Intel, except for 10nm in Israel; South Korea is assumed for Samsung and Taiwan for TSMC.

This plot does not include mask set or design cost amortization, so while the manufacturing cost per transistor is coming down, only high-volume products can afford to access these technologies.

This plot does not include any packaging, test or yield impact.

From 130nm down to the i32/f28 (Intel 32nm/foundry 28nm) node, costs were coming down by 0.6x per node; then at the i22/f20 and f16/f14 nodes the cost reductions slowed because the foundries decided not to scale for their first FinFET processes. This slowdown led many in the industry to erroneously predict the end of cost reduction. From the f16/f14 node down to the i5/f2.5 node we expect costs to decrease by 0.72x per node and then slow to 0.87x per node thereafter. The g1.25 and g0.9 nodes are generic CFET processes with 3 and 4 decks respectively.

Figure 11 examines the impact of mask set amortization on wafer cost.

Figure 11. Mask Set Amortization.

The wafer costs in figure 11 are based on a new greenfield fab in Taiwan running 40,000 wafers per month. The amortization is mask set only and does not include design costs.

The table presents the 2020 mask set cost for 250nm, 90nm, 28nm and 7nm mask sets. Please note that at introduction these mask sets were more expensive. The mask set cost is then amortized over a set number of wafers and the resulting normalized costs are shown in the figure. In the table, the wafer cost ratio is the wafer cost with amortization over 100 wafers run on a mask set divided by the wafer cost with amortization over 100,000 wafers run on a mask set.

From the figure and table, we can see that mask set amortization has a small effect at 250nm (1.42x ratio) and a large effect at 7nm (18.05x ratio). Design cost amortization is even worse.
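The ratio itself comes from a simple amortization formula: the effective wafer cost is the base wafer cost plus the mask set cost divided by the number of wafers run on the set. The sketch below uses hypothetical 7nm-class wafer and mask set costs, not the model's actual numbers:

```python
# Effective wafer cost with mask set amortization (hypothetical dollar figures).
def wafer_cost_with_amortization(base_wafer_cost, mask_set_cost, wafers_run):
    return base_wafer_cost + mask_set_cost / wafers_run

base_cost, mask_set_cost = 9_000, 15_000_000   # hypothetical 7nm-class placeholders
low_volume = wafer_cost_with_amortization(base_cost, mask_set_cost, 100)
high_volume = wafer_cost_with_amortization(base_cost, mask_set_cost, 100_000)
print(f"wafer cost ratio (100 vs 100,000 wafers): {low_volume / high_volume:.2f}x")
```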

The bottom line is that design and mask set costs are so high at the leading edge that only high-volume products can absorb the resulting amortization.

DRAM
Leading edge DRAMs have capacitor structures that are high-aspect-ratio “3D” devices, but, like current logic devices, DRAM does not scale by stacking active elements.

Figure 12 presents DRAM nodes by company in the top table and on the bottom of the figure are some of the key structures.

Figure 12. DRAM Nodes.

As DRAM nodes proceeded below the 4x nm generation, the buried saddle-fin access transistor with buried word line came into use (see bottom left). The bottom right illustrates the progression of the capacitor to higher-aspect-ratio structures supported by two layers of silicon nitride “MESH”. DRAM capacitors are reaching the mechanical stability limits of the technology and, with dielectric k values also stalled, DRAM has slowed to single-nanometer shrinks per node.

Figure 13 illustrates mask counts by exposure type and company.

Figure 13. DRAM Mask Counts.

From figure 13 it can be seen that there is a big jump in mask counts from the 2x to the 2y generation. The jump is driven by performance and power requirements that led to the need for more transistor types and threshold voltages in the peripheral logic.

At the 1x node Samsung is the first company to introduce EUV to DRAM production, and the number of EUV layers grows at the 1a, 1b and 1c nodes. SK Hynix is also expected to implement EUV; we do not currently expect Micron to implement EUV.

Figure 14 illustrates the trend in DRAM bit density by year.

Figure 14. DRAM Bit Density.

In figure 14 the bit density is the product capacity in gigabytes divided by the die size in square millimeters.

From figure 14 it can be seen that there has been a slowdown in bit density growth beginning around 2015. DRAM bit density is currently constrained by the capacitor and it isn't clear what the solution will be. Long term, a new type of memory may be needed to replace DRAM. DRAM requires relatively fast access with high endurance, and currently MRAM and FeRAM appear to be the only options with the potential to meet the speed and endurance requirements. Because MRAM requires relatively high current to switch, large selector transistors are required, constraining the ability to shrink MRAM to competitive density and cost. FeRAM is also a potential replacement and is getting a lot of attention at places like Imec.

Figure 15 illustrates the DRAM bit cost trend.

Figure 15. DRAM Bit Cost Trend.

Figure 15 is based on combining wafer cost estimates from our Strategic Cost and Price Model with the bit densities in figure 14. All fabs are assumed to be new greenfield fabs with 75,000 wafers per month capacity because that is the average size of DRAM fabs in 2020. The assumed countries are Japan for Micron and South Korea for Samsung and SK Hynix.

These calculations do not include packaging and test costs and do not take street width or die yield into account.

In this plot the combination of higher mask counts and slower bit density growth leads to a slowdown from a 0.70x per node cost trend to a 0.87x per node cost trend.

Conclusion
NAND has successfully transitioned from 2D to 3D and now has a scaling path until around 2025. After 2025, scaling may be possible with very high layer counts, but unless a breakthrough in process or equipment efficiency is made, cost per bit reductions may end.

Leading edge logic today utilizes 3D FinFET structures but won’t be a true stacked device 3D technology until CFETs are introduced around 2025. Logic has the potential to continue to scale until the end of the 2020s by transitioning from FinFET to HNS to CFET although the cost improvements will likely slow down.

DRAM is the most constrained of the three market segments; scaling and cost reductions have already slowed significantly and no solution is currently known. Slower bit density and cost scaling will likely continue until around 2025, when a new memory type may be needed.

Here is the full presentation:

Lithovision 2020

Also Read:

IEDM 2019 – Imec Interviews

IEDM 2019 – IBM and Leti

The Lost Opportunity for 450mm


The Story of Ultra-WideBand – Part 1: The Genesis

The Story of Ultra-WideBand – Part 1: The Genesis
by Frederic Nabki & Dominic Deslandes on 03-03-2020 at 10:00 am

The Story of Ultra WideBand Part 1 SemiWiki

In the middle of the night of April 14, 1912, the R.M.S. Titanic sent a distress message. It had just hit an iceberg and was sinking. Even though broadcasting an emergency wireless signal is common today, this was cutting edge technology at the turn of the 20th century. This was made possible by the invention of a broadband radio developed over the previous 20 years: the spark-gap transmitter.

Developed by Heinrich Hertz in the 1880s, the spark-gap radio was improved by Guglielmo Marconi, who succeeded in sending the first radio transmission across the Atlantic Ocean in 1901. After the Titanic disaster, wireless telegraphy using spark-gap transmitters quickly became universal on large ships, with the Radio Act of 1912 requiring all seafaring vessels to maintain a 24-hour radio watch. The spark-gap radio was then the most advanced technology enabling wireless communication between ships and remained in use through the First World War.

The architecture of the spark-gap radio was significantly different from what is currently used in wireless transceivers, including our cellphones, WiFi networks and Bluetooth devices. Modern narrowband communications systems modulate continuous-waveform radiofrequency (RF) signals to transmit and receive information. But at the turn of the 20th century, the spark-gap transmitter generated electromagnetic waves by means of an electric spark; no narrowband radiofrequency signal was being modulated. The spark was generated by discharging a capacitance through an electric arc across a gap between two conductors. These very short discharges generated oscillating currents in the wires, which in turn excited an electromagnetic wave that radiated out and could be picked up at a great distance. From the well-known time-frequency duality principle, a short impulse in time, analogous to the electric spark, gives a wideband signal in frequency, and this was the basis of communications for two decades.
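That duality is easy to demonstrate numerically: the shorter the impulse, the wider the band it occupies. Here is a minimal sketch; the sample rate, Gaussian pulse shape and -10 dB threshold are arbitrary choices for illustration:

```python
import numpy as np

def occupied_bandwidth_hz(sigma_s, fs=100e9, n=2**16, level_db=-10.0):
    """Width of the band (from DC) where the spectrum of a Gaussian pulse of
    1-sigma duration `sigma_s` stays within `level_db` of its peak."""
    t = (np.arange(n) - n / 2) / fs
    pulse = np.exp(-0.5 * (t / sigma_s) ** 2)
    mag_db = 20 * np.log10(np.abs(np.fft.rfft(pulse)))
    mag_db -= mag_db.max()
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs[mag_db >= level_db].max()

# A 10x shorter "spark" occupies roughly 10x more bandwidth.
for sigma in (1e-9, 100e-12):
    print(f"{sigma * 1e12:5.0f} ps pulse -> ~{occupied_bandwidth_hz(sigma) / 1e9:.2f} GHz")
```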

An interesting point to note is that the spark-gap radio could not support a continuous transmission, such as a sound signal. A message had to be composed of a series of sparks transmitting discrete pieces of information, making it the first digital radio. This characteristic was ideal for transmitting Morse code. However, it was then believed that the spark-gap radio could not transmit a continuous signal like voice or music without loss of information. It was decades before Shannon and Nyquist showed how to do that with digital modulation techniques.

This gap in digital modulation knowledge, coupled with the difficulty of generating high-power spark-gap transmissions, proved fatal to the spark-gap radio. After World War 1, carrier-based transmitters were developed using vacuum tubes, producing continuous waves that could carry audio. Nowadays, virtually all wireless transceivers use the same architecture, based on the work of US engineer Edwin Armstrong in 1918. Called the superheterodyne radio, this architecture uses frequency mixing to convert a received narrowband signal to a relatively low intermediate frequency (IF) that is then processed in baseband circuitry. This innovation gave rise, starting around 1920, to AM radio, which was followed a decade later by FM radio. By the late 1920s the only spark transmitters still in operation were legacy installations on naval ships. Wideband radio was effectively dead.
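The frequency-mixing step at the heart of the superheterodyne is also easy to illustrate: multiplying the received carrier by a local oscillator produces sum and difference frequencies, and the difference is the intermediate frequency. The frequencies below are arbitrary examples, not values from any particular radio:

```python
import numpy as np

fs = 10e9                              # sample rate (arbitrary for illustration)
t = np.arange(0, 2e-6, 1 / fs)
rf = np.cos(2 * np.pi * 900e6 * t)     # received 900 MHz carrier
lo = np.cos(2 * np.pi * 890e6 * t)     # 890 MHz local oscillator

mixed = rf * lo                        # mixer output: components at 10 MHz and 1790 MHz
spectrum = np.abs(np.fft.rfft(mixed))
freqs = np.fft.rfftfreq(len(t), d=1 / fs)
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks / 1e6), "MHz")      # ~[10.0, 1790.0]
```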

Wideband’s Rebirth after 100 years: A Detective Story
Why then did Apple release the iPhone 11 in 2019 with an ultra-wideband (UWB) transceiver, implemented in silicon on their new U1 wireless processor chip? The answer requires some detective work into clues stretching back to the middle of the last century.

The first clue was another impulse-based wideband radio technology developed in top secret laboratories around the world in the 1930s and during World War 2: RADAR. The story of RADAR has been told many times; it provided a pivotal advantage in both the Battle of Britain and naval battles in the Pacific.

For the purposes of this discussion, RADAR is able to determine the range, angle and velocity of objects. After the war, impulse-based transceivers started once more to gain momentum, now in military applications. From the 1960s to the 1990s, this technology was restricted to military applications under classified programs, both as a location finding and a communication technology. By the mid-1980s, a wide range of research papers, books and patents from UWB pioneers like Harmuth at Catholic University of America and Ross and Robbins at Sperry Rand Corp became available. This great source of information revived the interest in UWB systems because of wideband’s unique ability to deliver location data.

Apple’s first use for UWB is to provide positioning data. Positioning enables many applications in augmented reality (AR), virtual reality (VR), gaming, device recovery, file sharing and advertising beacons. We will explore UWB positioning technology further in Part 3. But positioning by itself is not sufficient reason for Apple to build a custom silicon UWB implementation.

Future articles in this series will discuss five clues to understanding Apple’s adoption of UWB for the iPhone 11:

  • UWB can provide positioning data
  • Its very low power emissions ensure that UWB does not interfere with other communications
  • Low power output also makes UWB signals difficult to detect by unintended users
  • The low duty cycle enables ultra-low power and increases resistance to jamming or interference
  • The very short impulses enable the reduction of the communication latency.

The story continues in Part 2

About Frederic Nabki
Dr. Frederic Nabki is cofounder and CTO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He directs the technological innovations that SPARK Microsystems is introducing to market. He has 17 years of experience in research and development of RFICs and MEMS. He obtained his Ph.D. in Electrical Engineering from McGill University in 2010. Dr. Nabki has contributed to setting the direction of the technological roadmap for start-up companies, coordinated the development of advanced technologies and participated in product development efforts. His technical expertise includes analog, RF, and mixed-signal integrated circuits and MEMS sensors and actuators. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada. He has published several scientific publications, and he holds multiple patents on novel devices and technologies touching on microsystems and integrated circuits.

About Dominic Deslandes
Dr. Dominic Deslandes is cofounder and CSO of SPARK Microsystems, a wireless start-up bringing a new ultra low-power and low-latency UWB wireless connectivity technology to the market. He leads SPARK Microsystems's long-term technology vision. Dominic has 20 years of experience in the design of RF systems. In the course of his career, he managed several research and development projects in the fields of antenna design, RF system integration and interconnections, sensor networks and UWB communication systems. He has collaborated with several companies to develop innovative solutions for microwave sub-systems. Dr. Deslandes holds a doctorate in electrical engineering and a Master of Science in electrical engineering from École Polytechnique de Montréal, where his research focused on high frequency system integration. He is a professor of electrical engineering at the École de Technologie Supérieure in Montreal, Canada.