Interview with Altair CTO Sam Mahalingam
by Daniel Nenni on 06-29-2020 at 10:00 am


In this interview we talk with Sam Mahalingam, chief technology officer at Altair, about gaining a competitive edge with software that’s built to handle high-throughput workloads like chip design and electronic design automation (EDA). Altair is a global technology company providing solutions in product development, high-performance computing (HPC), and data analytics.

Where does Altair fit into the semiconductor industry?
A competitive edge in this industry can come down to a very slim margin: seconds or even milliseconds. The goal is to enable users to iterate more designs in less time, and ultimately reach market first with a superior product. Our software is designed to make that possible. When you’re not getting as much value as you should from your infrastructure, not optimizing where it’s possible, the long-term cost of missed opportunities can be high.

A big challenge is that every step of design exploration and verification involves a complex set of variables, each requiring time to analyze both individually and in terms of its interaction with other variables. It’s not unusual for engineering teams to run millions of jobs each day, so the ability to achieve maximum throughput means a team can test as many variables as possible and be less likely to miss a crucial interaction that will affect the final product.

Altair software designed expressly for high-throughput workloads in areas like semiconductor design includes the Altair Accelerator™ enterprise job scheduler, Altair Allocator™ multi-site license allocation and management tool, and Altair Accelerator Plus hyper-scheduler.

How does hyper-scheduling boost efficiency?
Hyper-schedulers, or hierarchical schedulers, like Altair Accelerator Plus are built to offload the base scheduler for greater throughput, better utilization, and flexible usage models. Millisecond dispatch latency matters for short jobs, which users can queue sequentially on their own; presenting a batch of jobs to the base scheduler as one larger job reduces its burden while maintaining visibility into each individual job.

A hierarchical scheduler can also handle user query, job submission, and reporting functions, which is a substantial offload from the base scheduler since 80% of typical scheduler loads come from these non-dispatch functions.
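To make the offload idea concrete, here is a minimal, hypothetical sketch of hierarchical scheduling – not Altair Accelerator Plus code or its API, just the batching pattern described above: many short jobs are bundled into a single base-scheduler submission while per-job status stays visible.

```python
# Hypothetical sketch of hierarchical ("hyper") scheduling: many short jobs are
# bundled into a single base-scheduler job, while the wrapper tracks per-job status.
# Class and function names are illustrative, not Altair Accelerator APIs.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    name: str
    run: Callable[[], int]          # returns an exit code
    status: str = "PENDING"


@dataclass
class Batch:
    """One submission unit handed to the base scheduler."""
    jobs: List[Job] = field(default_factory=list)

    def execute(self) -> None:
        # The base scheduler dispatches this once; the hyper-scheduler
        # iterates the short jobs itself, preserving per-job visibility.
        for job in self.jobs:
            job.status = "RUNNING"
            job.status = "DONE" if job.run() == 0 else "FAILED"


def submit_batched(jobs: List[Job], batch_size: int = 100) -> List[Batch]:
    """Group short jobs so the base scheduler sees one job per batch."""
    return [Batch(jobs[i:i + batch_size]) for i in range(0, len(jobs), batch_size)]


if __name__ == "__main__":
    jobs = [Job(f"sim_{i}", run=lambda: 0) for i in range(250)]
    for batch in submit_batched(jobs, batch_size=100):
        batch.execute()                       # one base-scheduler dispatch per 100 jobs
    print(sum(j.status == "DONE" for j in jobs), "jobs done")
```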

What’s your take on cloud technology for chip design and EDA?
Today’s cloud technology has broken down barriers like cost, latency, and security concerns. The cloud is scalable and elastic, and it’s an attractive option for chip designers and EDA engineers. Major cloud providers including Google Cloud, Amazon Web Services, Oracle Cloud Infrastructure, and Microsoft Azure make it possible for businesses of any size to access powerful resources without the need to own and maintain their own data centers.

Scheduling technology like Altair Accelerator enables users to optimize performance, match cloud expenditure to actual compute demand, and let teams shift seamlessly between cloud and on-premises environments with flexible, demand-based license allocation tools. When demand stops, on-demand cloud resources scale right back to zero.

Accelerator is storage-aware, meaning that it modulates running jobs based on filer latency. Accounting for the latency the filer experiences can dramatically improve scheduling speed in the cloud; Altair customers have seen up to a 10x improvement with storage-aware scheduling. We also have a useful tool we call Rapid Scaling that can be used for cost optimization.
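As a rough illustration of the storage-aware idea – a hypothetical sketch only, not Accelerator’s actual policy, probe, or thresholds – a scheduler might throttle new dispatches whenever observed filer latency crosses a threshold:

```python
# Hypothetical sketch of storage-aware throttling: dispatch fewer jobs while the
# filer (shared storage) is slow, then ramp back up when latency recovers.
# The latency probe and thresholds below are illustrative assumptions.

import random
import time


def measure_filer_latency_ms() -> float:
    """Stand-in for a real latency probe (e.g., timing a small metadata operation)."""
    return random.uniform(1.0, 50.0)


def dispatch_allowance(latency_ms: float,
                       normal: int = 100,
                       slow_threshold_ms: float = 20.0) -> int:
    """Scale the number of jobs dispatched this tick by storage health."""
    if latency_ms <= slow_threshold_ms:
        return normal
    # Back off roughly in proportion to how far latency exceeds the threshold.
    return max(1, int(normal * slow_threshold_ms / latency_ms))


if __name__ == "__main__":
    for tick in range(5):
        latency = measure_filer_latency_ms()
        print(f"tick {tick}: filer latency {latency:5.1f} ms -> "
              f"dispatch {dispatch_allowance(latency)} jobs")
        time.sleep(0.1)
```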

Tell us more about Rapid Scaling.
Rapid Scaling is part of the Accelerator package, an optimization tool we designed to be minimalist, configurable, and transparent. It allows users to auto-scale resources in the cloud, helping to bring the cost of cloud resources as close as possible to exact demand. Rapid Scaling looks at workload speed and determines which computing resources are critical. Users can scale based on workload speed and contain everything in a single instance, clearly showing the cost of computing.

With Rapid Scaling users can measure workload movement and respond quickly, rapidly terminate instances, and know exactly how much they’re spending.
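The auto-scaling decision itself can be sketched in a few lines. The policy and numbers below are purely illustrative assumptions, not Rapid Scaling’s implementation; the point is simply that the instance pool tracks demand and drops to zero when the queue empties.

```python
# Illustrative auto-scaling sketch: choose how many cloud instances to keep
# running from the current queue depth, and terminate idle instances promptly.
# Jobs-per-instance, pool limit, and cost rate are hypothetical values.

def target_instances(queued_jobs: int, jobs_per_instance: int = 50,
                     max_instances: int = 200) -> int:
    """Scale the pool to demand; zero demand means zero instances."""
    if queued_jobs == 0:
        return 0
    needed = -(-queued_jobs // jobs_per_instance)   # ceiling division
    return min(needed, max_instances)


def hourly_cost(instances: int, rate_per_hour: float = 1.20) -> float:
    """Transparent cost estimate for the current pool size."""
    return instances * rate_per_hour


if __name__ == "__main__":
    for queued in (5000, 800, 40, 0):
        n = target_instances(queued)
        print(f"{queued:5d} queued jobs -> {n:3d} instances "
              f"(~${hourly_cost(n):.2f}/hour)")
```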

How does licensing impact workload optimization?
EDA licenses are expensive, and large companies routinely spend millions of dollars on them every year. With millions of jobs running daily, it’s easy for users to get bogged down in peak-time resource queues. Not having enough licenses means lost productivity and slower progress, but an excess of licenses is a waste of money. With the right scheduling software, engineers get access to just enough licenses to get their work done without needing to absorb the cost of overprovisioning for peak demand times.

With Altair Allocator we optimize license utilization between on-premises and cloud infrastructure based on demand at each location; licenses are simply moved to where the workload is. This makes for efficient collaboration, even in hybrid environments.
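A minimal sketch of demand-based allocation, assuming a simple proportional policy – Allocator’s real logic is certainly more sophisticated, and the site names and numbers here are hypothetical:

```python
# Hypothetical sketch of demand-based license allocation between sites.
# Shows only the idea of moving a fixed license pool toward wherever the
# workload currently is; not Altair Allocator's actual policy.

def allocate_licenses(total: int, demand: dict) -> dict:
    """Split `total` licenses across sites in proportion to pending demand."""
    total_demand = sum(demand.values())
    if total_demand == 0:
        return {site: 0 for site in demand}
    allocation = {site: (total * d) // total_demand for site, d in demand.items()}
    # Hand any rounding remainder to the site with the largest unmet demand.
    leftover = total - sum(allocation.values())
    if leftover:
        neediest = max(demand, key=lambda s: demand[s] - allocation[s])
        allocation[neediest] += leftover
    return allocation


if __name__ == "__main__":
    print(allocate_licenses(1000, {"on_prem": 1800, "cloud": 600}))
    # -> {'on_prem': 750, 'cloud': 250}
```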

You can learn more in this webinar, Saving Serious Money with License-first Scheduling.

About Altair (Nasdaq: ALTR)
Altair is a global technology company that provides software and cloud solutions in the areas of product development, high performance computing (HPC) and data analytics. Altair enables organizations across broad industry segments to compete more effectively in a connected world while creating a more sustainable future. To learn more, please visit www.altair.com.

Also Read:

CEO Interview: John O’Donnell of yieldHUB

CEO Interview: Deepak Kumar Tala of SmartDV

Fractal CEO Update 2020


Optimizing Chiplet-to-Chiplet Communications
by Tom Dillinger on 06-29-2020 at 6:00 am


Summary
The growing significance of ultra-short reach (USR) interfaces on 2.5D packaging technology has led to a variety of electrical definitions and circuit implementations.  TSMC recently presented the approach adopted by their IP development team, for a parallel-bus, clock-forwarded USR interface to optimize power/performance/area – i.e., “LIPINCON”.

Introduction
The recent advances in heterogeneous, multi-die 2.5D packaging technology have resulted in a new class of interfaces – i.e., ultra-short reach (USR) – whose electrical characteristics differ greatly from traditional printed circuit board traces.  Whereas the serial communications lane of SerDes IP is required for long, lossy connections, the short-reach interfaces support a parallel bus architecture.

The SerDes signal requires (50 ohm) termination to minimize reflections and reduce far-end crosstalk, adding to the power dissipation.  The electrically-short interfaces within the 2.5D package do not require termination.  Rather than “recovering” the clock embedded within the serial data stream, with the associated clock-data recovery (CDR) circuit area and power, these parallel interfaces can use a simpler “clock-forwarded” circuit design – a transmitted clock signal is provided with a group of N data signals.

Another advantage of this interface is that the circuit design requirements for electrostatic discharge protection (ESD) between die are much reduced.  Internal package connections will have lower ESD voltage stress constraints, saving considerable I/O circuit area (and significantly reducing I/O parasitics).

The unique interface design requirements between die in a 2.5D package has led to the use of the term “chiplet”, as the full-chip design overhead of SerDes links is not required.  Yet, to date, there have been quite varied circuit and physical implementation approaches used for these USR interfaces.

TSMC’s LIPINCON interface definition
At an invited talk for the recent VLSI 2020 Symposium, TSMC presented their proposal for a parallel-bus, clock-forwarded architecture – “LIPINCON” – which is short for “low-voltage, in-package interconnect”. [1]  This article briefly reviews the highlights of that presentation.

The key parameters of the short-reach interface design are:

  • Data rate per pin:  dependent upon trace length/insertion loss, power dissipation, required circuit timing margins
  • Bus width:  with modularity to define sub-channels
  • Energy efficiency:  measured in pJ/bit, including not only the I/O driver/receiver circuits, but any additional data pre-fetch/queuing and/or encoding/decoding logic
  • “Beachfront” (linear) and area efficiencies:  a measure of the aggregate data bandwidth per unit of chiplet edge and per unit of chiplet area – i.e., Tbps/mm and Tbps/mm**2;  dependent upon the signal bump pitch, and the number and pitch of the metal redistribution layers on the 2.5D substrate, which defines the number of bump rows for which signal traces can be routed – see the figures below and the short calculation after this list
  • Latency:  another performance metric; the time between the initiation of data transmit and receive, measured in “unit intervals” of the transmit cycle
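To make the beachfront metric concrete, here is a back-of-the-envelope calculation relating bump pitch, routable signal-bump rows, and per-pin data rate to edge bandwidth. The numbers are illustrative assumptions, not TSMC-published values.

```python
# Back-of-the-envelope beachfront-efficiency estimate (illustrative numbers only):
# aggregate bandwidth per linear mm of chiplet edge for a parallel USR interface.

def beachfront_tbps_per_mm(data_rate_gbps: float,
                           bump_pitch_um: float,
                           signal_rows: int) -> float:
    """Tbps per mm of die edge, given per-pin rate, bump pitch, and routable rows."""
    signals_per_mm = (1000.0 / bump_pitch_um) * signal_rows   # bumps along 1 mm of edge
    return signals_per_mm * data_rate_gbps / 1000.0           # Gbps -> Tbps


if __name__ == "__main__":
    # e.g. 8 Gbps/pin, 40 um bump pitch, 4 routable rows of signal bumps
    print(f"{beachfront_tbps_per_mm(8.0, 40.0, 4):.2f} Tbps/mm")   # -> 0.80 Tbps/mm
```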

Architects are seeking to maximize the aggregate data bandwidth (bus width * data rate), while achieving very low dissipated energy per bit.  These key design measures apply whether the chiplet interface is between multiple processors (or SoCs), processor-to-memory, or processor-to-I/O controller functionality.

The physical signal implementation will differ, depending on the packaging technology.  The signal redistribution layers (RDL) for a 2.5D package with silicon interposer will leverage the finer metal pitch available (e.g., TSMC’s CoWoS).  For a multi-die package utilizing the reconstituted wafer substrate to embed the die, the RDL layers are much thicker, with a wider pitch (e.g., TSMC’s InFO).  The figures below illustrate the typical signal trace shielding (and lack of shielding) associated with CoWoS and InFO designs, and the corresponding signal insertion and far-end crosstalk loss.

 

The key characteristics of the TSMC LIPINCON IP definition are illustrated schematically in the figure below.

  • A low signal swing interface of 0.3V is adopted (also saves power).
  • The data receiver uses a simple differential circuit, with a reference input to set the switching threshold (e.g., 150mV).
  • A clock/strobe signal is forwarded with (a sub-channel of) data signals;  the receiver utilizes a simple delay-locked loop (DLL) to “lock” to this clock.

Briefly, a DLL is a unique circuit – it consists of an (even-numbered) chain of identical delay cells.  The figure below illustrates an example of the delay chain. [2]   The switching delay of each stage is dynamically adjusted by modulating the voltage inputs to the series nFET and pFET devices in the input inverter of each stage – i.e., a “current-starved” inverter.  (Other delay chain implementations dynamically modify the identical capacitive load at each stage output, rather than adjusting the internal transistor drive strength of each stage.)

The “loop” in the DLL is formed by a phase detector (XOR-type logic with a low-pass filter), which compares the input clock to the final output of the chain.  The leading or lagging nature of the input clock relative to the chain output adjusts the inverter control voltages – thus, the overall delay of the chain is “locked” to the input clock.  The (equal) delays of the stages in the DLL chain provide outputs that correspond to specific phases of the input clock signal.  The parallel data is captured in receiver flops using an appropriate phase output, which provides a means of compensating for any data-to-clock skew across the interface.
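A simplified behavioral sketch of that control loop is shown below – a toy model of the lock behavior only, not the TSMC circuit; the stage count, loop gain, and clock period are assumptions chosen for illustration.

```python
# Simplified behavioral sketch of a DLL control loop (not a circuit model):
# the total delay of N identical stages is nudged until it equals one clock
# period, after which each tap corresponds to an equally spaced clock phase.

def lock_dll(clock_period_ps: float, n_stages: int = 8,
             initial_stage_delay_ps: float = 40.0,
             gain: float = 0.1, iterations: int = 200) -> list:
    stage_delay = initial_stage_delay_ps
    for _ in range(iterations):
        chain_delay = n_stages * stage_delay
        # Phase detector: compare the chain output against the next input clock edge.
        error = clock_period_ps - chain_delay
        # Bias generator: adjust the current-starved inverters' delay accordingly.
        stage_delay += gain * error / n_stages
    # Tap delays, i.e. the clock phases available for capturing forwarded data.
    return [round((i + 1) * stage_delay, 2) for i in range(n_stages)]


if __name__ == "__main__":
    taps = lock_dll(clock_period_ps=500.0)      # 2 GHz forwarded clock
    print(taps)   # ~[62.5, 125.0, ..., 500.0] once locked
```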

The TSMC IP team developed an innovative approach for the specific case of an SoC-to-memory interface.  The memory chiplet may not necessarily embed a DLL to capture signal inputs.  For a very wide interface – e.g., 512 addresses, 256 data bits, divided into sub-channels – the overhead of the DLL circuitry in the cost-sensitive memory chiplet would be high.  As illustrated in the figure below, the DLL phase output which serves as the input strobe for a memory write cycle is generated in the SoC instead.  (The memory read path is also shown in the figure, illustrating how the data strobe from the memory is connected to the read DLL circuit input.)

For the parallel LIPINCON interface, simultaneous switch noise (SSN) related to signal crosstalk is a concern.  For the shielded (CoWoS) and unshielded (InFO) RDL signal connections illustrated above, TSMC presented results illustrating very manageable crosstalk for this low-swing signaling.

To be sure, designers would have the option of developing a logical interface between chiplets that used data encoding to minimize signal transition activity in successive cycles.  The simplest method would be to add data bus inversion (DBI) coding – the data in the next cycle could be compared to the current data, and transmitted using true or inverted values to minimize the switching activity.  An additional DBI signal between chiplets carries this decision for the receiver to decode the values.
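A minimal sketch of that DBI decision – generic bus-inversion logic, not TSMC’s IP – is shown below:

```python
# Minimal data-bus-inversion (DBI) sketch: transmit the next word either true
# or inverted, whichever toggles fewer wires relative to the current bus state.
# One extra DBI wire tells the receiver which form was sent.

def dbi_encode(current: int, nxt: int, width: int = 16):
    mask = (1 << width) - 1
    toggles_true = bin((current ^ nxt) & mask).count("1")
    toggles_inv = bin((current ^ (~nxt & mask)) & mask).count("1")
    if toggles_inv < toggles_true:
        return (~nxt & mask), 1      # send inverted, DBI flag set
    return nxt & mask, 0             # send as-is


def dbi_decode(received: int, dbi_flag: int, width: int = 16) -> int:
    mask = (1 << width) - 1
    return (~received & mask) if dbi_flag else received


if __name__ == "__main__":
    current_bus = 0x0000
    data_word = 0xFFF0               # would toggle 12 of 16 wires if sent true
    tx, flag = dbi_encode(current_bus, data_word)
    assert dbi_decode(tx, flag) == data_word
    print(hex(tx), "DBI =", flag)    # 0xf DBI = 1 -> only 4 wires toggle
```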

The development of heterogeneous 2.5D packaging relies upon the integration of known good die/chiplets (KGD).  Nevertheless, the post-assembly yield of the final package can be enhanced by the addition of redundant lanes which can be selected after package test (ideally, built-in self-test).  The TSMC presentation included examples of redundant lane topologies which could be incorporated into the chiplet designs.  The figure below illustrates a couple of architectures for inserting redundant through-silicon-vias (TSVs) into the interconnections.  This would be a package yield versus circuit overhead tradeoff when architecting the interface between chiplets.

In a SerDes-based design, thorough circuit and PCB interconnect extraction plus simulation is used to analyze the signal losses.  The variations in signal jitter and magnitude are analyzed against the receiver sense amp voltage differential.  Hardware lab-based probing is also undertaken to ensure a suitable “eye opening” for data capture at the receiver.  TSMC highlighted that this type of interface validation is not feasible with the 2.5D package technology.  As illustrated below, a novel method was developed by their IP team to introduce variation into the LIPINCON transmit driver and receive capture circuitry to create an equivalent eye diagram for hardware validation.

The TSMC presentation mentioned that some of their customers have developed their own IP implementations for USR interface design.  One example showed a very low swing (0.2V) electrical definition that is “ground referenced” (e.g., signal swings above and below ground).  Yet, for fabless customers seeking to leverage advanced packaging, without the design resources to “roll their own” chiplet interface circuitry, the TSMC LIPINCON IP definition is an extremely attractive alternative.  And, frankly, given the momentum that TSMC is able to provide, this definition will likely help accelerate a “standard” electrical definition among developers seeking to capture IP and chiplet design market opportunities.

For more information on TSMC’s LIPINCON definition, please follow this link.

-chipguy

 

References

[1]  Hsieh, Kenny C.H., “Chiplet-to-Chiplet Communication Circuits for 2.5D/3D Integration Technologies”,  VLSI 2020 Symposium, Paper SC2.6 (invited short course).

[2]  Jovanovic, G., et al., “Delay Locked Loop with Linear Delay Element”, International Conference on Telecommunication, 2005, https://ieeexplore.ieee.org/document/1572136

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

 


Intel Designs Chips to Protect from ROP Attacks
by Matthew Rosenquist on 06-28-2020 at 10:00 am


Intel comes late to the game but will be delivering an embedded defense for Return Oriented Programming (ROP) types of cyber hacks. I first blogged about this back in Sept of 2016. Yes, almost four years have passed and I had hoped it would see the light of day much earlier.

The feature, to debut in the Tiger Lake microarchitecture in 2021 according to Intel, will be marketed as Control-Flow Enforcement Technology (CET) and is designed to disrupt a class of exploits that seek to leverage bits of code that are already trusted. These ROP attacks take chunks of code from other software and cobble them together to create a malicious outcome. In the hacking world, it is similar to Frankenstein’s monster, where something grotesque is assembled from various innocent parts. ROP hacking techniques are great at evading detection and are therefore a favorite among the higher classes of skilled threat actors.

Embedding the CET feature into the hardware and firmware provides a few advantages over trying to mitigate these attacks solely at the operating system level. First, there is the performance factor. Code that is specifically optimized by hardware moves significantly faster than traditional software components, so this should have a much smaller impact on system performance. Secondly, depending upon how it is configured to run, the hardware can add additional protection features to reduce the chances it can be disabled, modified, or compromised by adversaries.
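The shadow-stack idea at the heart of CET’s ROP defense can be illustrated conceptually. The toy model below is not a model of Intel’s hardware, just the principle: a protected copy of each return address is kept on call and checked on return, which is exactly the condition a ROP chain violates when it overwrites the normal stack.

```python
# Toy illustration of the shadow-stack concept behind CET (conceptual only,
# not Intel hardware behavior): return addresses are pushed to a protected
# shadow stack on call and compared on return; a mismatch triggers a fault.

class ControlFlowViolation(Exception):
    pass


class ShadowStackCPU:
    def __init__(self):
        self.call_stack = []     # attacker-writable in a real exploit
        self.shadow_stack = []   # hardware-protected copy

    def call(self, return_address: int) -> None:
        self.call_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        target = self.call_stack.pop()
        expected = self.shadow_stack.pop()
        if target != expected:
            raise ControlFlowViolation(f"return to {hex(target)} blocked")
        return target


if __name__ == "__main__":
    cpu = ShadowStackCPU()
    cpu.call(0x401000)
    cpu.call_stack[-1] = 0xDEADBEEF      # simulated stack overwrite (ROP gadget address)
    try:
        cpu.ret()
    except ControlFlowViolation as err:
        print("CET-style check:", err)
```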

Unfortunately, that is not the whole picture, as there are potential drawbacks for embedding such designs lower in the system stack. Namely, if there is a vulnerability in the code, it could be very difficult to patch or correct. Let’s face it, Intel’s reputation is not the greatest as of late when it comes to dealing with vulnerabilities in their products.

Overall, I am excited at the prospect of disrupting ROP types of attacks. I fully expect the best and brightest hackers will work to find ways around the protections, but that takes time and resources. This is how the game is played. It is great when new technology takes the initiative to force the attackers to adapt. The value of CET greatly depends on OS vendors’ adoption, on whether it has the right balance of hardened features, and on whether it runs efficiently enough that it does not overly burden system performance. Expect tests and reviews after Tiger Lake comes to market to determine if it is simply a superficial marketing tactic or if CET represents a robust capability to mitigate hacking risks.

Interested in more? Follow me on LinkedIn, Medium, and Twitter (@Matt_Rosenquist) to hear insights, rants, and what is going on in cybersecurity.


The Stochastic Impact of Defocus in EUV Lithography
by Fred Chen on 06-28-2020 at 6:00 am


The stochastic nature of imaging has received a great deal of attention in the area of EUV lithography. The density of EUV photons reaching the wafer is low enough [1] that the natural variation in the number of photons arriving at a given location can give rise to a relatively large standard deviation.

In recent studies [2,3], it was shown that large, complex 2D patterns with a large diffraction spectrum can divide a large number of photons into smaller groups, each representing a different interference pattern. Each group therefore suffers relatively more significant shot noise. However, the effect of defocus had not yet been considered.

In this article, it will be shown that even for a single photon group forming a basic 2-beam interference pattern, when a large number of source points are used, the effect of defocus is to once again divide the total number of photons into smaller groups. Each group represents a different degree of defocus, determined by the phase difference between the two interfering beams, referred to as the 0th and 1st diffraction orders. This, in turn, causes a more rapid degradation of the image.

Separation of Source Points by Defocus

Figure 1 shows all the possible source points that can contribute to imaging a 40 nm line pitch, under the condition of 60 nm defocus. The source point coordinates are the sines of the angles with respect to the optical axis. At the nominal EUV wavelength of 13.5 nm and numerical aperture of 0.33, the 40 nm pitch can only be imaged as a 2-beam interference. Moreover, some source points (not shown) cannot provide an interference pattern, only background light.

Figure 1. Source points for two-beam interference at 40 nm pitch, classified by the phase difference between the interfering beams at 60 nm defocus (wavelength = 13.5 nm). The phase difference from 60 nm defocus is calculated by 360 deg/13.5 nm * 60 nm * [cos(0th order angle) – cos(1st order angle)].

Figure 1 shows only those points producing 2-beam interference, categorized according to phase difference between the interfering 0th and 1st orders. From the image differences among the groups in Figure 2, we can roughly divide the photons by defocus into 0-30 deg, 30-50 deg, 50-70 deg, 70-90 deg, in both positive and negative directions, leading to eight groups total, with the photons roughly uniformly distributed among them.

Figure 2. Effect of defocus phase difference between 0th and 1st orders on the image. 

With significantly fewer photons per phase difference range, the stochastic impact is aggravated. The degree of defocus of the wafer image becomes effectively determined by the variable number of photons per phase difference range.
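The caption formula for Figure 1 can be evaluated directly. The short script below follows that formula for a few source points; the pairing of the 1st order at sin(θ0) − λ/pitch and the sign convention are assumptions made for illustration, not taken from the original study.

```python
# Defocus-induced phase difference between the 0th and 1st diffraction orders,
# following the formula in the Figure 1 caption. Source-point coordinates are
# sines of angles; the 1st order for a 40 nm pitch is assumed to sit at
# sin(theta0) - lambda/pitch.

import math

WAVELENGTH_NM = 13.5
PITCH_NM = 40.0
DEFOCUS_NM = 60.0


def defocus_phase_deg(sin_theta0: float) -> float:
    sin_theta1 = sin_theta0 - WAVELENGTH_NM / PITCH_NM        # 1st order direction
    cos0 = math.sqrt(1.0 - sin_theta0 ** 2)
    cos1 = math.sqrt(1.0 - sin_theta1 ** 2)
    return 360.0 / WAVELENGTH_NM * DEFOCUS_NM * (cos0 - cos1)


if __name__ == "__main__":
    for s0 in (0.05, 0.15, 0.25, 0.33):
        print(f"sin(theta0) = {s0:.2f}: phase difference = "
              f"{defocus_phase_deg(s0):6.1f} deg")
```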

Phase Defect Sensitivity

EUV masks are also subject to phase defects, which can manifest as sub-nm height bumps [4]. These phase defects change the vertical location of best focus and introduce small CD errors. The stochastic impact will manifest itself as defocus variation, i.e., how far the wafer location is from best focus, as well as CD variation (see Figure 3).

Figure 3. A phase defect combined with defocus leads to a more severe CD error. This is for the two-beam interference case as in Figure 1. A 30 degree defocus-induced phase difference between the 0th and 1st orders is assumed. A 20 deg, 10 nm wide phase line defect in a nominal 20 nm wide exposed line region (40 nm pitch) is also assumed.

Remaining Concerns for Using Low Pupil Fill

The impact of a wide defocus range provides yet another argument for low pupil fill [5]. A lower pupil fill obviously reduces the defocus range, which in turn reduces the phase difference range. There remain concerns, however, about throughput from light being excluded [6] and about ring field illumination rotation [7].

The ring field illumination concern is reviewed in Figure 4. Ideally, the plane of incidence is fixed across a rectangular exposure field (slit). However, the off-axis focus is not to a line but a point. Consequently, the field is arc-shaped, and the plane of incidence is rotated across the field, with the line of sight to the point source as the axis of symmetry. This means the distribution of source points is also rotated across the slit, so the source points do not maintain their ideal positions.

Figure 4. Plane of incidence must rotate for focusing to an off-axis point in a reflective optical system.

While DUV wavelengths, e.g., ArF (193 nm) immersion, also quickly migrated to low pupil fill (due to very low k1) for better defocus performance, those optical systems were transmissive, not reflective, obviating the need for off-axis focusing and ring fields.

Therefore, a workaround that can be used with EUV systems for now would be using only a small portion of the field to limit the degree of rotation [8]. A smaller field, however, means more exposure stops per wafer, so throughput again will suffer.

References

[1] https://www.euvlitho.com/2009%20Workshop/Oral%2045%20Resist-8%20Mack.pdf

[2] https://www.linkedin.com/pulse/stochastic-considerations-multi-point-source-lithography-chen

[3] https://www.linkedin.com/pulse/stochastic-variation-euv-source-illumination-frederick-chen/

[4] T. Terasawa, T. Yamane, Y. Arisawa, H. Watanabe, “Phase defect printability analyses: dependence of defect type and EUV exposure condition,” Proc. SPIE 8322, 83221R (2012).

[5] https://www.linkedin.com/pulse/need-low-pupil-fill-euv-lithography-frederick-chen

[6] M. van de Kerkhof, H. Jasper, L. Levasier, R. Peeters, R van Es, J-W. Bosker, A. Zdravkov, E. Lenderink, F. Evangelista, P. Broman, B. Bilski, T. Last, “Enabling sub-10nm node lithography: presenting the NXE:3400B EUV scanner,” Proc. SPIE 10143, 101430D (2017).

[7] S-S. Yu, A. Yen, S-H. Chang, C-T. Shih, Y-C. Lu, J. Hu, T. Wu, “On the Extensibility of Extreme-UV Lithography,” Proc. SPIE 7969, 79693A (2011).

[8] https://www.linkedin.com/pulse/forbidden-pitch-combination-advanced-lithography-nodes-frederick-chen/

This article originally appeared in LinkedIn Pulse: The Stochastic Impact of Defocus in EUV Lithography



CEO Interview: John O’Donnell of yieldHUB
by Daniel Nenni on 06-26-2020 at 10:00 am


Let me introduce John O’Donnell, CEO of yieldHUB. After earning a degree in microelectronics, John spent 18 years at Analog Devices before founding yieldHUB in 2005. If anybody knows yield it is Analog Devices, having shipped billions upon billions of chips, absolutely.

SemiWiki will be digging deeper into the technology behind yieldHUB but first let’s talk to John.

What is yieldHUB?
yieldHUB is a leading semiconductor yield management provider. We work with Fabless and IDM companies worldwide. Founded in 2005, we’re celebrating 15 years in business this year.

What gap did you see in the market?
I saw a gap in the market in 2005 for web-based YMS (yield management software), where there should be no need to download data before being able to chart it and analyze it. Let the server do the work! We wanted to remove the hassle from engineers of always having to assemble disparate data for hours before ever getting to analyze and report on an engineering problem.

Why do you do what you do?
We want engineers to spend less time gathering data and more time solving problems. We give oversight to their managers, as they can see the data and reports their teams are working on.

We help companies increase their yield and reduce scrap to improve their margins. Our STDF analysis is very sophisticated and allows engineers and their managers to create excellent reports and drill down into what’s happening on the factory floor. One of our customers said that yieldHUB makes engineers 10 times more efficient!

What challenges did you have?
Early on in the journey, we pivoted to Real-Time analysis of the test floor. We knew how to do it without adding any hardware. But when we produced it, people weren’t willing to pay for it – we were probably ten years ahead of our time in that area. However, companies were willing to pay for a relational database and associated tools for historical analysis if they were fast and comprehensive enough. So we went back to fully concentrating on our original plan and were able to continue growing and developing.

What makes yieldHUB successful?
Having an enduring vision of making powerful data analysis easy and speedy for engineers – and then hiring great people who believe in the vision and bring their own knowledge and experience to it! Most of our employees work remotely, which allows us to hire top talent around the world. Our team members and associates are based in Ireland, the USA, the UK, The Netherlands, The Philippines, Taiwan, South Korea and Japan. It allows us to serve our customers in their time zones.

We hire experienced technical people who, importantly, also have empathy for customers. Our salespeople are all former Test,  Product and Foundry Engineers. They listen and give great advice because they’ve been there.

What industries do you work with? 
Our customers are diverse. Several of our customers are start-ups that make chips for 5G. Others provide automotive chips for tier 1 car manufacturers. Some work in the aviation and military sectors. One of our largest customers works in consumer goods. We provide different services for different needs. With consumer goods the margins are tight – they have huge volumes and need very high yields. For automotive companies, quality, reliability and traceability are key. We enjoy working with our customers to identify their needs, then show how we can help them. Being at the forefront of new technology is exciting. Every day we’re helping to create the future.

What’s next for yieldHUB?
The fast-growing image sensor market is a big opportunity. We have developed exciting cloud-based image analysis software which is now in production and allows automated categorisation of images and defects.

Our automated data-cleansing capabilities lend themselves well to providing clean, linked data as inputs to machine learning, reducing errors in the predictions. Hence machine learning for yield improvement is also a growing focus for yieldHUB.

With our new API, you can now access clean manufacturing data from yieldHUB in real time from other systems, including MES/ERP and financial systems. This is also looking very promising for helping customers, for example, to reconcile invoices from their subcons with actual test time, volume, and yield information from the hugely scalable and reliable yieldHUB database.

About yieldHUB 
yieldHUB supplies world-class data analysis solutions to the semiconductor industry. For companies who design and manufacture semiconductors, the cloud-based software provides a complete understanding of their product performance and yields. Visit yieldhub.com

Also Read:

CEO Interview: Deepak Kumar Tala of SmartDV

Fractal CEO Update 2020

CEO Interview: Johnny Shen of Alchip


Multi-Vt Device Offerings for Advanced Process Nodes
by Tom Dillinger on 06-26-2020 at 6:00 am


Summary
As a result of extensive focus on the development of workfunction metal (WFM) deposition, lithography, and removal, both FinFET and gate-all-around (GAA) devices will offer a wide range of Vt levels for advanced process nodes below 7nm.

Introduction
Cell library and IP designers rely on the availability of nFET and pFET devices with a range of threshold voltages (Vt).  Optimization algorithms used in physical synthesis flows evaluate the power, performance, and area (PPA) of both cell “drive strength” (e.g., 1X, 2X, 4X-sized devices) and cell “Vt levels” (e.g., HVT, SVT, LVT) when selecting a specific instance to address timing, noise, and power constraints.  For example, a typical power optimization decision is to replace a cell instance with a higher Vt variant to reduce leakage power, if the timing path analysis margins allow (after detailed physical implementation).  The additional design constraints for multi-Vt cell library use are easily managed:  (1) the device Vt active area must meet (minimum) lithography area requirements, and (2) the percentage of low Vt cells used should be small, to keep leakage currents in check.
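The Vt-swap decision described above can be illustrated with a toy leakage-recovery pass. This is only a sketch of the idea, not a physical synthesis engine; the Vt ladder, delay penalties, and leakage ratios are made-up values.

```python
# Illustrative leakage-recovery pass of the kind described above (a sketch,
# not a real physical-synthesis flow): swap a cell to a higher-Vt variant
# whenever the worst timing slack through it can absorb the added delay.

from dataclasses import dataclass

# Hypothetical LVT -> SVT -> HVT ladder: (new_vt, delay penalty in ps, leakage scale).
SWAP_UP = {"LVT": ("SVT", 3.0, 0.5), "SVT": ("HVT", 5.0, 0.3)}


@dataclass
class Cell:
    name: str
    vt: str
    slack_ps: float
    leakage: float      # arbitrary units


def recover_leakage(cells):
    for cell in cells:
        while cell.vt in SWAP_UP:
            new_vt, delay_penalty, leakage_scale = SWAP_UP[cell.vt]
            if cell.slack_ps < delay_penalty:
                break                               # timing cannot absorb the swap
            cell.vt = new_vt
            cell.slack_ps -= delay_penalty
            cell.leakage *= leakage_scale
    return cells


if __name__ == "__main__":
    cells = [Cell("u1", "LVT", 12.0, 10.0), Cell("u2", "LVT", 2.0, 10.0)]
    for c in recover_leakage(cells):
        print(c.name, c.vt, f"slack={c.slack_ps:.1f}ps leakage={c.leakage:.2f}")
```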

A common representation to illustrate the device Vt offerings in a particular process is to provide an I_on versus I_off characterization curve, as shown in the figure below.

Although it doesn’t reflect the process interconnect scaling options, this curve is also commonly used as a means of comparing different processes, as depicted in the figure.  A horizontal line shows the unloaded, I_on based performance gains achievable.  The vertical line illustrates the iso-performance leakage I_off power reduction between processes, for a reference-sized device in each.  Note that these lines are typically drawn without aligning to specific (nominal) Vt devices in the two process nodes.

The I_on versus I_off curve does not really represent the statistical variation in the process device Vt values.  A common model for representing this data is the Pelgrom equation. [1]  The standard deviation of (measured) device Vt data is plotted against (1 / sqrt(Weff * Lgate)):

(sigma_Vt)**2 = A**2 / (2 * Weff * Lgate)

       where A is a “fitting” constant for the process

Essentially, as the square root of the channel area of the device increases, sigma-Vt decreases.  (Consider N devices in parallel with independent Vt variation – the Vt mean of the total will be the mean of the Vt distribution, while the effective standard deviation is reduced.)  The Pelgrom plot for the technology is an indication of the achievable statistical process control – more on Vt variation shortly.
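As a numerical illustration of the area scaling in the Pelgrom relation – the coefficient A below is a made-up value, chosen only to show the 1/sqrt(area) trend, not a published process parameter:

```python
# Numerical illustration of the Pelgrom relation used above:
#   sigma_Vt = A / sqrt(2 * Weff * Lgate)
# The A coefficient is a hypothetical value; only the 1/sqrt(area) trend matters.

import math

A_MV_UM = 1.8          # hypothetical Pelgrom coefficient, mV*um


def sigma_vt_mv(weff_um: float, lgate_um: float) -> float:
    return A_MV_UM / math.sqrt(2.0 * weff_um * lgate_um)


if __name__ == "__main__":
    for w, l in [(0.10, 0.015), (0.20, 0.015), (0.40, 0.015)]:
        print(f"Weff={w:.2f}um Lgate={l:.3f}um -> sigma_Vt ~ {sigma_vt_mv(w, l):.1f} mV")
        # doubling the channel area cuts sigma_Vt by sqrt(2)
```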

For planar CMOS technologies, Vt variants from the baseline device were fabricated using a (low impurity dose) implant into the channel region.  A rather straightforward Vt implant mask lithography step was used to open areas in the mask photoresist for the implant.  For an implant equivalent to the background substrate/well impurity type, the device Vt would be increased.  The introduction of an implant step modifying the background concentration would increase the Vt variation, as well.

With the introduction of FinFET channel devices, the precision and control of implant-based Vt adjusts became extremely difficult.  The alternative pursued for these advanced (high-K gate oxide, metal gate) process nodes is to utilize various gate materials, each with a different metal-to-oxide workfunction contact potential.

Vt offerings for advanced nodes
As device scaling continues, workfunction metal (WFM) engineering for Vt variants is faced with multiple challenges.  A presentation at the recent VLSI 2020 Symposium by TSMC elaborated upon these challenges, and highlighted a significant process enhancement to extend multi-Vt options for nodes below 7nm. [2]

The two principal factors that exacerbate the fabrication of device Vt’s at these nodes are shown in the figures below, from the TSMC presentation.

  • The scaling of the device gate length (shown in cross-section in the figure) requires that the WFM deposition into the trench be conformal in thickness, and be thoroughly removed from unwanted areas.
  • Overall process scaling requires aggressive reduction in the nFET to pFET active area spacing.  Lithographic misalignment and/or non-optimum WFM patterning may result in poor device characteristics – the figure above illustrates incomplete WFM coverage of the (fin and/or GAA) device.

Parenthetically, another concern with the transition to GAA device fabrication is the requirement to provide a conformal WFM layer on all sides of each (horizontal) nanosheet, without “closing off” the gap between sheets.

The TSMC presentation emphasized the diverse requirements of the HPC, AI, 5G comm., and mobile markets, which have different top priorities among the PPA tradeoffs.  As a result, despite the scaling challenges listed above, the demand for multi-Vt cell libraries and PPA optimization approaches remains strong.  TSMC presented extremely compelling results of their WFM fabrication engineering focus.  The figure below illustrates that TSMC has demonstrated a range of Vt offerings for sub-7nm nodes that is wider than at 7nm.  TSMC announced an overall target Vt range exceeding 250mV.  (Wow.)

In addition to the multi-Vt data, TSMC provided corresponding analysis results for the Vt variation (Pelgrom plot) and the time-dependent device breakdown (TDDB) reliability data – see the figures below.

The sigma-Vt Pelgrom coefficient is improved with the new WFM processing, approaching the 7nm node results.  The TDDB lifetime is also improved over the original WFM steps.

The markets driving the relentless progression to advanced process nodes have disparate performance, power, and area goals.  The utilization of multi-Vt device and cell library options has become an integral design implementation approach.  The innovative process development work at TSMC continues this design enablement feature, even extending this capability over the 7nm node – that’s pretty amazing.

For more information on TSMC’s advanced process nodes, please follow this link.

-chipguy

References
[1]  M. J. M. Pelgrom, C. J. Duinmaijer, and A. P. G. Welbers, “Matching properties of MOS transistors”, IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1433–1440, Oct. 1989.

[2]  Chang, Vincent S., et al., “Enabling Multiple-Vt Device Scaling for CMOS Technology beyond 7nm Node”, VLSI Symposium 2020, Paper TC1.1.

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

 


Nobody Ever Lost Their Job for Spending too Much on Hardware Verification, Did They?
by Daniel Nenni on 06-25-2020 at 6:00 am

(Figure: silicon bug cost scenarios A and B)

A paper was published last month on the Acuerdo Consultancy Services website, authored by Joe Convey of Acuerdo and Bryan Dickman of Valytic Consulting. Joe and Bryan spent combined decades in the semiconductor and EDA world, which means they have a great firsthand understanding of hardware bugs, absolutely.

Here is a quick summary with a link to get the full paper at the end:

Whether you are developing a hardware product or a software product or both, understanding bugs, what causes them, how to avoid them, the cost of finding them and the cost of not finding them, becomes one of the biggest drivers that shape how teams develop products. Understanding all aspects of this will help you to reason about the balance between delivered product quality and ROI; too much or too little?

Economics of Bugs
In IP hardware product development, bugs drive a large part of the development cost and getting the investment balance wrong will have a big impact. How do teams deliver on time, with functionality, performance and integrity and what are the costs involved in ensuring an absence of bugs and the delivery of high-quality products?

Getting the investment vs risk balance right is hard…

Nobody wins if a design team gets it wrong. Market reality means that costs will often be passed down the customer/consumer food chain and eventually impact end-users. Bug escapes can drive up mitigation and development costs and subsequently drive down ROI and customer confidence.

How to strike the balance between investing too little (unacceptably high risk) and too much (low risk, but lousy ROI) is a long-standing challenge that design teams and the CFO have to grapple with. Engineering usually wins the investment battle with the CFO, because few in senior management challenge the assertion that “you can never spend too much on finding bugs”. This results in spiraling investment in compute and EDA tools as designs get more complex. Sound familiar?

People are your most valuable asset
The success of the products comes down to the quality of the staff; how well trained they are; how experienced they are and how innovative they are; how well equipped they are with the ‘best-in-class’ tools and resources to do their jobs. Engineers love to solve challenging problems, innovate, build ‘cool’ things, be experts and craftsmen, have access to the latest and best-in-class platforms and tools and work with teams of talented colleagues. Do they really understand how much this costs?

Get engineering teams on-side by growing a shared understanding of engineering platform costs
An interesting challenge for any business developing complex products that consume costly tools and resources is how to educate engineering teams to be cost-conscious and use the available resources sparingly and effectively. Of course, engineers understand cost, but providing the teams with clear data showing the relationship between the cost of providing the engineering platform and the process of designing IP can create positive perspectives on how to achieve cost efficiencies:

Wow, did we really spend that much? Surely, we can do something about that by improving xyz?

A huge part of improving product ROI is the drive for design efficiencies. Sharing the business challenge can provide a very positive environment for encouraging innovations in methodologies and tool deployment, at a cost profile that fits the company’s business plans while providing engineering with best-in-class facilities.

Work with engineering to drive great partnerships with EDA and other platform vendors
Having engineering as a willing partner makes a difference to the success of tool evaluations and subsequent negotiations with vendors. Creating a culture of partnership that transcends internal barriers but also extends to the company’s EDA and IT suppliers is really valuable, as it encourages vendors to become “partners” working with engineering in a joint effort to confront the technical challenges involved in a process like IP verification and debug.

Compute, compute and even more compute…
We need more compute, otherwise there will be product delays and missed revenue targets… Over the last two decades the industry has seen the compute requirement for Verilog simulation grow by several orders of magnitude and the compute infrastructures have scaled up accordingly. Businesses are migrating compute capacities from on-prem to cloud, which offers much greater capacity agility and enables teams to reason better about cost versus time to delivery.

We’re going to spend more money over a shorter time and deliver earlier.

For many, this means operating a hybrid on-prem/cloud environment. EDA vendors are adapting their tools and their business models to cope with this new hybrid-cloud world.

For complex IP products such as processors, most of this simulation consumption is taken up by constrained-random verification strategies where the ability to consume cycles is open-ended.

But I ONLY want to run GOOD verification cycles!

So, when is a verification cycle a “good cycle”? Only if “Verification Progress” has been made. By that we mean that it has either found a new bug, or it has measurably increased the tested space, i.e., demonstrated correctness. The former is easy to track (we can count bugs), but the latter is harder because we don’t know precisely how many bugs are present. Instead, we track endless metrics such as coverage and cycles of testing.
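That definition can be made concrete with a bit of toy bookkeeping. The sketch below is purely illustrative – the field names and the coverage threshold are assumptions, not anything taken from the paper.

```python
# Toy "good cycle" bookkeeping per the definition above: a simulation run counts
# as useful if it exposed a new bug or measurably advanced coverage.
# Field names and the coverage-gain threshold are hypothetical.

def is_good_cycle(new_bugs: int, coverage_gain_pct: float,
                  min_gain_pct: float = 0.01) -> bool:
    return new_bugs > 0 or coverage_gain_pct >= min_gain_pct


if __name__ == "__main__":
    runs = [
        {"name": "regress_001", "new_bugs": 1, "coverage_gain_pct": 0.00},
        {"name": "regress_002", "new_bugs": 0, "coverage_gain_pct": 0.12},
        {"name": "regress_003", "new_bugs": 0, "coverage_gain_pct": 0.00},
    ]
    good = [r["name"] for r in runs
            if is_good_cycle(r["new_bugs"], r["coverage_gain_pct"])]
    print(f"{len(good)}/{len(runs)} good cycles:", good)
```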

Extensive operational analytics are needed in order to operate a resilient and performant platform service delivering the appropriate QoS. These analytics will monitor system performance and capacities, track operational metrics over time and may exploit machine learning prediction algorithms to alert to pending failures so that mitigations can be deployed ahead of a critical failure point.

Conclusion
It boils down to understanding the total cost of bugs expressed as the costs expended in finding bugs, versus the cost of not finding bugs, i.e. what is the total impact cost when bugs are missed?

Which of the above 2 scenarios are you in? Do you have enough data to be able to reason about this and decide which scenario applies to your operations? If not, then what actions are you going to take to find out where you are and what action to take in each situation?

If your analysis shows that you are in scenario A, it looks like you need to address product quality urgently. You’re not really investing sufficiently in IP Product Verification and the impact costs are significantly reducing your products’ ROI. Scenario B certainly feels more comfortable, but you might have nagging doubts about the efficiency of your operations….

Now you have the right information to make the appropriate investments and get the balance right – and by the way, don’t lose your job!

Get the full paper “On the Cost of Bugs”.


Key Semiconductor Conferences go Virtual
by Scotten Jones on 06-24-2020 at 2:00 pm


This last week the 2020 Symposia on VLSI Technology and Circuits (VLSI Conference) was held as a virtual conference for the first time and it was announced today (June 24th) that this year’s IEDM conference will also be held as a virtual conference.

“The IEDM Executive Committee has decided that in the interest of prioritizing the health and safety of the scientific community, a virtual approach is the best option for this year,” said Dina Triyoso, IEDM 2020 Publicity Chair and Technologist at TEL Technology Center, America, LLC.

I attended the virtual VLSI Conference last week and I thought it would be interesting to discuss my experience with the conference and what it means for IEDM.

The first and perhaps most obvious aspect of a virtual conference is that you do not have to travel to attend it, saving travel expenses and time away from the office. Of course, the question then becomes: what is the experience like?

In general, I really liked the experience, in fact I would say that in many ways I prefer the virtual conference.

The advantages of a virtual conference:

  • No travel, saving time and money.
  • At large conferences there are often parallel tracks with multiple papers being presented at the same time. It is not uncommon to have two or even more papers I would like to see presented at the same time. Conversely, there will sometimes be gaps where there are no papers of interest being presented. For the virtual conference, the papers were live streamed and recorded. I exclusively accessed the conference through the recordings and really liked that I could pick when I wanted to watch different presentations and never had to choose one over the other because of scheduling conflicts.
  • When you are at a conference watching a paper presented, you are often sitting in a tightly packed auditorium trying to take notes on a computer in your lap while the presenter is trying to fit in as much information as possible in their time slot. It can be difficult to capture all the information. Sitting at my computer watching a video that I can pause and rewind is a much better experience.
  • At some conferences, such as the SPIE Advanced Lithography Conference there are essentially no proceedings, at other conferences such as VLSI and IEDM the papers are published in a proceeding, but the presentations are typically not made available. The presentations often have a lot more figures than the paper and are a valuable source of additional information. At the virtual VLSI Conference as you watched each presentation you had the option of downloading the paper and the presentation.

There are some disadvantages to a virtual conference and there were some execution issues with the VLSI Conference:

  • One important part of technical conferences is the opportunity to have private conversations with colleagues and to network, that is missing from a virtual conference.
  • I heard there were some streaming issues with the conference and one of the recorded presentations I watched had the sound fading out periodically throughout the presentation. These are the kind of technical issues I would expect would get worked out with more experience with virtual conferences.
  • You had to sign in to each session, and then sign in again if you wanted to switch to a new session. It would have been easier if you could sign in once and then navigate between sessions, but this was a minor inconvenience.
  • My biggest issue with the virtual conference was the over one hundred reminder emails I was sent. It seems that each session sent a reminder in advance, and if you didn’t watch the live stream you got a “sorry you missed it” email as well. I complained about this to the committee and was told it was a feature of the software that couldn’t be turned off (and there is no opt-out link on the emails either). This needs to be fixed.

Overall, I thought it was a good experience. I even suggested to the conference that since everything is recorded, they could leave registration open for a few months so people could sign up, watch the conference, and download the papers. This could potentially be a way to expand the revenue stream for the conference.

I will miss networking at IEDM this year, it is an important conference for me in that respect, but I am looking forward to a better experience accessing the technical conference through a virtual conference.

IEEE International Electron Devices Meeting (IEDM) is the world’s preeminent forum for reporting technological breakthroughs in the areas of semiconductor and electronic device technology, design, manufacturing, physics, and modeling. IEDM is the flagship conference for nanometer-scale CMOS transistor technology, advanced memory, displays, sensors, MEMS devices, novel quantum and nano-scale devices and phenomenology, optoelectronics, devices for power and energy harvesting, high-speed devices, as well as process technology and device modeling and simulation.

Also Read:

Effect of Design on Transistor Density

Cost Analysis of the Proposed TSMC US Fab

Can TSMC Maintain Their Process Technology Lead


Apple’s Silicon Switch Changes Game & Balance 
by Robert Maire on 06-24-2020 at 10:00 am


Will/should others follow?
TSMC vs Intel impact?
Moving Apple’s supply chain further overseas

Apple’s move to self-serve silicon was no surprise…
It has been speculated for years and we have talked about it many times.  It makes more sense for Apple to have silicon custom designed for their applications and products, fitting exactly into their lineup. Rather than use an “adapted” X86 architecture that harkens back and pays homage through compatibility to the earliest Intel CPUs, Apple can finally have a “purpose built” CPU that fulfills all its needs.

Not to mention the fact that what really sealed the deal and perhaps accelerated the need was that TSMC had passed Intel in the Moore’s law race.  Not jumping on the TSMC bandwagon would limit Apple to underperformance as compared to what is available.

Apple can gain further differentiation in the marketplace as compared to other laptop makers who really can’t differentiate themselves as they all use the same engine, Intel.

Not the first time Apple switches CPUs…
Apple has changed CPUs several times over the years as the industry has moved forward.  The change from Intel is just another sign that the industry has moved on.

Apple started, way back when, on its Apple II, with a MOS 6502, an 8-bit CPU which was a much cheaper, better copy of the Motorola 6800 CPU and also way cheaper than Intel’s 8080. The 6502 was used in the Atari 2600 game console and Commodore consumer computers, so cost was a big factor.

The jump to the Apple Macintosh also saw a jump to the Motorola 68000 CPU, a 16/32-bit design.

Later on down the road, Apple switched again to the PowerPC CPU from IBM, which was a RISC (reduced instruction set) CPU versus other popular CISC (complex instruction set) CPUs at the time.

As Apple had its own OS and own infrastructure, X86 compatibility was not as much of an issue, and perhaps Apple’s “Think Different” mindset helped it go its own way.

Then back in 2005 Apple announced that it had cut a deal with Intel to move to Intel’s X86 line.  We are sure Apple got a good deal from Intel for the switch, and the PowerPC was already on its way out, so Apple was jumping ship at just the right time.

Obviously Intel at the time was the CPU powerhouse and had performance that shut out everyone else due to its Moore’s law lead.

Intel’s missteps and slowness in entering the mobile CPU market were perhaps the beginning of the end of the relationship, as Apple went its own way with ARM-based CPUs that morphed into fully custom, purpose-built CPUs.

Apple has spent years building up its CPU expertise by acquiring many silicon companies, pouring tons of money into R&D, and hiring the best and brightest, like Jim Keller, the CPU guru of Apple, AMD, Tesla and Intel.

It obviously makes more sense for Apple to use similar CPUs across all its devices for a compatible, seamless product line.

The final handwriting was on the wall as TSMC seemed to pass Intel in terms of transistor density and power consumption characteristics.

Apple is also a huge company as compared to Intel; it has the critical mass, and certainly no longer needs to live within the confines of an Intel-dictated architecture that suits Intel’s needs (and profits).

If anything, we are surprised that this didn’t happen a lot sooner

Collateral Impact, shifts supply chain further to Asia
The move obviously means that TSMC will get a lot more business.  TSMC already makes all of AMD’s products that matter, many of Intel’s products, and all of Apple’s iPhone, iPad, and Apple Watch chips.  TSMC is becoming ever more critical as the key, central linchpin of the entire US technology industry.  It is clearly a single point of failure located a short boat ride from China.

This obviously doesn’t jibe well with recent problems with Huawei and makes the “token” TSMC fab proposed for Arizona look even more inadequate than before.

If anything the move by Apple further focuses things on the TSMC single point of failure to the US technology industry.

Intel impact
Apple is a large, but not too large, customer.  We view the loss as expected and as more of a psychological loss than a financial one.  Losing the hottest customer in the market is obviously an embarrassment and further proof of the need for Intel to double down to regain its position in Moore’s Law.

It also says that Intel is not competitive for mobile, power sensitive applications but is better off in the data center where power consumption matters less.

Intel has been making most of its money in the data center anyway but it would be better to not lose the diversification.

Should/will others follow suit?
An interesting question now is whether other laptop/consumer PC makers will try to follow Apple’s lead and use custom or ARM-like processors. The obvious limitation is Microsoft and Windows 10, which powers the rest of the world. Would Microsoft abandon the ancient Wintel duopoly and build a more portable Windows 11 (or whatever number)?

We think that Microsoft has to be wondering if they are tied to a sinking ship.  If Apple demonstrates significant power/performance benefits by leaving Intel then it will pick up more market share, which means Microsoft will lose share.

It seems like it would at least be a cheap insurance policy for Microsoft to develop ARM like compatibility to hedge against its potential success.

Having one company, Apple, with application transportability across smartphones, tablets, laptops, and wearables, all with the same underlying CPU architecture, will be huge. Microsoft flopped in smartphones, is lame in tablets, and is nowhere in wearables. If I were Microsoft I would be thinking hard about being tied to Intel, which will be relegated to the data center only, with free Linux as an alternative.

Microsoft already demonstrated PowerPoint at Apple’s rollout and we are sure the full Microsoft suite will move to Apple’s architecture.

What do PC makers such as Dell, Lenovo, HP and others do?

We think this is a very open ended question that begs answering.  The wrong thing to do is clear….don’t sit around continuing to do the same thing that you have done for the past ten years.

Could Apple become a chip maker?
Apple could very easily make, using TSMC, and sell a version of their CPU architecture to other hardware manufacturers.

Maybe they could sell a version not quite as capable as their own but still better in performance/power than the Intel/AMD alternatives.

They could probably sell it at a pretty good margin and give Intel and AMD a run for their money as the design would likely be better for laptops and portable applications. The bonus would be that Microsoft applications are already compatible with it.

Maybe Google would love it for a Chromebook application and get Microsoft apps to boot. Apple would not cannibalize its own sales as it would still be the only company offering the same architecture from wearables to laptops but it could create more critical mass for more applications to be ported (not that there isn’t enough demand already).

The idea of Apple selling chips is not at all that far fetched if they turn out to be that much better than Intel/AMD.

How many years is “many years to come”?
Tim Cook said that Apple will support X86 device compatibility for “many years to come”.  In our view that could be as few as two years (the plural of year just means more than one…).

When Apple switched from PowerPC, support ended within 3 years, after which PowerPC-based devices became paperweights.

We think Apple will dump Intel as fast as possible.  It’s great for Apple as they get to sell a lot of new laptops at better margins.

For me as a consumer, I will run out and buy a new Apple-CPU-based laptop as soon as they are available, as I would love to have application transportability across my iPhone/iPad/Apple Watch.  I think this clearly expands the addressable market for Apple laptops, as many people would switch from Windows to get that compatibility.

It will be a seismic change for not just Apple.

The Stocks
There is zero near term impact but just more things to track going forward to watch the transition play out.  Apple has had a lot of time to plan this and won’t screw it up.  It will likely be faster/better than expected.  It is broadly, long term positive for Apple and broadly long term negative for Intel.

It does not impact chip equipment in that it is a zero sum game.  It does obviously benefit TSMC who gains even more leverage and dominance in the market.

We wonder when the administration and legislators will pick up on this acceleration of outsourcing to Asia.

Semiconductor Advisors

Semiconductor Advisors on SemiWiki


Why Go Custom in AI Accelerators, Revisited
by Bernard Murphy on 06-24-2020 at 6:00 am


I believe I asked this question a year or two ago and answered it for the absolute bleeding edge of datacenter performance – Google TPU and the like. The hyperscalers (Google, Amazon, Microsoft, Baidu, Alibaba, etc.) want to do on-the-fly recognition in pictures so they can tag friends in photos, do almost real-time machine translation, and many other applications. But who else cares? I’ve covered a couple of Mentor events on using Catapult HLS to build custom accelerators. Fascinating stuff and good insights into the methods and benefits, but I wanted to know more about what kinds of applications are using this technology.

I talked to the Catapult group to get some answers: Mike Fingeroff (technologist for Catapult), Russ Klein (Product Marketing for Catapult) and Anoop Saha (Senior manager, strategy and Biz Dev for machine learning and 5G).

Video Interpolation

Anoop talked about one very cool application – video frame interpolation. You take a video at some relatively low number of frames per second, say 20 fps, but maybe you want to play it back on a 60fps display. Maybe you also want to replay in slow-motion. In either case you have gaps between frames which must be filled in somehow if you don’t want a jumpy replay. The simple answer is to average between frames. But that’s pretty low quality – it looks flickery and unnatural. A much better approach today is AI-based. Train a system with (many) before and after frames to learn how to much more smoothly and more naturally interpolate. The results can be quite stunning.
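The naive averaging baseline that the article contrasts with learned interpolation is simple enough to show directly. The sketch below just blends adjacent frames linearly – exactly the low-quality approach described above, not an AI interpolator; the frame sizes and values are illustrative.

```python
# The naive baseline the article contrasts with learned interpolation: simple
# linear blending between two frames. This is what produces the flickery,
# unnatural look; AI interpolators instead estimate motion between frames.

import numpy as np


def blend_frames(frame_a: np.ndarray, frame_b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation at fraction t between two frames (0 <= t <= 1)."""
    return ((1.0 - t) * frame_a.astype(np.float32)
            + t * frame_b.astype(np.float32)).astype(np.uint8)


def upsample_20_to_60fps(frames):
    """Insert two blended frames between each pair: roughly 20 fps -> 60 fps."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.extend([a, blend_frames(a, b, 1 / 3), blend_frames(a, b, 2 / 3)])
    out.append(frames[-1])
    return out


if __name__ == "__main__":
    clip = [np.full((4, 4, 3), v, dtype=np.uint8) for v in (0, 90, 180)]
    print(len(upsample_20_to_60fps(clip)), "frames from", len(clip))   # 7 from 3
```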

5G

Anoop added that generally, any case where you have to respond to serious upstream bandwidth and be able to make near real-time decisions to influence downstream behavior, you’re going to need custom solutions to meet that kind of performance. For example, Qualcomm talks about how AI in the 5G network will help with user localization, efficient scheduling and RRU utilization, self-organizing networks and more intelligent security, much of which demands fast response to high volume loads.

Video doorbell

Russ talked about his Ring doorbell. He doesn’t want the doorbell to go off at 3am because it detected a cat nearby. He wants accurate detection at a good inference rate, but it has to be very low power because the doorbell may be running on a battery. I could imagine a similar point being made for an intelligent security system. The movie trope of detectives fast forwarding through hours of CCTV video may soon be over. A remote camera shouldn’t upload video unless it sees something significant, because uploads burn power at the camera and because who wants to scroll through hours of nothing interesting happening?

The advantage of HLS for custom AI accelerators

Fair points, but why not run this stuff on a standard AI accelerator? The Catapult team told me that their customers still see enough opportunity in the rapidly evolving range of possible AI architectures to justify differentiation in power, performance and cost through custom solutions. AI accelerators haven’t yet boiled down to a few standard solutions that will satisfy all needs. Perhaps they never will. A custom solution is even more attractive when you can prototype a system in an FPGA, refine it and prove it out, before switching to an ASIC implementation when the volume opportunity becomes clear.

Russ wrapped up by adding that algorithms are the starting point for all these evolving AI solutions, which makes them a natural fit for HLS. Put that together with HLS’s ability to incrementally refine the implementation architecture to squeeze out the best PPA (as Russ showed in an earlier webinar I blogged). Further add HLS’s ability to support system verification in C against very large data sets (video, 5G streams, etc.). Put that all together and Russ sees the combination continuing to reinforce interest in the Catapult solution. Difficult to argue with that.

You can learn more about Catapult HERE.