SemiWiki – Page 388 – The Open Forum for Semiconductor Professionals

July 2, 2020; July 18, 2025

What’s New in Verdi? Faster Debug

by Bernard Murphy on 07-02-2020 at 6:00 am
Categories: EDA, Synopsys

Want fast debug? Synopsys recently hosted a Webinar to show off the latest improvements to Verdi® in performance, memory demand, and multitasking, among other areas.

Performance improvements
Taruna Reddy (PMM) and Allen Hsieh (Staff apps) presented features of the latest version, released in March. Taruna started by talking about the benefits of tight integration between the simulator (VCS) and the debugger. This shows up first in compile-time performance: only one compile is needed for both. Synopsys has also added tight integration for dynamic aliasing in the databases. Dynamic aliasing relies on a common observation that some signals have the same waveform through ~90% of a run; clocks are a good example. Such signals can be aliased into one signal and intelligently retrieved in debug. Taruna said that this can deliver a ~3.5X reduction in FSDB size and a 1.5X improvement in performance. Synopsys has also squeezed out up to a 2X improvement in runtime for transaction dumping through native DPI integration.
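To make the aliasing idea concrete, here is a minimal Python sketch that stores each distinct waveform once and maps every signal to a representative copy. It assumes a toy waveform format (a list of (time, value) change events), and the names `alias_waveforms` and `lookup` are hypothetical. It is not how Verdi or FSDB implements dynamic aliasing: it only aliases signals that match for the whole run, whereas the webinar described signals that match for most of a run, which would need per-segment comparison.

```python
from typing import Dict, List, Tuple

# Toy waveform (an assumption for illustration): (time, value) change events.
Waveform = List[Tuple[int, int]]


def alias_waveforms(signals: Dict[str, Waveform]) -> Tuple[Dict[str, Waveform], Dict[str, str]]:
    """Store each distinct waveform once; map every signal to its representative.

    Returns (stored, alias_of): `stored` holds one waveform per distinct pattern,
    keyed by the first signal seen with that pattern; `alias_of` maps every
    signal name to the key under which its waveform is stored.
    """
    stored: Dict[str, Waveform] = {}
    alias_of: Dict[str, str] = {}
    first_owner: Dict[Tuple[Tuple[int, int], ...], str] = {}
    for name, wave in signals.items():
        key = tuple(wave)  # hashable fingerprint of the full waveform
        if key not in first_owner:
            first_owner[key] = name
            stored[name] = wave
        alias_of[name] = first_owner[key]
    return stored, alias_of


def lookup(name: str, stored: Dict[str, Waveform], alias_of: Dict[str, str]) -> Waveform:
    """Retrieve a signal's waveform during debug by following its alias."""
    return stored[alias_of[name]]
```

With two identical clock nets and one data net, only two waveforms are stored, and the second clock is retrieved through its alias.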

A question came up at the end – can third party simulators benefit from these improvements? Allen stressed first that Verdi continues to actively support other simulators. However, they won’t get the benefit of these improvements because they require tight integration in the simulator as well as in the debugger.

Taruna also mentioned that, again through tighter integration, they have reduced callback overhead between the simulator and the debugger, giving another 1.2X in performance. Users can also now configure dumping to run across multiple threads, for yet another 1.2X performance boost.
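The general idea behind threaded dumping, moving waveform encoding off the simulation thread, can be sketched as a producer/consumer queue. This is a hedged illustration only: `ThreadedDumper` is a hypothetical class, it uses a single writer thread for simplicity, and it says nothing about how VCS or Verdi actually distributes dump work across threads.

```python
import queue
import threading
from typing import List, Optional, Tuple

Change = Tuple[int, str, int]  # (time, signal name, new value): assumed format


class ThreadedDumper:
    """Hypothetical dumper: the simulation thread enqueues value changes and a
    background writer thread encodes them, so encoding overlaps simulation."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Optional[Change]]" = queue.Queue(maxsize=10_000)
        self.records: List[str] = []  # stands in for the on-disk waveform file
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def dump(self, change: Change) -> None:
        # Called from the simulation loop; returns once the change is queued.
        self._q.put(change)

    def _drain(self) -> None:
        while True:
            item = self._q.get()
            if item is None:  # sentinel: simulation finished
                break
            t, sig, val = item
            self.records.append(f"{t} {sig} {val}")  # toy encoding step

    def close(self) -> None:
        # Signal the writer to stop and wait until every queued change is written.
        self._q.put(None)
        self._writer.join()
```

The bounded queue keeps memory in check: if the writer falls behind, `dump` blocks briefly instead of buffering without limit.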

Incremental FSDB loading
Allen talked about another class of optimizations: smart loading and pre-loading databases into Verdi. Smart load works well on large designs whose FSDB signals are well partitioned by hierarchy. A smart load reads in only the first scope touched in a debug session, loading more only when it has to. Allen said they have seen a >10X reduction in load time and a 10X reduction in load memory in such cases. He anticipated an obvious question: what efficiency hit do you take if you need to go outside that scope in debug? He showed an example in which he traced 3 drivers outside the initially loaded scope. At least for that example, he still saw a 6X performance improvement over starting with a full load.

A related question came up in Q&A: are there cases in which smart load doesn’t work well? Allen admitted that for gate-level sims with very little hierarchy, you probably won’t see any advantage.

Verdi also supports user-defined loading – you define which scopes you want to load and can pull in more only if needed.
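Both loading modes can be modeled as a cache of scopes that are read on first touch, with an optional user-supplied preload list. The sketch below is a hypothetical illustration, not Verdi's API: `LazyWaveDB` and the `read_scope` callback are assumed names, and a signal path is assumed to split into scope and signal name at the last dot.

```python
from typing import Callable, Dict, Iterable, Set


class LazyWaveDB:
    """Hypothetical smart-load model: a scope is read only when first touched."""

    def __init__(self, read_scope: Callable[[str], Dict[str, list]],
                 preload: Iterable[str] = ()) -> None:
        self._read_scope = read_scope  # assumed loader for one scope's signals
        self._loaded: Dict[str, Dict[str, list]] = {}
        for scope in preload:  # user-defined loading: pull these in up front
            self._load(scope)

    def _load(self, scope: str) -> Dict[str, list]:
        if scope not in self._loaded:
            self._loaded[scope] = self._read_scope(scope)
        return self._loaded[scope]

    def signal(self, path: str) -> list:
        # "top.u1.q" -> scope "top.u1", signal "q"; load the scope on first access.
        scope, _, name = path.rpartition(".")
        return self._load(scope)[name]

    @property
    def loaded_scopes(self) -> Set[str]:
        return set(self._loaded)
```

Tracing a driver into a new scope simply triggers one more `_load`, which mirrors why going outside the initial scope costs something but still far less than a full load.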

Other improvements
Allen mentioned a couple more useful features. First, Verdi now supports multitasking debug: you can run long operations, such as driver tracing, as separate background tasks and continue debugging while they run. Background tasks can also be paused or cancelled.
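A background task with pause and cancel can be sketched with a worker thread and two `threading.Event` flags. This is a generic pattern, not Verdi's implementation; `BackgroundTask` is a hypothetical class, and the assumption that a job is split into discrete steps determines how quickly a pause or cancel takes effect.

```python
import threading
from typing import Callable, List, Optional


class BackgroundTask:
    """Hypothetical long-running debug job (e.g. a driver trace) split into
    steps, which can be paused, resumed, or cancelled between steps."""

    def __init__(self, steps: List[Callable[[], None]]) -> None:
        self._steps = steps
        self._running = threading.Event()
        self._running.set()  # tasks start unpaused
        self._cancelled = threading.Event()
        self.completed = 0  # number of steps finished so far
        self._thread = threading.Thread(target=self._work, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _work(self) -> None:
        for step in self._steps:
            self._running.wait()  # blocks here while paused
            if self._cancelled.is_set():
                return
            step()
            self.completed += 1

    def pause(self) -> None:
        self._running.clear()  # takes effect before the next step

    def resume(self) -> None:
        self._running.set()

    def cancel(self) -> None:
        self._cancelled.set()
        self._running.set()  # wake a paused worker so it can exit

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def is_alive(self) -> bool:
        return self._thread.is_alive()
```

The user interface would call `pause()`, `resume()`, or `cancel()` from its own thread while the user keeps debugging.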

Verdi has added a unified “Find” in this release: one Find window to rule them all, rather than separate Find windows with inconsistent capabilities. As in any other windowing environment, Find works on whichever window you have selected.

Allen wrapped up with a discussion of some other features which were available in earlier releases but may still be new to some of you, in particular Reverse Interactive Debug and Constraint Debug. He also gave an overview of the consistent Verdi debug interface across the entire Synopsys verification product line.

You can watch the Webinar HERE.

Also Read:

Design Technology Co-Optimization (DTCO) for sub-5nm Process Nodes

Webinar: Optimize SoC Glitch Power with Accurate Analysis from RTL to Signoff

The Problem with Reset Domain Crossings

July 1, 2020; August 12, 2020

Killer Drones to be Available on the Global Arms Markets

by Matthew Rosenquist on 07-01-2020 at 10:00 am
Categories: Security

Turkey may be the first customer for the Kargu series of weaponized suicide drones specifically developed for military use. These semi-autonomous devices have been in development since 2017 and will eventually be upgraded to operate collectively as an autonomous swarm to conduct mass synchronized attacks.

This situation has been building for some time and I have been ringing the warning bell for years. Sadly, this is just the beginning of the development arc for these types of weapon systems. As better sensors, enhanced range, greater speed, cleverer AI, and greater payloads become available, we will see all manner of new usages and specializations.

When the airplane was first developed and used in WWI, it started as a reconnaissance platform, replacing the very limited and vulnerable dirigibles. Once aircraft shifted to an offensive role, bombing and strafing ground targets, interceptors emerged to counter the threat. By WWII, we had a massive range of specialized aircraft for air superiority, interdiction, strategic bombing, and defense, which had evolved so fast they were unrecognizable compared with their WWI origins. We face the same future with autonomous drones.

Imagine the next-generation minefield, where drones lie dormant until sensors detect a target, then pop up and pursue. Or consider slaughter-bot variants, programmed to target specific groups of people and working as part of a mesh network to saturate an area with hunter behaviors. Such weapons could redefine guerrilla and low-intensity warfare. Forget about buried improvised explosive devices (IEDs), which have been the bane of coalition forces over the past few years. Those were deployed by attackers in the hope that a target would wander close enough to be attacked. These drones will be able to aggressively seek out adversaries, structures, or innocent civilians at range, with little to no exposure of the operator.

Name any nation or warlord that would not embrace such cheap and replaceable devices.

The defensive technologies to protect against such attacks are still in their nascent phases. Traditional defenses are at a distinct disadvantage. Much must be done to establish capabilities, oversight, and limitations that restrict abusive and undesired use of these types of munitions in conflicts that could span the globe.

These aren’t the only drones under development or in use. But the low cost, small size, single-operator design, swarm design goals, and payloads suited to attacking people make for an unnerving combination. As the world’s inventories expand to include weaponized autonomous drones, the need for proper cybersecurity will also increase.

I have warned governments in the past. They must be sure they have an antidote ready before releasing innovative weapons to the world. That includes viruses, drones, hacking suites, and AI sub-systems that could potentially be weaponized. The rush to deploy new toys often backfires. Adversaries may use the technology and tactics against those who introduced it, their allies, or innocent civilians. Without possessing the proper means of protection, giving the world a new weapon is just asking for trouble.

July 1, 2020; August 22, 2024

Contact over Active Gate Process Requirements for 5G

by Tom Dillinger on 07-01-2020 at 6:00 am
Categories: 5G, Events, FinFET, GlobalFoundries, Semiconductor

Summary
A recent process enhancement in advanced nodes is to support the fabrication of contacts directly on the active gate area of a device. At the recent VLSI 2020 Symposium, the critical advantages of this capability were highlighted, specifically in the context of the behavior of RF CMOS devices needed for 5G designs.

Introduction
As shown in the left-hand figure below, the “conventional” layout design method is to place the contact for a logic gate input in the area between the (nMOS and pMOS) devices, leveraging the common connection between the two devices in a CMOS circuit. For logic cells with a small number of FinFET devices, the parasitic resistance from the metal gate contact to the active device channel is relatively small, even with scaling of the gate length and thickness, which define the resistance cross-section.

However, for high-current devices used in RF circuits, with many parallel fins (e.g., ~40), a gate connection at one or both ends of the active area results in significant resistance. A contact-over-active-gate (COAG) process step is required, as illustrated in the right-hand side of the figure above.

Specifically, for devices used in RF circuits, the common figure of merit (FOM) is fmax, the frequency at which the power gain of the biased device falls to unity. The greater the fmax, the greater the realizable power gain at the mmWave frequencies used for 5G cellular communications.

A small-signal circuit model for the device is shown in the figure below, with an equation for fmax. Note that fmax is closely related to another FOM, ft, which represents the unity current gain frequency; a small-signal model and a relation for ft are also shown in the figure.
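The figure’s equations are not reproduced in this text. For reference, a widely used textbook approximation (an assumption here; the figure may use a different form) relates both FOMs to the small-signal elements, where g_m is the transconductance, g_ds the output conductance, C_gs and C_gd the gate-source and gate-drain capacitances, and R_g and R_s the gate and source resistances:

```latex
f_T \approx \frac{g_m}{2\pi\,(C_{gs} + C_{gd})}
\qquad
f_{max} \approx \frac{f_T}{2\sqrt{g_{ds}\,(R_g + R_s) + 2\pi f_T\, R_g\, C_{gd}}}
```

In this form, f_T and f_max both rise with g_m, while R_g appears only in the denominator of f_max, so lowering R_g raises f_max without changing f_T.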

Additional items of note in the figure above include:

Increasing the small-signal device transconductance, gm, increases ft and fmax. The transition from planar RF CMOS (e.g., 28nm) to FinFET technologies offers improved device gain. The figure below highlights the motivation for adopting advanced FinFET technology for RF applications. (Gmsat represents the transconductance gain for the device biased in the saturation region of operation.)
The lower the parasitic Rgate, the higher the fmax. Reducing Rgate is the key focus of introducing the COAG process.
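Both trends can be checked numerically with the common textbook approximations f_T ≈ g_m / (2π(C_gs + C_gd)) and f_max ≈ f_T / (2·sqrt(g_ds(R_g + R_s) + 2π f_T R_g C_gd)). This Python sketch uses made-up device values, not data from the GLOBALFOUNDRIES paper, and the approximation itself is an assumption rather than the exact model in the figure.

```python
import math


def f_t(gm: float, cgs: float, cgd: float) -> float:
    """Unity current-gain frequency in Hz (textbook approximation)."""
    return gm / (2 * math.pi * (cgs + cgd))


def f_max(gm: float, cgs: float, cgd: float, gds: float, rg: float, rs: float) -> float:
    """Unity power-gain frequency in Hz (textbook approximation)."""
    ft = f_t(gm, cgs, cgd)
    return ft / (2 * math.sqrt(gds * (rg + rs) + 2 * math.pi * ft * rg * cgd))


if __name__ == "__main__":
    # Illustrative, hypothetical values: siemens, farads, ohms.
    params = dict(gm=40e-3, cgs=20e-15, cgd=10e-15, gds=4e-3, rs=5.0)
    for rg in (50.0, 20.0, 5.0):
        print(f"Rg = {rg:5.1f} ohm -> fmax ~ {f_max(rg=rg, **params) / 1e9:6.1f} GHz")
```

Sweeping `rg` downward raises fmax, which is the lever COAG pulls; raising `gm` lifts both ft and fmax.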

Another critical FOM for RF CMOS technology is the “noise figure”. Each RF device in an amplifier or receiver chain adds noise to the input signal. In addition to the noise sources in the device channel (e.g., thermal, flicker), the Rgate parasitic element is also a thermal noise source. A minimal noise figure (the noise factor expressed in dB) is ideal; more on COAG device noise analysis shortly.
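The dB bookkeeping is easy to get wrong, so here is a small sketch of the conversion between the linear noise factor F and the noise figure NF = 10·log10(F). To show how a series resistance adds thermal noise, it uses a deliberately simplified model (an assumption, not the paper’s analysis): a resistor R in series with a source of resistance R_s, both at the 290 K reference temperature, ahead of a noiseless stage, which gives F = 1 + R/R_s.

```python
import math


def noise_figure_db(noise_factor: float) -> float:
    """Noise figure in dB from the linear noise factor F (F >= 1)."""
    return 10 * math.log10(noise_factor)


def noise_factor(nf_db: float) -> float:
    """Linear noise factor F from a noise figure in dB."""
    return 10 ** (nf_db / 10)


def series_resistance_noise_factor(r_series: float, r_source: float) -> float:
    """Simplified model: F of a thermal-noise resistor in series with the
    input, both at 290 K, driving an otherwise noiseless stage."""
    return 1 + r_series / r_source
```

For example, 10 ohms in series with a 50 ohm source gives F = 1.2, or about 0.79 dB. Conversely, a 3 dB drop in noise figure corresponds to roughly halving the linear noise factor, since 10·log10(2) ≈ 3.01.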

COAG analysis and reliability for 5G

The figure below depicts the interrelated design considerations for RF CMOS, as represented by an LNA circuit topology.

At VLSI 2020, a team from GLOBALFOUNDRIES presented a thorough silicon-based analysis of the benefits of COAG on FinFET device performance for RF applications. [1]

The data for Rgate versus the number of fins is given below, for the traditional and COAG layout styles. For devices with many parallel fins, multiple COAG contacts are used. The substantial improvement in the fmax FOM for the COAG device is also shown below; note how the fmax for the traditional FinFET gate contact layout degrades rapidly with larger devices (# of fins).
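Why end-contacted gates degrade with fin count can be illustrated with a first-order distributed-resistance model. The sketch below assumes the gate line resistance grows linearly with fin count, ignores contact and vertical resistances, and uses the standard low-frequency result that a uniformly distributed RC line fed from one end presents one third of its end-to-end resistance. The COAG case assumes evenly spaced contacts, each at the center of an equal gate segment. The function names and the `r_per_fin` parameter are hypothetical, and this is not the BSIM Rgate model or the paper’s data.

```python
def rg_end_contact(n_fins: int, r_per_fin: float) -> float:
    """Effective gate resistance with one contact at one end of the finger:
    one third of the end-to-end line resistance (distributed-RC result)."""
    return n_fins * r_per_fin / 3


def rg_coag(n_fins: int, r_per_fin: float, n_contacts: int) -> float:
    """Effective gate resistance with n_contacts over the active gate, each at
    the center of an equal segment. Each segment acts as two half-lengths fed
    from one end (R/3 each) in parallel, and all segments are in parallel."""
    r_line = n_fins * r_per_fin
    return r_line / (12 * n_contacts ** 2)


if __name__ == "__main__":
    r_per_fin = 3.0  # hypothetical ohms of gate line per fin pitch
    for n in (10, 20, 40):
        print(n, rg_end_contact(n, r_per_fin), rg_coag(n, r_per_fin, n_contacts=4))
```

In this toy model the end-contacted resistance grows linearly with fin count, while adding COAG contacts cuts it quadratically in the number of contacts.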

(Note that BSIM models for FinFETs utilize a consolidated parasitic model for Rgate for many fin devices – I would encourage you to review the Rgate with NFIN model assumptions at the UC-Berkeley BSIM web site.)

The improvement in the NF50 for a 40-fin COAG device is shown below (a common source amplifier topology) – a 3dB noise reduction has a huge impact on RF circuit design. The GLOBALFOUNDRIES team also presented data isolating the Rgate noise, demonstrating that it may indeed be a significant contributor to the overall NF50 – the COAG configuration is a key factor in improving the noise factor.

A concern with the process introduction of COAG would be the potential reliability impact of the contact and metal deposition/patterning steps directly over the gate and device oxide layers. The GLOBALFOUNDRIES team also presented TDDB reliability data for the COAG technology. Using a gate leakage current threshold measurement as the breakdown criterion, the dielectric lifetime was unaffected by the COAG process, as illustrated in the cumulative probability graph below.

The availability of COAG fabrication will undoubtedly introduce new opportunities for RF CMOS design optimizations using very wide (high fin count) devices. For more information on the GLOBALFOUNDRIES 12nm FinFET process, please follow this link.

-chipguy

References

[1] Razavieh, A., et al., “FinFET with Contact over Active-Gate for 5G Ultra-Wideband Applications”, VLSI 2020 Symposium, paper JFS2.5.

Also Read:

Embedded MRAM for High-Performance Applications

Webinar on eNVM Choices at 28nm and below by Globalfoundries

GLOBALFOUNDRIES Sets a New Bar for Advanced Non-Volatile Memory Technology

June 30, 2020; July 6, 2020

A Vibrant Semiconductor Manufacturing Model for the US

by Scott Jewler on 06-30-2020 at 10:00 am
Categories: Semiconductor

Having spent the last 30 years in semiconductor manufacturing, eight of them living and working in Asia, I find it both exciting and unsettling to see renewed political interest in revitalizing this industry in the United States. Gone are the days of ‘It doesn’t make any difference whether a country makes computer chips or potato chips!’, a quip usually attributed to Michael J. Boskin, who served on President George H.W. Bush’s economic council. Chips of the computer variety are now a national security and economic priority.

But the successful return of the US to semiconductor manufacturing prominence is by no means a sure bet. Bipartisan support for the CHIPS for America Act is highly encouraging, but funding alone may not solve the systemic issues that have driven the disproportionate growth of overseas manufacturing in the semiconductor industry.

The Semiconductor Value Chain
Six of the top ten semiconductor companies in 2018 had US headquarters. These are Intel, Micron, Broadcom, Qualcomm, Texas Instruments (TI), and nVidia. Intel, Micron, and TI are Integrated Device Manufacturers, or IDMs. This means that they produce the majority of their products in factories that they own and operate themselves. These factories may be located in the United States or abroad, and are typically a combination of both. Broadcom, nVidia and Qualcomm are fabless semiconductor companies. They design and market semiconductor chips but rely on wafer foundries and Outsourced Semiconductor Assembly and Test (OSAT) service providers to manufacture their designs.

Intel produces CPU and GPU devices. Micron produces memory devices. These are both high-volume, relatively low-mix products that demand continuous investment in capital assets to increase performance. TI primarily produces analog and embedded processor devices. These devices are less capital intensive because product life cycles are longer and new designs can be implemented without replacing entire manufacturing lines.

Broadcom, Qualcomm, and nVidia are a different breed of semiconductor company known as fabless suppliers. They don’t own or operate their own manufacturing facilities. They source integrated circuits in wafer form from merchant suppliers known as wafer foundries and have these chips diced, packaged, and tested by OSAT companies. While Globalfoundries and Samsung have wafer fabrication facilities in the US and TSMC has announced plans to build a facility in Arizona, the vast majority of fabless semiconductor manufacturing is done in Asia. There are no large OSAT factories in the US. Semiconductor packaging and test is done predominantly in Asia.

As shown in Figure 1 below, the fabless segment of the global semiconductor industry has grown from 7% of the total industry in 1999 to 30% in 2019. This represents a 13% compound annual growth rate versus 4% for the IDM segment of the market. The volatility of the IDM portion of the market is also noticeably higher. This is primarily driven by the large revenue contribution of memory devices and the fluctuation in their pricing, which results from frequent cycles of under- and over-supply as competitors seek to generate cash to offset the large capital expenditures required to keep their factories at the leading edge.
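As a back-of-the-envelope check of how these growth rates relate to the change in share, here is a short Python sketch. It assumes constant growth rates over the 20 years from 1999 to 2019 and uses only the percentages quoted above; Figure 1 itself is not reproduced here, and `projected_share` is a hypothetical helper.

```python
def projected_share(share0: float, g_a: float, g_b: float, years: int) -> float:
    """Share of segment A after `years`, if A compounds at g_a per year
    and the rest of the market compounds at g_b per year."""
    a = share0 * (1 + g_a) ** years
    b = (1 - share0) * (1 + g_b) ** years
    return a / (a + b)


if __name__ == "__main__":
    # Fabless: 7% share in 1999, 13% CAGR; IDM: 4% CAGR; 20 years to 2019.
    print(f"{projected_share(0.07, 0.13, 0.04, 20):.1%}")
```

Compounding 13% against 4% for 20 years takes the fabless share from 7% to roughly 28%, in the same neighborhood as the 30% quoted; the gap reflects rounding and the year-to-year volatility noted above.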

Figure 1 (source: Statista)

Intel, Micron, and TI all produce a significant portion of their semiconductor wafers in the United States, but for the most part they ship these wafers to Asia for package assembly and final test. Why is this?

Package assembly and test moved to Asia beginning in the late 1960s. At that time, these operations were highly manual, and moving to Asia offered immediate labor cost savings. Times have changed, though. Modern assembly process tools are now fully automated, and direct labor typically represents only 10% to 15% of manufacturing cost. While not as capital intensive as leading-edge wafer fabrication, package assembly and test does require continuous investment to support the higher levels of functional integration found in portable devices such as mobile phones, as well as in high-performance computing for cloud processing.

The OSAT business is highly competitive, and gross margins are typically around 20%. Asian manufacturers have spent the last 50 years figuring out how to run these very lean businesses. It is difficult to make money in this business. Capital investments must be made without firm order volume. Larger OSATs run thousands of different part numbers in their giant factories at the same time. New product introductions are released continuously. Production ramps for hot new consumer products can be incredibly fast, going from engineering-level production to millions of units per week in less than a month. It is not a business for the faint of heart.

Why is the Merchant Supply Chain for Semiconductors Critical?
While the OSAT industry’s initial move to Asia was to reduce labor costs, the wafer foundry industry’s geographical concentration in Asia has a different history. As the cost of building a leading-edge wafer fab increased from a few hundred million dollars to over twelve billion dollars today, fewer companies had the financial resources to develop their own manufacturing technology and construct their own fabs. Companies with a dominant market position in a specific family of devices with predictable market demand could make these investments, but smaller, more specialized companies could not. By combining business from many smaller fabless design companies into a common factory and supporting the ecosystem with internally developed and third-party IP blocks, TSMC created a unique solution that enabled the tremendous growth of the fabless segment of the market. Now many of the largest device companies in the world use wafer foundries and OSATs to do all their manufacturing.

This model benefits end users as well. System designers can work with fabless suppliers to source chips without needing to reach the economic scale to support a dedicated factory. More design companies increase the variety of available chips and better align designs to a wide variety of end-use cases.

This situation also creates a dilemma for the US defense industry, whose volumes are typically small but which often requires leading-edge manufacturing solutions.

What do Manufacturers in Taiwan know that US Manufacturers don’t?
Taiwan is now clearly the leader in semiconductor manufacturing, with the world’s largest wafer foundry (TSMC) and OSAT (ASE) headquartered there. Both companies also do most of their manufacturing in Taiwan and have established highly competitive practices and a highly efficient ecosystem to keep their facilities running reliably and cost-effectively.

Wafer fabrication can consist of more than 2000 process steps at the leading edge. To produce a device that functions properly, each of these process steps must be precisely controlled. While historically the packaging portion of the manufacturing process has been far less complex, increases in the functional density of end products such as smartphones and the performance requirements of cloud computing have pushed packaging technology forward rapidly in recent years.

When many different products are built on the same line, as is the case in a wafer foundry or OSAT, the challenges intensify immensely. Product Lifecycle Management (PLM) and New Product Introduction (NPI) processes must be rigorously controlled. New products are often run on a single set of tools under engineering supervision. It can be months between the initial qualification of a new device and a subsequent ramp to high-volume manufacturing. These ramps can be sudden, and manufacturers must make sure that process recipes developed during NPI are followed precisely. A delay in the ramp of a new product can cause massive losses in revenue and market share for customers. Driven by a continuous flow of new products, merchant manufacturers in Taiwan have been very successful at developing their PLM and NPI processes. While these techniques can certainly be developed in other regions, the institutional knowledge these organizations have gained over decades of managing these complex requirements is invaluable and creates a significant barrier to entry.

Manufacturers in Taiwan manage these complex process flows and PLM and NPI requirements while maintaining an unrelenting focus on cost. This pressure has created a large and complex ecosystem of smaller suppliers in Taiwan who make replacement parts and consumables at considerably lower prices than the Original Equipment Manufacturers (OEMs). These suppliers compete relentlessly against each other, in turn driving down their own costs and raising productivity and quality. Over time, more and more complex components have been sourced from this domestic market, saving Taiwan’s semiconductor manufacturers hundreds of millions of dollars annually.

What can the US Government and US Companies do to Create a Vibrant Domestic Semiconductor Manufacturing Industry?
Passage of the CHIPS for America Act is a vital first step; however, it is important that the money be used in a way that promotes development of a sustainable domestic manufacturing ecosystem. Simply offsetting the existing cost differential between US and Asian manufacturing will have a temporary impact at best. The systemic differences between these markets must be addressed to ensure a successful long-term transformation of domestic semiconductor manufacturing.

Intense focus must be placed on understanding the root causes of the current imbalance between US and Asian manufacturing, and funds should be directed in a way that overcomes these causes. The US should seek to create an ecosystem to support domestic merchant manufacturing that will enable fabless semiconductor companies to build their leading-edge products cost-effectively and reliably in domestic foundries and OSATs. This will provide the most benefit to both the commercial and defense industries.

A few specific actions are required to make this achievable.

First, eliminate the tax incentive to manufacture overseas. This is a no-brainer. While closing the loopholes that allow semiconductor companies that manufacture overseas to pay less tax seems attractive, the impact of such a decision needs to be weighed against the realities of the global competitive environment. Raising costs for US device companies through higher taxes will benefit their international competitors. Better yet, allow domestic manufacturers to enjoy the same tax benefits when building parts domestically that they see when manufacturing overseas.

Second, address the gaps in domain knowledge between US and Taiwan manufacturers. US IDMs and foundries are not necessarily experts at operating high-volume, low-cost foundries. There are no large US OSATs. Domestic manufacturing models and business processes have not developed in the same way as Taiwan’s over the last 20 years. The international transfer of manufacturing domain knowledge has fueled international growth in many industries. In the past, much of this domain knowledge transfer was from the US to Asia. In this case, the opposite is needed.

Third, build an entire ecosystem for semiconductor manufacturing and encourage private investment in it. Scale is very important in wafer fabrication and in packaging and test, but a diverse ecosystem of materials, spare parts, and consumables is also necessary to achieve cost parity. Subsidize smaller manufacturers and machine shops to invest in the tools and development activities needed to support the semiconductor manufacturing industry. Make sure that third-party IP developers have an incentive to create designs using domestic foundry design rules.

Fourth, make sure manufacturers feel the competition and develop the ability to compete. This will not happen overnight but needs to be the end goal. Create incentives for manufacturers to operate with a dire sense of urgency. Make sure they ‘sweat the assets’ by pushing their capital asset productivity to at least the levels currently achievable in Taiwan. Give them aggressive but achievable cost targets to drive them to global competitiveness so that when government funding stops, they can compete in a global market.

Fifth, keep tight track of the CHIPS for America money and how it is used. It is surprisingly easy to destroy billions of dollars of capital in the semiconductor industry. Make sure end users have incentives to invest time and money in the qualification of domestic suppliers. Track progress and make sure that domestic manufacturers make continuous progress in yield, quality, cycle time, and cost. They won’t close the gap immediately, but they should be able to make continuous progress.

Conclusions
Semiconductor devices enable our interconnected world. While the US is a leader in semiconductor design, manufacturing equipment, and process technology, it lacks a vibrant semiconductor manufacturing sector, particularly for the vital fabless semiconductor segment of the industry. Recent events have prompted renewed public interest in a revitalized domestic semiconductor manufacturing industry. Public money can help promote the industry but money alone without proper allocation, management, and focus will not resolve the systemic issues that currently limit the ability of private enterprise to profitably compete in this market.

June 30, 2020; March 16, 2022

Qualcomm on Power Estimation, Optimizing for Gaming on Mobile GPUs

by Bernard Murphy on 06-30-2020 at 6:00 am
Categories: Ansys, Inc., EDA, Mobile

I don’t look at the RTL power estimation topic too often these days, so I was interested to see that ANSYS still has a very strong position in this area. Qualcomm is using PowerArtist on one of the most demanding modern applications: GPU power for mobile gaming. Mobile gaming heavily loads the GPU, so any optimization in that area will affect battery life. This is a world-class test because it’s not just ‘more of the same but bigger’. Gaming benchmarks really stretch the range for that ever-present challenge in power estimation: bridging the gap between system-level use cases and RTL-level power calculations.

There’s so much complexity in modern GPUs that averaged power estimates across relatively simple directed tests fall short. These are simply not going to be good enough to drive intelligent optimization choices in RTL design. Jiaze Li from Qualcomm presented a paper at a recent ANSYS Simulation World on their more realistic approach.

Gaming Benchmarks

First, Qualcomm starts with realistic gaming loads. Jiaze mentioned Manhattan and Aztec Ruins as two popular gaming workloads used for GPU benchmarking today. They extract multi-millisecond sequences from these as their basis for testing. These are still long enough that simulation must run on an emulator. ANSYS PowerArtist uses an activity streaming interface with Mentor Graphics’ Veloce emulator to enable efficient transfer of long activity patterns. Qualcomm uses this flow to drive power analysis with PowerArtist. They can also track how power changes as the design evolves and optimize RTL for power reduction.

Jiaze added that the emulation flow is too cumbersome for detailed power debug. Instead they use a parallel simulation-based power flow. The tests they use here are derived from the same large gaming benchmarks, but greatly reduced in size, capturing the essentials of the graphics features while still running in reasonable time on the simulator. This reduction is very much a manual task, one into which Jiaze and the team put a lot of work, but they’ve figured out a process to build these reduced tests efficiently.

Windowed Analysis

The second important point is that they divide the analysis time into multiple windows by graphics feature. The systems team defines the windows, which are generally not equal in size. PowerArtist then calculates power estimates per window. This gives them a chunked timeline of averages, in which they can see how average power varies by feature. That, he says, gives them a lot of insight into the contributors to power in any given window. It also suggests how they might best optimize not only for average power but also for some sense of peak power.
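The windowing step itself is simple to picture. The sketch below is a hypothetical illustration, not PowerArtist’s interface: it assumes a per-cycle power trace and feature windows given as (label, start, end) with the end exclusive, computes the average power in each unequal window, and then picks the highest-average window as a rough proxy for peak power.

```python
from typing import List, Sequence, Tuple


def windowed_average_power(power: Sequence[float],
                           windows: Sequence[Tuple[str, int, int]]) -> List[Tuple[str, float]]:
    """Average power per window.

    power   -- per-cycle power samples (assumed input format, e.g. in mW)
    windows -- (feature label, start cycle, end cycle), end exclusive;
               windows may have different lengths
    """
    results = []
    for label, start, end in windows:
        chunk = power[start:end]
        if not chunk:
            raise ValueError(f"empty window {label!r}")
        results.append((label, sum(chunk) / len(chunk)))
    return results


if __name__ == "__main__":
    trace = [1.0, 1.0, 3.0, 3.0, 3.0, 2.0]  # made-up samples
    per_window = windowed_average_power(trace, [("setup", 0, 2), ("shading", 2, 5), ("post", 5, 6)])
    print(per_window, max(per_window, key=lambda r: r[1]))
```

Comparing the per-window averages, rather than one average over the whole run, is what exposes which feature dominates power.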

Jiaze said that the flow runs in bi-weekly production regressions at Qualcomm. They have used it to drive a 5% power reduction on their most recent design, mostly by adding clock gating and eliminating redundant data toggling. He added that the method brings a very nice bonus: they can very concretely justify the power reductions they find. Much better than a more general ‘we suggested a bunch of improvements and see – it got better!’

If you want to hear the talk, click HERE to go to the ANSYS Simulation World recorded event. This talk is the sixth under “Semiconductors”. You can also learn more about PowerArtist HERE.

Also Read

The Largest Engineering Simulation Virtual Event in the World!

Prevent and Eliminate IR Drop and Power Integrity Issues Using RedHawk Analysis Fusion

Reliability Challenges in Advanced Packages and Boards

June 29, 2020; July 2, 2022

Interview with Altair CTO Sam Mahalingam

by Daniel Nenni on 06-29-2020 at 10:00 am
Categories: Altair, CEO Interviews, EDA

In this interview we talk with Sam Mahalingam, chief technology officer at Altair, about gaining a competitive edge with software that’s built to handle high-throughput workloads like chip design and electronic design automation (EDA). Altair is a global technology company providing solutions in product development, high-performance computing (HPC), and data analytics.

Where does Altair fit into the semiconductor industry?
A competitive edge in this industry can come down to a very slim margin: seconds or even milliseconds. The goal is to enable users to iterate more designs in less time, and ultimately reach market first with a superior product. Our software is designed to make that possible. When you’re not getting as much value as you should from your infrastructure, not optimizing where it’s possible, the long-term cost of missed opportunities can be high.

A big challenge is that every step of design exploration and verification involves a complex set of variables, each requiring time to analyze both individually and in terms of its interaction with other variables. It’s not unusual for engineering teams to run millions of jobs each day, so the ability to achieve maximum throughput means a team can test as many variables as possible and be less likely to miss a crucial interaction that will affect the final product.

Altair software designed expressly for high-throughput workloads in areas like semiconductor design includes the Altair Accelerator™ enterprise job scheduler, Altair Allocator™ multi-site license allocation and management tool, and Altair Accelerator Plus hyper-scheduler.

How does hyper-scheduling boost efficiency?
Hyper-schedulers, or hierarchical schedulers, like Altair Accelerator Plus are built to offload the base scheduler for greater throughput, better utilization, and flexible usage models. Millisecond dispatch latency is important for short jobs, which users can queue sequentially on their own. When a batch of short jobs is presented to the base scheduler as one larger job, it reduces the burden on that lower-level scheduler while maintaining visibility into each individual job.
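To make the batching idea concrete, here is a minimal Python sketch of a two-level scheduler. It assumes a base scheduler that accepts one submission per batch and a hyper-scheduler that runs the short jobs back to back inside that slot while keeping its own per-job status table. The names (`ShortJob`, `BatchJob`, `pack_jobs`, `run_batch`, `status_report`) are hypothetical and are not part of Altair Accelerator Plus or any Altair API.

```python
from dataclasses import dataclass, field


@dataclass
class ShortJob:
    job_id: str
    command: str
    status: str = "queued"


@dataclass
class BatchJob:
    """One submission to the base scheduler that wraps several short jobs."""
    batch_id: str
    jobs: list = field(default_factory=list)


def pack_jobs(jobs, max_per_batch):
    """Group short jobs so the base scheduler sees len(batches) submissions, not len(jobs)."""
    batches = []
    for start in range(0, len(jobs), max_per_batch):
        batches.append(BatchJob(f"batch-{len(batches)}", jobs[start:start + max_per_batch]))
    return batches


def run_batch(batch, execute):
    """Runs inside one base-scheduler slot, dispatching the short jobs back to back."""
    for job in batch.jobs:
        job.status = "running"
        job.status = "done" if execute(job.command) else "failed"


def status_report(batches):
    """Per-job visibility is kept by the hyper-scheduler, not by the base scheduler."""
    return {job.job_id: job.status for batch in batches for job in batch.jobs}


if __name__ == "__main__":
    jobs = [ShortJob(f"j{i}", f"run_test {i}") for i in range(10)]
    batches = pack_jobs(jobs, max_per_batch=4)       # 3 base-scheduler submissions
    for batch in batches:
        # stand-in executor: pretend test 7 fails
        run_batch(batch, execute=lambda cmd: not cmd.endswith(" 7"))
    print(len(batches), status_report(batches))
```

The design choice here is that the base scheduler only ever tracks the batches, while queries about individual jobs are answered from the hyper-scheduler's own table.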

A hierarchical scheduler can also handle user query, job submission, and reporting functions, which is a substantial offload from the base scheduler since 80% of typical scheduler loads come from these non-dispatch functions.
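The 80% figure implies a simple ceiling on the benefit. If dispatch work stays with the base scheduler and the hyper-scheduler absorbs a fraction f of the non-dispatch work, the base scheduler keeps 1 - 0.8*f of its original load, so full offload leaves 20% of the load and at most about 5x headroom. The sketch below only evaluates that arithmetic; the additive load model is an assumption, not something stated in the interview.

```python
def remaining_base_load(non_dispatch_share, offloaded_fraction=1.0):
    """Share of the original base-scheduler load left after a hyper-scheduler absorbs
    `offloaded_fraction` of the non-dispatch work (queries, submissions, reporting).
    Assumes load is additive and dispatch work stays with the base scheduler."""
    return 1.0 - non_dispatch_share * offloaded_fraction


if __name__ == "__main__":
    for offloaded in (1.0, 0.75, 0.5):
        load = remaining_base_load(0.80, offloaded)   # 80% figure quoted above
        print(f"offload {offloaded:.0%} of non-dispatch work: base keeps "
              f"{load:.0%} of its load, about {1 / load:.1f}x headroom")
```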

What’s your take on cloud technology for chip design and EDA?
Today’s cloud technology has broken down barriers like cost, latency, and security concerns. The cloud is scalable and elastic, and it’s an attractive option for chip designers and EDA engineers. Major cloud providers including Google Cloud, Amazon Web Services, Oracle Cloud Infrastructure, and Microsoft Azure make it possible for businesses of any size to access powerful resources without the need to own and maintain their own data centers.

Scheduling technology like Altair Accelerator enables users to optimize performance, match cloud expenditure to actual compute demand, and shift seamlessly between cloud and on-premises environments with flexible, demand-based license allocation tools. When demand stops, on-demand cloud resources scale right back to zero.

Accelerator is storage-aware, meaning that it modulates running jobs based on filer latency. Accounting for the latency experienced by the filer can dramatically speed up scheduling in the cloud; Altair customers have seen up to 10x acceleration with storage-aware scheduling. We also have a useful tool we call Rapid Scaling that can be used for cost optimization.
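Altair does not describe how Accelerator modulates jobs, so the following is only a sketch of one common way to make dispatch storage-aware: an additive-increase / multiplicative-decrease limit on running jobs driven by measured filer latency. The class name, the 20 ms target and the step sizes are hypothetical.

```python
class StorageAwareThrottle:
    """Hypothetical dispatch limiter (not Altair's algorithm): halves the number of job
    slots when filer latency exceeds a target and grows it gradually when the filer is
    healthy (additive-increase / multiplicative-decrease)."""

    def __init__(self, target_ms, start_slots=100, min_slots=1, max_slots=1000, step=10):
        self.target_ms = target_ms
        self.slots = start_slots
        self.min_slots = min_slots
        self.max_slots = max_slots
        self.step = step

    def update(self, filer_latency_ms):
        """Adjust the slot limit from the latest filer latency sample."""
        if filer_latency_ms > self.target_ms:
            self.slots = max(self.min_slots, self.slots // 2)
        else:
            self.slots = min(self.max_slots, self.slots + self.step)
        return self.slots

    def jobs_to_dispatch(self, running, queued):
        """How many queued jobs may start now without exceeding the slot limit."""
        return max(0, min(queued, self.slots - running))


if __name__ == "__main__":
    throttle = StorageAwareThrottle(target_ms=20)
    for latency in (5, 8, 45, 60, 12, 9):     # made-up filer latency samples, in ms
        print(latency, "ms ->", throttle.update(latency), "slots")
```

Backing off quickly and recovering slowly keeps an overloaded filer from being hit by a fresh wave of I/O-heavy jobs the moment latency dips.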

Tell us more about Rapid Scaling.
Rapid Scaling is part of the Accelerator package, an optimization tool we designed to be minimalist, configurable, and transparent. It allows users to auto-scale resources in the cloud, helping to bring the cost of cloud resources as close as possible to exact demand. Rapid Scaling looks at workload speed and determines which computing resources are critical. Users can scale based on workload speed and contain everything in a single instance, clearly showing the cost of computing.
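As a rough illustration of demand-driven auto-scaling, the sketch below sizes the instance pool from queued plus running jobs and scales to zero when there is no demand. It is not Rapid Scaling's actual logic; `jobs_per_instance`, the instance cap and the hourly price are made-up parameters.

```python
import math


def desired_instances(queued_jobs, running_jobs, jobs_per_instance, max_instances):
    """Hypothetical scale rule: just enough instances for current demand, zero when idle."""
    demand = queued_jobs + running_jobs
    if demand == 0:
        return 0
    return min(max_instances, math.ceil(demand / jobs_per_instance))


def scale_step(current, target):
    """Number of instances to launch (positive) or terminate (negative)."""
    return target - current


if __name__ == "__main__":
    price_per_hour = 0.50          # made-up instance price
    current = 0
    for queued, running in [(40, 0), (10, 30), (0, 5), (0, 0)]:
        target = desired_instances(queued, running, jobs_per_instance=8, max_instances=20)
        print(f"demand={queued + running:>3} change={scale_step(current, target):+d} "
              f"cost=${target * price_per_hour:.2f}/h")
        current = target
```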

With Rapid Scaling users can measure workload movement and respond quickly, rapidly terminate instances, and know exactly how much they’re spending.

How does licensing impact workload optimization?
EDA licenses are expensive, and large companies routinely spend millions of dollars on them every year. With millions of jobs running daily, it’s easy for users to get bogged down in peak-time resource queues. Not having enough licenses means lost productivity and slower progress, but an excess of licenses is a waste of money. With the right scheduling software, engineers get access to just enough licenses to get their work done without needing to absorb the cost of overprovisioning for peak demand times.

We optimize license utilization between on-premises and cloud infrastructure with Altair Allocator, which allocates licenses between on-premises and cloud locations based on the demand at each one. Licenses are simply moved to where the workload is, which makes for efficient collaboration, even in hybrid environments.
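One simple way to split a fixed license pool by site demand is proportional allocation with largest-remainder rounding. The sketch below assumes that policy (the interview does not say what Allocator actually does) and uses hypothetical site names and counts.

```python
def allocate_licenses(total, demand):
    """Split `total` licenses across sites in proportion to demand (largest-remainder
    rounding), never giving a site more than it asked for."""
    total_demand = sum(demand.values())
    if total_demand <= total:
        return dict(demand)                      # enough for everyone
    quotas = {site: total * d / total_demand for site, d in demand.items()}
    alloc = {site: int(q) for site, q in quotas.items()}
    leftover = total - sum(alloc.values())
    # give the remaining licenses to the sites with the largest fractional parts
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for site in by_remainder[:leftover]:
        alloc[site] += 1
    return alloc


if __name__ == "__main__":
    print(allocate_licenses(100, {"on-prem": 70, "cloud-east": 50, "cloud-west": 30}))
```

Re-running the allocation as demand shifts is what "moving licenses to where the workload is" amounts to in this simplified model.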

You can learn more in this webinar, Saving Serious Money with License-first Scheduling.

About Altair (Nasdaq: ALTR)
Altair is a global technology company that provides software and cloud solutions in the areas of product development, high performance computing (HPC) and data analytics. Altair enables organizations across broad industry segments to compete more effectively in a connected world while creating a more sustainable future. To learn more, please visit www.altair.com.


June 29, 2020 (updated August 12, 2020)

Optimizing Chiplet-to-Chiplet Communications

by Tom Dillinger on 06-29-2020 at 6:00 am
Categories: Events, Semiconductor, TSMC

Summary
The growing significance of ultra-short reach (USR) interfaces on 2.5D packaging technology has led to a variety of electrical definitions and circuit implementations. TSMC recently presented the approach adopted by their IP development team, “LIPINCON”: a parallel-bus, clock-forwarded USR interface designed to optimize power/performance/area.

Introduction
The recent advances in heterogeneous, multi-die 2.5D packaging technology have resulted in a new class of interfaces – i.e., ultra-short reach (USR) – whose electrical characteristics differ greatly from traditional printed circuit board traces. Whereas the serial communications lane of SerDes IP is required for long, lossy connections, the short-reach interfaces support a parallel bus architecture.

The SerDes signal requires (50 ohm) termination to minimize reflections and reduce far-end crosstalk, adding to the power dissipation. The electrically-short interfaces within the 2.5D package do not require termination. Rather than “recovering” the clock embedded within the serial data stream, with the associated clock-data recovery (CDR) circuit area and power, these parallel interfaces can use a simpler “clock-forwarded” circuit design – a transmitted clock signal is provided with a group of N data signals.

Another advantage of this interface is that the circuit design requirements for electrostatic discharge protection (ESD) between die are much reduced. Internal package connections will have lower ESD voltage stress constraints, saving considerable I/O circuit area (and significantly reducing I/O parasitics).

The unique interface design requirements between die in a 2.5D package have led to the use of the term “chiplet”, as the full-chip design overhead of SerDes links is not required. Yet, to date, quite varied circuit and physical implementation approaches have been used for these USR interfaces.

TSMC’s LIPINCON interface definition
At an invited talk for the recent VLSI 2020 Symposium, TSMC presented their proposal for a parallel-bus, clock-forwarded architecture – “LIPINCON” – which is short for “low-voltage, in-package interconnect”. [1] This article briefly reviews the highlights of that presentation.

The key parameters of the short-reach interface design are:

Data rate per pin: dependent upon trace length/insertion loss, power dissipation, required circuit timing margins
Bus width: with modularity to define sub-channels
Energy efficiency: measured in pJ/bit, including not only the I/O driver/receiver circuits, but any additional data pre-fetch/queuing and/or encoding/decoding logic
“Beachfront” (linear) and area efficiencies: the aggregate data bandwidth per unit of chiplet edge length and per unit of chiplet area – i.e., Tbps/mm and Tbps/mm²; these depend upon the signal bump pitch and upon the number and pitch of the metal redistribution layers on the 2.5D substrate, which determine how many bump rows can have their signal traces routed out – see the figures below
Latency: another performance metric; the time between the initiation of data transmit and receive, measured in “unit intervals” of the transmit cycle

Architects are seeking to maximize the aggregate data bandwidth (bus width * data rate), while achieving very low dissipated energy per bit. These key design measures apply whether the chiplet interface is between multiple processors (or SoCs), processor-to-memory, or processor-to-I/O controller functionality.
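The key parameters combine with simple arithmetic: aggregate bandwidth is bus width times per-pin data rate, the beachfront and area figures divide it by edge length and area, power is energy per bit times bit rate (pJ/bit times Gb/s gives mW), and one unit interval at R Gb/s lasts 1/R ns. The sketch below computes them; every input value is hypothetical and not taken from the TSMC presentation.

```python
def link_metrics(bus_width, gbps_per_pin, pj_per_bit, beachfront_mm, area_mm2, latency_ui):
    """Figures of merit for a parallel chiplet interface (all inputs are design assumptions)."""
    aggregate_gbps = bus_width * gbps_per_pin          # bus width * data rate
    return {
        "aggregate_tbps": aggregate_gbps / 1000,
        "tbps_per_mm": aggregate_gbps / 1000 / beachfront_mm,    # beachfront efficiency
        "tbps_per_mm2": aggregate_gbps / 1000 / area_mm2,        # area efficiency
        "power_mw": pj_per_bit * aggregate_gbps,                 # pJ/bit * Gb/s = mW
        "latency_ns": latency_ui / gbps_per_pin,                 # 1 UI = 1 / data rate
    }


if __name__ == "__main__":
    m = link_metrics(bus_width=256, gbps_per_pin=8, pj_per_bit=0.5,
                     beachfront_mm=1.0, area_mm2=0.5, latency_ui=8)
    for name, value in m.items():
        print(f"{name:>14}: {value:g}")
```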

The physical signal implementation will differ, depending on the packaging technology. The signal redistribution layers (RDL) for a 2.5D package with silicon interposer will leverage the finer metal pitch available (e.g., TSMC’s CoWoS). For a multi-die package utilizing the reconstituted wafer substrate to embed the die, the RDL layers are much thicker, with a wider pitch (e.g., TSMC’s InFO). The figures below illustrate the typical signal trace shielding (and lack of shielding) associated with CoWoS and InFO designs, and the corresponding signal insertion and far-end crosstalk loss.

The key characteristics of the TSMC LIPINCON IP definition are illustrated schematically in the figure below.

A low signal swing interface of 0.3V is adopted (also saves power).
The data receiver uses a simple differential circuit, with a reference input to set the switching threshold (e.g., 150mV).
A clock/strobe signal is forwarded with (a sub-channel of) data signals; the receiver utilizes a simple delay-locked loop (DLL) to “lock” to this clock.
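Why the low swing saves power, and how the reference threshold is used, can be sketched with a first-order CV² model. The code assumes the driver is powered from a supply equal to the 0.3V swing, so each 0-to-1 transition draws C·V² from that supply; if the swing were instead derived from a higher supply Vdd, the energy would scale roughly as C·Vswing·Vdd. The 100 fF load and the 0.75V comparison swing are illustrative assumptions.

```python
def energy_per_bit_fj(c_load_ff, v_swing, p_rise=0.25):
    """Average supply energy per bit, assuming the driver runs from a supply equal to the
    swing. Each 0->1 transition draws C*V^2; random data has p_rise = 0.25 per bit."""
    return p_rise * c_load_ff * v_swing ** 2      # fF * V^2 = fJ


def receive(v_in, v_ref):
    """Receiver decision: compare the input against the reference threshold."""
    return 1 if v_in > v_ref else 0


if __name__ == "__main__":
    c_load = 100.0                                # fF, hypothetical pad + trace load
    low = energy_per_bit_fj(c_load, 0.3)
    high = energy_per_bit_fj(c_load, 0.75)        # hypothetical larger-swing comparison
    print(f"0.3 V: {low:.2f} fJ/bit, 0.75 V: {high:.2f} fJ/bit, ratio {low / high:.2f}")
    v_ref = 0.3 / 2                               # mid-swing threshold, i.e. 150 mV
    print([receive(v, v_ref) for v in (0.02, 0.12, 0.18, 0.29)])
```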

Briefly, a DLL is built around an (even-numbered) chain of identical delay cells. The figure below illustrates an example of the delay chain. [2] The switching delay of each stage is dynamically adjusted by modulating the voltage inputs to the series nFET and pFET devices in the input inverter of each stage – i.e., a “current-starved” inverter. (Other delay chain implementations dynamically modify the identical capacitive load at each stage output, rather than adjusting the internal transistor drive strength of each stage.)

The “loop” in the DLL is formed by a phase detector (XOR-type logic with low-pass filter), which compares the input clock to the final output of the chain. The leading or lagging nature of the input clock relative to the chain output adjusts the inverter control voltages – thus, the overall delay of the chain is “locked” to the input clock. The (equal) delays of the stages in the DLL chain provide outputs that each correspond to a specific phase of the input clock signal. The parallel data is captured in receiver flops using an appropriate phase output, as a means of compensating for any data-to-clock skew across the interface.
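The locking behavior can be illustrated with a behavioral model. The sketch below replaces the XOR phase detector and low-pass filter with a simpler bang-bang update of a common per-stage delay, then picks the tap nearest mid-bit for a given data-to-clock skew. The step size, the edge-aligned single-data-rate clocking and the example numbers are assumptions for illustration, not details of the LIPINCON receiver.

```python
def lock_dll(period_ps, n_stages, stage_delay_ps, step_ps=0.5, max_iters=10_000):
    """Behavioral bang-bang DLL (a simplified stand-in for the phase detector and filter).
    Nudges the common per-stage delay until n_stages * delay equals one clock period."""
    for _ in range(max_iters):
        chain_ps = n_stages * stage_delay_ps
        if chain_ps < period_ps:        # chain output early: starve the inverters more
            stage_delay_ps += step_ps
        elif chain_ps > period_ps:      # chain output late: speed the stages up
            stage_delay_ps -= step_ps
        else:
            break
    return stage_delay_ps


def tap_phases_deg(n_stages):
    """After lock, tap k (0-based) lags the input clock by 360*(k+1)/n_stages degrees."""
    return [360.0 * (k + 1) / n_stages for k in range(n_stages)]


def pick_capture_tap(n_stages, skew_ps, period_ps):
    """Choose the tap closest to mid-bit, assuming edge-aligned single-data-rate
    forwarding and a known data-to-clock skew."""
    target = (360.0 * skew_ps / period_ps + 180.0) % 360.0
    phases = tap_phases_deg(n_stages)

    def distance(k):
        diff = abs(phases[k] % 360.0 - target)
        return min(diff, 360.0 - diff)

    return min(range(n_stages), key=distance)


if __name__ == "__main__":
    d = lock_dll(period_ps=1000.0, n_stages=8, stage_delay_ps=100.0)
    print("locked stage delay:", d, "ps")          # 8 stages x 125 ps = 1 ns period
    print("tap phases:", tap_phases_deg(8))
    print("capture tap for 125 ps skew:", pick_capture_tap(8, 125.0, 1000.0))
```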

The TSMC IP team developed an innovative approach for the specific case of a SoC-to-memory interface. The memory chiplet may not necessarily embed a DLL to capture signal inputs: for a very wide interface – e.g., 512 addresses, 256 data bits, divided into sub-channels – the overhead of the DLL circuitry in the cost-sensitive memory chiplet would be high. As illustrated in the figure below, the DLL whose phase output serves as the input strobe for a memory write cycle resides in the SoC instead. (The figure also shows the memory read path, where the data strobe from the memory is connected to the read_DLL circuit input.)

For the parallel LIPINCON interface, simultaneous switching noise (SSN) related to signal crosstalk is a concern. For the shielded (CoWoS) and unshielded (InFO) RDL signal connections illustrated above, TSMC presented results showing very manageable crosstalk for this low-swing signaling.

To be sure, designers would have the option of developing a logical interface between chiplets that uses data encoding to minimize signal transition activity in successive cycles. The simplest method would be to add data bus inversion (DBI) coding: the data for the next cycle is compared to the current data and transmitted as either true or inverted values, whichever minimizes the switching activity. An additional DBI signal between chiplets carries this decision so that the receiver can decode the values.
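A minimal sketch of DBI in its "AC" form, which counts transitions against the previously transmitted word, is shown below. It assumes the bus starts at all zeros and that toggles on the DBI wire itself are not counted; under those assumptions, no transfer toggles more than half of the data lines.

```python
def dbi_encode(words, width):
    """DBI-AC: send a word inverted (DBI=1) when sending it true would toggle more than
    half of the data lines relative to the previously transmitted word."""
    mask = (1 << width) - 1
    prev = 0                                   # assumed bus state before the first word
    encoded = []
    for word in words:
        toggles = bin((word ^ prev) & mask).count("1")
        if toggles > width // 2:
            sent, dbi = ~word & mask, 1
        else:
            sent, dbi = word & mask, 0
        encoded.append((sent, dbi))
        prev = sent                            # compare against what was actually sent
    return encoded


def dbi_decode(encoded, width):
    """Receiver: undo the inversion wherever the DBI line was asserted."""
    mask = (1 << width) - 1
    return [(~sent & mask) if dbi else sent for sent, dbi in encoded]


if __name__ == "__main__":
    data = [0xFF, 0x00, 0xF0, 0x0F, 0xAA, 0x55]
    enc = dbi_encode(data, 8)
    print([(f"{s:08b}", d) for s, d in enc])
    print(dbi_decode(enc, 8) == data)
```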

The development of heterogeneous 2.5D packaging relies upon the integration of known good die/chiplets (KGD). Nevertheless, the post-assembly yield of the final package can be enhanced by the addition of redundant lanes which can be selected after package test (ideally, built-in self-test). The TSMC presentation included examples of redundant lane topologies which could be incorporated into the chiplet designs. The figure below illustrates a couple of architectures for inserting redundant through-silicon-vias (TSVs) into the interconnections. This would be a package yield versus circuit overhead tradeoff when architecting the interface between chiplets.
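The yield-versus-overhead tradeoff can be estimated with a binomial model. The sketch below assumes independent lane failures with a hypothetical per-lane failure probability and assumes any spare can replace any failed lane; it also shows a shift-style remapping of logical lanes onto the working physical lanes. Neither function is taken from the TSMC presentation, and real repair schemes often restrict which spares can cover which lanes.

```python
from math import comb


def link_yield(n_lanes, n_spares, p_fail):
    """Probability that at most n_spares of the n_lanes + n_spares physical lanes fail,
    assuming independent failures and that any spare can stand in for any lane."""
    total = n_lanes + n_spares
    return sum(comb(total, k) * p_fail**k * (1.0 - p_fail)**(total - k)
               for k in range(n_spares + 1))


def remap_lanes(n_lanes, n_spares, failed):
    """Shift-style repair: logical lane i is routed to the i-th working physical lane."""
    bad = set(failed)
    working = [p for p in range(n_lanes + n_spares) if p not in bad]
    if len(working) < n_lanes:
        raise ValueError("more failed lanes than spares")
    return working[:n_lanes]


if __name__ == "__main__":
    p = 1e-3                                  # hypothetical per-lane failure probability
    for spares in (0, 1, 2, 4):
        print(f"256 lanes + {spares} spares: yield {link_yield(256, spares, p):.4%}")
    print(remap_lanes(8, 2, failed=[3, 6]))   # logical 0..7 -> physical lanes
```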

In a SerDes-based design, thorough circuit and PCB interconnect extraction plus simulation is used to analyze the signal losses. The variations in signal jitter and magnitude are analyzed against the receiver sense amp voltage differential. Hardware lab-based probing is also undertaken to ensure a suitable “eye opening” for data capture at the receiver. TSMC highlighted that this type of interface validation is not feasible with the 2.5D package technology. As illustrated below, a novel method was developed by their IP team to introduce variation into the LIPINCON transmit driver and receive capture circuitry to create an equivalent eye diagram for hardware validation.
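TSMC's exact method is not described here, but the general idea of building an eye from pass/fail results can be sketched: inject timing and voltage offsets at the receiver, sweep the sampling phase and reference voltage, and mark each point that captures every data pattern correctly. The waveform model (linear edges lasting 0.3 UI) and the offset values below are toy assumptions.

```python
import itertools

SWING = 0.3                    # V, the LIPINCON swing quoted above
RISE_UI = 0.3                  # toy edge time, in unit intervals
JITTER_UI = (-0.05, 0.0, 0.05) # timing offsets injected by the scan (hypothetical)
NOISE_V = (-0.03, 0.0, 0.03)   # voltage offsets injected by the scan (hypothetical)


def received(prev, cur, nxt, t):
    """Toy receiver input: a linear edge starts at each bit boundary.
    t is measured in UI from the start of bit `cur`."""
    if t < 0:
        return prev * SWING
    if t < 1:
        start, end, dt = prev, cur, t
    else:
        start, end, dt = cur, nxt, t - 1
    return (start + (end - start) * min(dt / RISE_UI, 1.0)) * SWING


def passes(phase, vref):
    """True if every 3-bit pattern is captured correctly under all injected offsets."""
    for prev, cur, nxt in itertools.product((0, 1), repeat=3):
        for jitter in JITTER_UI:
            for noise in NOISE_V:
                v = received(prev, cur, nxt, phase + jitter) + noise
                if (1 if v > vref else 0) != cur:
                    return False
    return True


def eye_map(phase_steps=20, vref_steps=12):
    """Pass/fail grid over sampling phase (0..1 UI) and reference voltage."""
    phases = [i / phase_steps for i in range(phase_steps)]
    vrefs = [SWING * k / vref_steps for k in range(1, vref_steps)]
    return {(p, v): passes(p, v) for v in vrefs for p in phases}


if __name__ == "__main__":
    grid = eye_map()
    for k in reversed(range(1, 12)):
        v = SWING * k / 12
        row = "".join("." if grid[(i / 20, v)] else "#" for i in range(20))
        print(f"{v:5.3f} V |{row}|")      # '.' = pass, '#' = fail
```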

The TSMC presentation mentioned that some of their customers have developed their own IP implementations for USR interface design. One example showed a very low swing (0.2V) electrical definition that is “ground referenced” (i.e., the signal swings above and below ground). Yet, for fabless customers seeking to leverage advanced packaging, without the design resources to “roll their own” chiplet interface circuitry, the TSMC LIPINCON IP definition is an extremely attractive alternative. And, frankly, given the momentum that TSMC is able to provide, this definition will likely help accelerate a “standard” electrical definition among developers seeking to capture IP and chiplet design market opportunities.

For more information on TSMC’s LIPINCON definition, please follow this link.

-chipguy

References

[1] Hsieh, Kenny C.H., “Chiplet-to-Chiplet Communication Circuits for 2.5D/3D Integration Technologies”, VLSI 2020 Symposium, Paper SC2.6 (invited short course).

[2] Jovanovic, G., et al., “Delay Locked Loop with Linear Delay Element”, International Conference on Telecommunication, 2005, https://ieeexplore.ieee.org/document/1572136

Images supplied by the VLSI Symposium on Technology & Circuits 2020.

June 28, 2020 (updated August 12, 2020)

Intel Designs Chips to Protect from ROP Attacks

by Matthew Rosenquist on 06-28-2020 at 10:00 am
Categories: Intel Foundry, Security

Intel comes late to the game but will be delivering an embedded defense for Return Oriented Programming (ROP) types of cyber hacks. I first blogged about this back in Sept of 2016. Yes, almost four years have passed and I had hoped it would see the light of day much earlier.

The feature, to debut in the Tiger Lake microarchitecture in 2021 according to Intel, will be marketed as Control-Flow Enforcement Technology (CET) and is designed to disrupt a class of exploits that seek to leverage bits of code that are already trusted. These ROP attacks take chunks of code from other software and cobble them together to create a malicious outcome. In the hacking world, it is similar to Frankenstein’s monster, where something grotesque is assembled from various innocent parts. ROP hacking techniques are great at evading detection and are therefore a favorite among the most skilled threat actors.
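Intel documents CET as two mechanisms, a shadow stack and indirect branch tracking. The toy model below illustrates only the shadow-stack idea: a return address that an attacker overwrites on the ordinary stack no longer matches the protected copy, so the return is refused. It is a conceptual sketch in Python with hypothetical class names, not how the hardware is implemented.

```python
class ControlFlowViolation(Exception):
    """Raised when a return address fails the shadow-stack check."""


class ToyCPU:
    """Toy model of a shadow stack. CALL pushes the return address on the ordinary stack
    and on a separate shadow stack; RET compares the two before jumping."""

    def __init__(self):
        self.stack = []          # ordinary stack: reachable by buggy or malicious writes
        self.shadow_stack = []   # modeled as unwritable by ordinary stores

    def call(self, return_addr):
        self.stack.append(return_addr)
        self.shadow_stack.append(return_addr)

    def ret(self):
        addr = self.stack.pop()
        expected = self.shadow_stack.pop()
        if addr != expected:
            raise ControlFlowViolation(f"return to {addr:#x}, expected {expected:#x}")
        return addr


if __name__ == "__main__":
    cpu = ToyCPU()
    cpu.call(0x401000)
    cpu.stack[-1] = 0x40BEEF      # simulated overflow: return address replaced by a gadget
    try:
        cpu.ret()
    except ControlFlowViolation as err:
        print("blocked:", err)
```

An ROP chain depends on returning into gadgets whose addresses were never pushed by a legitimate call, which is exactly the mismatch this check detects.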

Embedding the CET feature into the hardware and firmware provides a few advantages over trying to mitigate these attacks solely at the operating system level. First, there is the performance factor. Functions implemented directly in hardware run significantly faster than equivalent software components, so this should have much less impact on system performance. Secondly, depending upon how it is configured to run, the hardware can add protection features that reduce the chances it can be disabled, modified, or compromised by adversaries.

Unfortunately, that is not the whole picture, as there are potential drawbacks for embedding such designs lower in the system stack. Namely, if there is a vulnerability in the code, it could be very difficult to patch or correct. Let’s face it, Intel’s reputation is not the greatest as of late when it comes to dealing with vulnerabilities in their products.

Overall, I am excited at the prospect of disrupting ROP types of attacks. I fully expect the best and brightest hackers will work to find ways around the protections, but that takes time and resources. This is how the game is played. It is great when new technology takes the initiative to force the attackers to adapt. The value of CET greatly depends on OS vendors’ adoption, on whether it has the right balance of hardened features, and on whether it runs efficiently enough not to overly burden system performance. Expect tests and reviews after Tiger Lake comes to market to determine whether it is simply a superficial marketing tactic or whether CET represents a robust capability to mitigate hacking risks.

Interested in more? Follow me on LinkedIn, Medium, and Twitter (@Matt_Rosenquist) to hear insights, rants, and what is going on in cybersecurity.

June 28, 2020 (updated November 30, 2022)

The Stochastic Impact of Defocus in EUV Lithography

by Fred Chen on 06-28-2020 at 6:00 am
Categories: Lithography

The stochastic nature of imaging has received a great deal of attention in the area of EUV lithography. The density of EUV photons reaching the wafer is low enough [1] that the natural variation in the number of photons arriving at a given location has a relatively large standard deviation compared with the mean.
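The scale of the problem follows from photon energy. At 13.5 nm each photon carries about 92 eV, so a given dose delivers far fewer photons than at 193 nm, and Poisson statistics give a relative fluctuation of 1/√N. The sketch below assumes a 30 mJ/cm² dose and a 20 nm × 20 nm area purely for illustration, and counts incident rather than absorbed photons.

```python
import math

H = 6.62607015e-34      # Planck constant, J*s
C = 299_792_458.0       # speed of light, m/s


def photons_per_nm2(dose_mj_per_cm2, wavelength_nm):
    """Incident photons per nm^2 for a given dose (1 cm^2 = 1e14 nm^2)."""
    photon_energy_j = H * C / (wavelength_nm * 1e-9)
    dose_j_per_nm2 = dose_mj_per_cm2 * 1e-3 / 1e14
    return dose_j_per_nm2 / photon_energy_j


def relative_shot_noise(n_photons):
    """Poisson statistics: sigma = sqrt(N), so sigma / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n_photons)


if __name__ == "__main__":
    dose = 30.0                                  # mJ/cm^2, illustrative
    for wavelength in (13.5, 193.0):
        density = photons_per_nm2(dose, wavelength)
        n = density * 20 * 20                    # photons in a 20 nm x 20 nm area
        print(f"{wavelength:>6} nm: {density:7.1f} photons/nm^2, "
              f"{n:9.0f} in 20x20 nm, sigma/N = {relative_shot_noise(n):.2%}")
```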

In recent studies [2,3], it was shown that complex 2D patterns with a wide diffraction spectrum can divide a large number of photons into smaller groups, each representing a different interference pattern. Each group therefore has relatively more significant shot noise. However, the effect of defocus had not yet been considered.

In this article, it will be shown that even for a single photon group for a basic 2-beam interference pattern, when a large number of source points are used, the effect of defocus is to, once again, divide the total number of photons into smaller groups, each representing a different degree of defocus, as determined by the phase difference between the two interfering beams, referred to as the 0th and 1st diffraction orders. This, in turn, causes a more rapid degradation of the image.

Separation of Source Points by Defocus

Figure 1 shows all the possible source points that can contribute to imaging a 40 nm line pitch, under the condition of 60 nm defocus. The source point coordinates are the sines of the angles with respect to the optical axis. At the nominal EUV wavelength of 13.5 nm and numerical aperture of 0.33, the 40 nm pitch can only be imaged as a 2-beam interference. Moreover, some source points (not shown) cannot provide an interference pattern, only background light.

Figure 1. Source points for two-beam interference at 40 nm pitch, classified by the phase difference between the interfering beams at 60 nm defocus (wavelength = 13.5 nm). The phase difference from 60 nm defocus is calculated by 360 deg/13.5 nm * 60 nm * [cos(0th order angle) – cos(1st order angle)].

Figure 1 shows only those points producing 2-beam interference, categorized according to phase difference between the interfering 0th and 1st orders. From the image differences among the groups in Figure 2, we can roughly divide the photons by defocus into 0-30 deg, 30-50 deg, 50-70 deg, 70-90 deg, in both positive and negative directions, leading to eight groups total, with the photons roughly uniformly distributed among them.

Figure 2. Effect of defocus phase difference between 0th and 1st orders on the image.
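The classification in Figure 1 can be approximated with a short script. It assumes a simple scalar model: source coordinates are direction sines at the wafer, a first order is offset by λ/pitch = 0.3375 in the x sine, a point counts as two-beam when the 0th order and one first order both fall inside NA = 0.33, and the phase follows the formula in the Figure 1 caption. The grid step and bin labels are illustrative choices, and the resulting counts are not the author's data.

```python
import math

WAVELENGTH = 13.5             # nm
NA = 0.33
PITCH = 40.0                  # nm, at the wafer
DEFOCUS = 60.0                # nm
SHIFT = WAVELENGTH / PITCH    # first-order offset in direction-sine space (0.3375)
EDGES = (30.0, 50.0, 70.0)    # bin edges in degrees, per the grouping in the text
LABELS = ("0-30", "30-50", "50-70", "70-90")


def partner_shift(sx, sy):
    """Offset of the first order that shares the pupil with the 0th order, or None when
    the source point only contributes background light."""
    if math.hypot(sx, sy) > NA:
        return None
    for shift in (SHIFT, -SHIFT):
        if math.hypot(sx + shift, sy) <= NA:
            return shift
    return None


def phase_diff_deg(sx, sy, shift=SHIFT):
    """360 deg / wavelength * defocus * [cos(0th order angle) - cos(1st order angle)]."""
    cos0 = math.sqrt(1.0 - sx**2 - sy**2)
    cos1 = math.sqrt(1.0 - (sx + shift)**2 - sy**2)
    return 360.0 / WAVELENGTH * DEFOCUS * (cos0 - cos1)


def bin_label(phase):
    index = sum(abs(phase) >= edge for edge in EDGES)
    return ("+" if phase >= 0 else "-") + LABELS[index]


def classify(step=0.005):
    """Count two-beam source points on a square grid, grouped by defocus phase."""
    counts = {}
    n = round(NA / step)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            sx, sy = i * step, j * step
            shift = partner_shift(sx, sy)
            if shift is None:
                continue
            label = bin_label(phase_diff_deg(sx, sy, shift))
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    for label, count in sorted(classify().items()):
        print(f"{label:>7}: {count}")
```

In this model the largest phase magnitude, reached at the pupil edge, comes out just under 90 degrees, which is consistent with the 0 to 90 degree range used for the groups in the text.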

With significantly fewer photons per phase difference range, the stochastic impact is aggravated. The degree of defocus of the wafer image becomes effectively determined by the variable number of photons per phase difference range.
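The penalty from splitting can be quantified directly: if N photons divide evenly into k groups, each group's relative shot noise is √(k/N), a factor √k worse than for the whole. For the eight groups above, that is about 2.8x. The photon count below is hypothetical, and the even split approximates the "roughly uniformly distributed" photons described in the text.

```python
import math


def relative_noise(photons):
    """Poisson shot noise: sigma / mean = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(photons)


def split_noise(total_photons, n_groups):
    """Relative noise of each group when the photons divide evenly among n_groups."""
    return relative_noise(total_photons / n_groups)


if __name__ == "__main__":
    total = 8000                       # hypothetical photon count in the region of interest
    whole = relative_noise(total)
    per_group = split_noise(total, 8)  # eight defocus-phase groups, as described above
    print(f"single group: {whole:.2%}, each of 8 groups: {per_group:.2%}, "
          f"ratio {per_group / whole:.2f}")
```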

Phase defect sensitivity

EUV masks are also subject to phase defects, which can manifest as sub-nm height bumps [4]. These phase defects change the vertical location of best focus and introduce small CD errors. The stochastic impact will manifest itself as defocus variation, i.e., how far the wafer location is from best focus, as well as CD variation (see Figure 3).

Figure 3. A phase defect combined with defocus leads to a more severe CD error. This is for the two-beam interference case as in Figure 1. A 30 degree defocus-induced phase difference between 0th and 1st orders is assumed. A 10 nm wide line-shaped phase defect (20 deg phase) in a nominal 20 nm wide exposed line region (40 nm pitch) is also assumed.

Remaining Concerns for Using Low Pupil Fill

The impact of a wide defocus range provides yet another argument for low pupil fill [5]. A lower pupil fill obviously narrows the range of defocus-induced phase differences among the source points. There remain the concerns of throughput loss from light being excluded [6] and of ring field illumination rotation [7].

The ring field illumination concern is reviewed in Figure 4. Ideally, the plane of incidence is fixed across a rectangular exposure field (slit). However, the off-axis focus is not to a line but to a point. Consequently, the field is arc-shaped, and the plane of incidence is rotated across the field, with the line of sight to the point source as the axis of symmetry. This means the distribution of source points is also rotated across the slit rather than maintaining its ideal position.

Figure 4. Plane of incidence must rotate for focusing to an off-axis point, in a reflective optical system.
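The effect of rotation on defocus sensitivity can be sketched with the same phase formula used for Figure 1. A pole placed symmetrically at sx = -λ/(2·pitch) has zero defocus phase when unrotated; rotating it about the pupil center, as happens across the arc-shaped field, breaks that symmetry. The rotation angles below are hypothetical and are not taken from any scanner specification.

```python
import math

WAVELENGTH = 13.5              # nm
PITCH = 40.0                   # nm
DEFOCUS = 60.0                 # nm
SHIFT = WAVELENGTH / PITCH     # first-order offset in direction-sine space


def rotate(sx, sy, angle_deg):
    """Rotate a source point about the pupil center."""
    a = math.radians(angle_deg)
    return (sx * math.cos(a) - sy * math.sin(a),
            sx * math.sin(a) + sy * math.cos(a))


def phase_diff_deg(sx, sy):
    """Defocus phase between the 0th order and the +1st order (Figure 1 caption formula)."""
    cos0 = math.sqrt(1.0 - sx**2 - sy**2)
    cos1 = math.sqrt(1.0 - (sx + SHIFT)**2 - sy**2)
    return 360.0 / WAVELENGTH * DEFOCUS * (cos0 - cos1)


POLE = (-SHIFT / 2, 0.0)       # symmetric pole: zero defocus phase when unrotated

if __name__ == "__main__":
    for angle in (0, 5, 10, 15):          # hypothetical rotation angles across the slit
        sx, sy = rotate(*POLE, angle)
        print(f"{angle:>2} deg rotation -> {phase_diff_deg(sx, sy):6.2f} deg defocus phase")
```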

While DUV wavelengths, e.g., ArF (193 nm) immersion, also quickly migrated to low pupil fill (due to very low k1) for better defocus performance, those optical systems were transmissive, not reflective, obviating the need for off-axis focusing and ring fields.

Therefore, a workaround that can be used with EUV systems for now would be using only a small portion of the field to limit the degree of rotation [8]. A smaller field, however, means more exposure stops per wafer, so throughput again will suffer.

References

[1] https://www.euvlitho.com/2009%20Workshop/Oral%2045%20Resist-8%20Mack.pdf

[2] https://www.linkedin.com/pulse/stochastic-considerations-multi-point-source-lithography-chen

[3] https://www.linkedin.com/pulse/stochastic-variation-euv-source-illumination-frederick-chen/

[4] T. Terasawa, T. Yamane, Y. Arisawa, H. Watanabe, “Phase defect printability analyses: dependence of defect type and EUV exposure condition,” Proc. SPIE 8322, 83221R (2012).

[5] https://www.linkedin.com/pulse/need-low-pupil-fill-euv-lithography-frederick-chen

[6] M. van de Kerkhof, H. Jasper, L. Levasier, R. Peeters, R. van Es, J-W. Bosker, A. Zdravkov, E. Lenderink, F. Evangelista, P. Broman, B. Bilski, T. Last, “Enabling sub-10nm node lithography: presenting the NXE:3400B EUV scanner,” Proc. SPIE 10143, 101430D (2017).

[7] S-S. Yu, A. Yen, S-H. Chang, C-T. Shih, Y-C. Lu, J. Hu, T. Wu, “On the Extensibility of Extreme-UV Lithography,” Proc. SPIE 7969, 79693A (2011).

[8] https://www.linkedin.com/pulse/forbidden-pitch-combination-advanced-lithography-nodes-frederick-chen/

This article originally appeared in LinkedIn Pulse: The Stochastic Impact of Defocus in EUV Lithography

CEO Interview: John O’Donnell of yieldHUB

by Daniel Nenni on 06-26-2020 at 10:00 am
Categories: CEO Interviews, EDA, yieldHUB

Let me introduce John O’Donnell, CEO of yieldHUB. After earning a degree in microelectronics, John spent 18 years at Analog Devices before founding yieldHUB in 2005. If anybody knows yield, it is Analog Devices, which has shipped billions upon billions of chips.

SemiWiki will be digging deeper into the technology behind yieldHUB but first let’s talk to John.

What is yieldHUB?
yieldHUB is a leading semiconductor yield management provider. We work with Fabless and IDM companies worldwide. Founded in 2005, we’re celebrating 15 years in business this year.

What gap did you see in the market?
I saw a gap in the market in 2005 for web-based YMS (yield management software), where there should be no need to download data before being able to chart it and analyze it. Let the server do the work! We wanted to remove the hassle from engineers of always having to assemble disparate data for hours before ever getting to analyze and report on an engineering problem.

Why do you do what you do?
We want engineers to spend less time gathering data and more time solving problems. We give oversight to their managers, as they can see the data and reports their teams are working on.

We help companies increase their yield and reduce scrap to improve their margins. Our STDF analysis is very sophisticated and allows engineers and their managers to create excellent reports and drill down into what’s happening on the factory floor. One of our customers said that yieldHUB makes engineers 10 times more efficient!

What challenges did you have?
Early on in the journey, we pivoted to Real-Time analysis of the test floor. We knew how to do it without adding any hardware. But when we produced it, people weren’t willing to pay for it – we were probably ten years ahead of our time in that area. However, companies were willing to pay for a relational database and associated tools for historical analysis if they were fast and comprehensive enough. So we went back to fully concentrating on our original plan and were able to continue growing and developing.

What makes yieldHUB successful?
An enduring vision of making powerful data analysis easy and speedy for engineers – and then hiring great people who believe in the vision and bring their own knowledge and experience to it! Most of our employees work remotely, which allows us to hire top talent around the world. Our team members and associates are based in Ireland, the USA, the UK, the Netherlands, the Philippines, Taiwan, South Korea and Japan, which allows us to serve our customers in their time zones.

We hire experienced technical people who, importantly, also have empathy for customers. Our salespeople are all former Test, Product and Foundry Engineers. They listen and give great advice because they’ve been there.

What industries do you work with?
Our customers are diverse. Several of our customers are start-ups that make chips for 5G. Others provide automotive chips for tier 1 car manufacturers. Some work in the aviation and military sectors. One of our largest customers works in consumer goods. We provide different services for different needs. With consumer goods the margins are tight – they have huge volumes and need very high yields. For automotive companies, quality, reliability and traceability are key. We enjoy working with our customers to identify their needs, then show how we can help them. Being at the forefront of new technology is exciting. Every day we’re helping to create the future.

What’s next for yieldHUB?
The fast-growing image sensor market is a big opportunity. We have developed exciting cloud-based image analysis software which is now in production and allows automated categorisation of images and defects.

Our automated data-cleansing capabilities lend themselves well to providing clean, linked data as inputs to machine learning, reducing errors in the predictions. Hence machine learning for yield improvement is also a growing focus for yieldHUB.

With our new API, you can now access clean manufacturing data from yieldHUB in real time from other systems, including MES/ERP and financial systems. This also looks very promising for helping customers, for example, to reconcile invoices from their subcons with actual test time, volume and yield information from the hugely scalable and reliable yieldHUB database.

About yieldHUB
yieldHUB supplies world-class data analysis solutions to the semiconductor industry. For companies who design and manufacture semiconductors, the cloud-based software provides a complete understanding of their product performance and yields. Visit yieldhub.com
