Unleash the Power: NVIDIA GPUs, Ansys Simulation
by Daniel Nenni on 03-19-2024 at 10:00 am

ANSYS Perceive EM GPU solver for 5G/6G simulations integrated into NVIDIA Omniverse

In the realm of engineering simulations, the demand for faster, more accurate solutions to complex multiphysics challenges is ever-growing.

Simulation is a vital tool for engineers to design, test, and optimize complex systems and products. It helps engineers reduce costs, improve quality, and accelerate innovation. However, the associated high computational demands, large data sets, and multiple physics domains pose significant challenges.

A key technology instrumental in meeting this demand is the graphics processing unit (GPU). With their unique applicability to accelerating multiphysics simulations, GPUs are a game-changer driving innovation. Ansys and NVIDIA partner to deliver solutions that leverage the power of NVIDIA GPUs and the physics-based authority of Ansys software. Together, Ansys and NVIDIA are enabling engineers to solve the industry’s most computationally challenging problems.

NVIDIA GPUs for Multiphysics Simulations

NVIDIA GPUs are designed to accelerate parallel and compute-intensive tasks, such as simulations, leveraging thousands of cores and high-bandwidth memory. NVIDIA GPUs deliver orders-of-magnitude faster performance than CPUs for many simulation applications, enabling engineers to run more simulations in less time and with higher fidelity.

NVIDIA GPUs, now famous for driving the AI revolution, are renowned for their high-performance computing capabilities, making them an ideal choice for engineers and scientists working on multiphysics simulations in highly complex applications in industries such as aerospace, automotive, biomedical, and energy. These simulations account for the interaction of multiple physical phenomena, such as fluid dynamics, structural mechanics, and electromagnetics. By harnessing the parallel processing power of NVIDIA GPUs, engineers can significantly reduce simulation times and achieve more accurate results. More simulation iterations lead to more ideas and ultimately superior end products.

The innovation of 3D-IC designs requires the addition of multiphysics simulations to semiconductor design, such as electromagnetic, thermal, and structural analysis. Stacking of chiplets in close proximity within a single package brings system-level multiphysics challenges into IC design.

Solving for Multiphysics Challenges

Ansys has long been at the forefront of providing cutting-edge simulation software solutions for engineers and researchers worldwide. With a strong focus on multiphysics and semiconductor simulations, Ansys has established itself as the leader in enabling users to simulate a wide range of physical phenomena accurately. Ansys software is trusted by tens of thousands of engineers worldwide who rely on its accuracy, reliability, and scalability.

Ansys software is particularly renowned for its multiphysics simulations, enabling engineers to simulate the complex interactions among various physical processes and gain valuable insights into the behavior of their systems. Ansys offers a comprehensive suite of tools for multiphysics simulations, such as Ansys Discovery, Ansys Fluent™, Ansys HFSS™, Ansys LS-DYNA™, and Ansys SPEOS™. These tools enable engineers to perform interactive, real-time, and high-fidelity simulations of various multiphysics phenomena, such as fluid-structure interaction, electromagnetics, shock and impact, and optical performance.

ANSYS LS-Dyna crash test simulation result visualized with NVIDIA Omniverse
Ansys Product Support for NVIDIA Processors: Leveraging Grace and Hopper

Ansys harnesses NVIDIA H100 GPUs to boost multiple simulation solutions and prioritizes NVIDIA’s latest Grace and Hopper processors, along with the newly announced Blackwell architecture, for products across the Ansys portfolio, such as Fluent and LS-DYNA. By utilizing these products in conjunction with NVIDIA processors, engineers achieve faster simulation times, increased accuracy, and improved productivity in their work.

Ansys software can leverage the features and benefits of NVIDIA processors, such as:

  • Massive parallelism and high-bandwidth memory enable faster and more accurate simulations leading to better end products.
  • Unified memory and NVLink enable seamless data transfer and communication between CPU and GPU.
  • Tensor cores and ray tracing cores enable advanced simulations of artificial intelligence and optical effects.
  • Multi-GPU and multi-node support enable scalable simulations of large and complex models.
Gearbox CFD simulation using ANSYS Fluent
Driving Innovation: Benchmarks and Performance Gains

Ansys integrates support for NVIDIA processors into its flagship products to harness their immense potential in enhancing simulation performance. This collaboration between Ansys and NVIDIA unlocks new possibilities for engineers seeking to leverage the power of GPU acceleration in their simulations. Ansys has already announced its intent to support NVIDIA’s just-announced Blackwell architecture, presaging even greater simulation acceleration.

ANSYS Perceive EM GPU solver for 5G/6G simulations integrated into NVIDIA Omniverse

Compared to traditional computing methods, benchmarks demonstrate that Ansys simulations run on NVIDIA GPUs deliver significant performance gains. Engineers see a substantial reduction in simulation times, allowing for faster design iterations and more efficient problem-solving. For example:

  • Fluent enables high-fidelity and scalable fluid dynamics simulations on NVIDIA GPUs, allowing engineers to solve challenging problems such as turbulence and combustion phenomena. Fluent runs up to 5x faster on one NVIDIA H100 GPU than on two 64-core sockets of a recently released high-end CPU.
  • Ansys Mechanical™ enables fast and accurate structural mechanics simulations on NVIDIA GPUs, allowing engineers to model complex phenomena such as acoustics, vibration, and fracture dynamics. Mechanical’s matrix kernel running on 4 CPU cores speeds up by as much as 11x when one NVIDIA H100 GPU is added.
  • Ansys SPEOS enables realistic and high-performance optical simulations on NVIDIA GPUs, allowing engineers to design, measure and assess light propagation in any environment. SPEOS can run optical simulations up to 35x faster on an NVIDIA RTX™ 6000 Ada than on a recently released  8-core processor.
  • Ansys Lumerical enables comprehensive and efficient photonics simulations, allowing engineers to design and optimize photonic devices and circuits. Ansys Lumerical FDTD running on a single NVIDIA A100 GPU solves up to 40% faster than an HPC cluster containing 480 AMD EPYC 7V12 cores of a recently released high-end CPU. This equates to nearly a 6x improvement in price-performance ratio.
  • Other Ansys products such as RedHawk-SC™, Discovery, Ensight™, Rocky™, HFSS SBR+™, Perceive EM™, Maxwell™, AVxcelerate Sensors™, and RF Channel Modeler™ either already benefit or soon will benefit from NVIDIA GPU acceleration.
ANSYS AVxcelerate Sensors (Physics-based Radar, Lidar, Camera) connected to NVIDIA Drive Sim

Embracing the Future: Grace, Hopper and now Blackwell

The future of simulation is bright, and NVIDIA’s latest innovations, including the NVIDIA Grace CPU, NVIDIA Hopper GPU, and now the Blackwell architecture, promise even more impressive performance gains. Ansys is committed to optimizing its software for these next-generation platforms, ensuring engineers have access to the most powerful simulation tools available.

The combination of Ansys’ advanced simulation software and NVIDIA’s GPU technology is revolutionizing engineers’ approach to multiphysics simulations. The recently announced expanded partnership between the two companies promises even greater advances, enhanced by artificial intelligence. Using Ansys software with NVIDIA GPUs, engineers tackle complex multiphysics simulations with unprecedented speed and accuracy, paving the way for new innovations and breakthroughs in engineering and science.

Ansys and NVIDIA Pioneer Next Era of Computer-Aided Engineering  

*CoPilot & ChatGPT, both powered by NVIDIA GPUs, contributed to this blog post.

Also Read:

Ansys and Intel Foundry Direct 2024: A Quantum Leap in Innovation

Why Did Synopsys Really Acquire Ansys?

Will the Package Kill my High-Frequency Chip Design?



Synopsys Enhances PPA with Backside Routing
by Mike Gianfagna on 03-19-2024 at 6:00 am

Comparison of frontside and backside PDNs (Source: IMEC)

Complexity and density conspire to make power delivery very difficult for advanced SoCs. Signal integrity, power integrity, reliability and heat can seem to present unsolvable problems when it comes to efficient power management. There is just not enough room to get it all done with the routing layers available on the top side of the chip. A strategy is emerging to deal with the problem that seems to take a page out of the multi-die playbook. Rather than deal with the existing, single surface constraints, why not move power delivery to the backside of the chip, and get additional PPA benefit out of it? The entire fab and process equipment ecosystem is buzzing about this approach. But what about the design methodology? There is help on the way. A very informative white paper is now available from the leading EDA supplier. Read on to get the details about how Synopsys enhances PPA with backside routing.

Why Use Backside Routing?

In a typical SoC, dedicated power layers tend to be thicker, with wider traces than the signal layers, to reduce the amount of loss due to IR drop. The power delivery network, or PDN, is what brings power to all parts of the chip. PDN design requires extensive analysis of electromigration, noise, and cross-coupling effects as well as IR drop to ensure power integrity. Solving this problem by adding metal layers increases the cost and complexity of the fabrication process, if it’s even possible given process constraints.
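For a sense of the magnitudes involved, here is a back-of-the-envelope IR-drop estimate for a single power strap. The helper function and all the numbers are purely illustrative (real PDN signoff also covers electromigration, noise, and coupling, as noted above), but they show why dedicated power layers end up wider and thicker than signal layers.

# Back-of-the-envelope IR drop across one rectangular power strap.
# Illustrative values only; not a substitute for full PDN analysis.

RHO_CU = 1.7e-8  # ohm*m, bulk copper resistivity (real BEOL copper is somewhat higher)

def strap_ir_drop(length_m, width_m, thickness_m, current_a):
    """IR drop across a rectangular metal strap carrying a DC current."""
    resistance = RHO_CU * length_m / (width_m * thickness_m)
    return current_a * resistance

# A narrow, signal-like wire pressed into PDN duty: 1 mm long, 2 um wide, 0.5 um thick, 10 mA
v_narrow = strap_ir_drop(1e-3, 2e-6, 0.5e-6, 10e-3)
# The same route on a wider, thicker power layer: 4 um wide, 1 um thick
v_wide = strap_ir_drop(1e-3, 4e-6, 1e-6, 10e-3)

print(f"narrow strap drop: {v_narrow*1e3:.0f} mV")  # 170 mV -- painful on a sub-1 V supply
print(f"wide, thick strap: {v_wide*1e3:.1f} mV")    # 42.5 mV -- why PDN layers are wide and thick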

There is more to the story, which is explained well in the white paper (a link is coming). Now that manufacturing technology supports it, backside routing for the PDN is a great way to remove the obstacles, opening a new approach to implementation and the opportunity to enhance PPA. The graphic at the top of this post provides a comparison of frontside and backside PDNs. Thanks to IMEC for this depiction.

But, as they say, there’s no free lunch. For backside routing, the design process has to deal with many new problems, such as:

  • Signal integrity
    • Frontside PDN acted as a natural shield for signal integrity
    • Important to have close correlation between pre-route and post-route
  • Thermal impact
    • Thermal aware implementation to reduce impact of backside metal
  • Post-silicon observability
    • Methodologies to support robust observability
  • Multi-die and backside metal
    • Lots of synergy between the two
    • Leverage EDA technology pieces from each other

The white paper also explains what Synopsys is doing for backside routing. Here is a summary.

What Synopsys is Doing

Synopsys has embraced the use of backside routing for the PDN. The approach fits well with its design technology co-optimization (DTCO) methodology. The company has added support for backside PDNs in all relevant EDA products. The result is fast and efficient technology exploration, design PPA assessment, and design closure to accelerate the overall development process. As with many of its programs, the approach allows chip designers to adopt new silicon technology with predictable results.

A large number of additions are part of Synopsys Fusion Compiler, the industry-leading RTL-to-GDSII implementation system. The figure below summarizes the enhancements at a high level. The white paper goes into more detail about these enhancements and the measurable impact on chip design results.

Overview of Synopsys Fusion Compiler Enhancements

The white paper also discusses potential future additions to expand the use of backside routing even further.

To Learn More

Backside routing is here. The foundry ecosystem is delivering this capability and design teams need an enhanced flow to take advantage of the benefits, both today and tomorrow. Synopsys is at the leading edge of this trend and the new white paper provides important details. You can get a copy of the new Synopsys white paper here. And that’s how Synopsys enhances PPA with backside routing.



Afraid of mesh-based clock topologies? You should be
by Daniel Payne on 03-18-2024 at 10:00 am

mesh-based clock topology

Digital logic chips synchronize all logic operations by using a clock signal connected to flip-flops or latches, and the clock is distributed across the entire chip. The ultimate goal is to have the clock signal arrive at the exact same moment in time at every clocked element. If the clock arrives too early or too late at any flip-flop or latch along its path from the PLL output, that time difference impacts the critical path delays and the maximum achievable clock frequency. An architect or RTL designer views the clock as a perfectly defined square wave with no delays, while engineers doing timing analysis or physical design know that clock signals are starting to look more like sine waves than square waves, and that there are delays along the clock tree that depend on the topology of the clock network. At small process nodes, On-Chip Variation (OCV) makes delays in logic and clock networks differ from ideal conditions, so clock designers resort to adding timing margins.
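To see how those margins translate into lost performance, here is a back-of-the-envelope setup-timing budget. The delay values are made up for illustration, and the simple formula pessimistically adds the skew margin straight onto the required clock period:

# Toy setup-timing budget: how clock skew/OCV margin reduces achievable frequency.
# All delay values are illustrative, not tied to any particular process node.

t_clk_to_q = 0.05   # ns, launch flop clock-to-Q delay
t_comb     = 0.60   # ns, worst-case combinational (critical path) delay
t_setup    = 0.04   # ns, capture flop setup time

def fmax_ghz(skew_margin_ns):
    """Max clock frequency when skew/OCV margin is added to the required period."""
    return 1.0 / (t_clk_to_q + t_comb + t_setup + skew_margin_ns)

print(f"ideal clock, zero skew:              {fmax_ghz(0.00):.2f} GHz")
print(f"tree clock, 80 ps skew + OCV margin: {fmax_ghz(0.08):.2f} GHz")
print(f"mesh clock, 20 ps skew margin:       {fmax_ghz(0.02):.2f} GHz")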

Two popular clock topologies are tree and mesh, so a comparison reveals the differences between each.

Tree and Mesh Clock topologies. Source: GLVLSI 10, May 16-18, 2010

                     Tree                     Mesh
Shared path depth    Higher                   Lower
Timing analysis      Static Timing Analysis   SPICE
Power                Lower                    Higher
OCV                  More sensitive           More tolerant
Clock speed          Lower                    Higher
Clock skew           Higher                   Lower
Routing resources    Lower                    Higher

With a tree topology, the EDA flow is highly automated: Clock Tree Synthesis (CTS) is built into popular tool flows that tie logic synthesis to place and route, with timing analysis run on a Static Timing Analysis (STA) tool. The downsides of a tree topology are its sensitivity to OCV, lower clock speeds, and higher clock skew. In modern process nodes, the aging of P- and N-channel devices changes the duty cycle of the clock, so it may not be 50% high and 50% low, which in turn impacts critical path delays.

The mesh topology for clocks provides the highest clock speeds, higher tolerance to OCV, and the lowest clock skew. A mesh topology has the downsides of higher routing resource usage, higher power consumption, and the need for SPICE for timing analysis. An STA tool cannot be used to analyze a mesh, because in a mesh the clock signal has paths that combine. The only information an STA tool could provide for a mesh-based clock design is setup and hold analysis, not critical path analysis.

There is also a middle ground, where the best aspects of tree and mesh topologies are combined, so there are choices for clock topology that are driven by your product requirements.

SPICE circuit simulation using an extracted IC netlist with parasitics is required for timing analysis of mesh-based clock topologies. For accurate analysis of OCV effects, a Monte Carlo simulation using SPICE is required, which is a very time-consuming step, and your SPICE simulator may not have the capacity for such a large extracted netlist. If your chip design group is intimidated by using SPICE for analyzing clock timing in a mesh-based topology, there’s some good news: the EDA vendor Infinisim has an easy-to-use product called ClockEdge that provides clock timing analysis without you having to be a SPICE expert. The analysis provided by ClockEdge will help your team implement a mesh-based clocking topology quickly and with minimal training.
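To illustrate the statistical nature of the problem, here is a toy Monte Carlo sketch in plain Python of how independent per-buffer variation spreads the skew between two clock branches. It is only a statistical cartoon with invented delay numbers, not SPICE and not how ClockEdge works internally:

# Toy Monte Carlo of on-chip variation (OCV) on two clock branches.
# Invented numbers; real mesh analysis needs SPICE on the extracted netlist.
import random
import statistics

random.seed(7)

def branch_delay(n_buffers, nominal_ps=25.0, sigma_frac=0.05):
    """Sum of buffer delays, each varied independently to mimic local variation."""
    return sum(random.gauss(nominal_ps, sigma_frac * nominal_ps) for _ in range(n_buffers))

N = 10_000
skews = [abs(branch_delay(8) - branch_delay(8)) for _ in range(N)]

print(f"mean skew: {statistics.mean(skews):5.1f} ps")
print(f"p99 skew : {sorted(skews)[int(0.99 * N)]:5.1f} ps")

A mesh improves on this partly because shorting many driver outputs together averages out exactly this kind of branch-to-branch variation, which is also what makes the resulting network impossible to analyze path-by-path with STA.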

Summary

SoC designers tackle many technical issues to reach their Power, Performance and Area (PPA) goals, and choosing a clock topology is one of these issues. Most modern chip teams will be attracted to the benefits of using a combined tree and mesh topology for their clock network, as it provides high clock rates, low skew, acceptable routing resource usage, and tolerance to OCV effects. The timing analysis of mesh-based clock networks is now simplified by using the ClockEdge tool from Infinisim, as it provides analysis for metrics such as rail-to-rail failures, duty-cycle distortion, slew rate and transition distortion, and power-supply-induced jitter.

Related Blogs

Can Correlation Between Simulation and Measurement be Achieved for Advanced Designs?

Can Correlation Between Simulation and Measurement be Achieved for Advanced Designs?
by Mike Gianfagna on 03-18-2024 at 6:00 am


“What you simulate is what you get.” This is the holy grail of many forms of system design. Achieving a high level of accuracy between predicted and actual performance can cut design time way down, resulting in better cost margins, time to market and overall success rates. Achieving a high degree of confidence in predicted performance is not an easy task. Depending on the type of design being done, there are many processes and methods that must be executed flawlessly to achieve the desired result. There was a panel devoted to this topic at the recent DesignCon in Santa Clara, CA. Experts looked at the problem from several different perspectives.  Read on to learn more – can correlation between simulation and measurement be achieved for advanced designs?

About the Panel

The DesignCon panel was entitled, Extreme Confidence Simulation for 400-800G Signal Integrity Design. The event was organized by Wild River Technology, a supplier of products and services for advanced signal integrity design. Samtec also participated in the panel. Samtec and Wild River represented the two companies on the panel that focus on products and services specifically targeted to support advanced signal integrity designs. The balance of the panel included companies that focus on design methodology/tools and advanced product development, so all points of view were represented. Here is a summary of who participated – all have impressive credentials.

I will summarize the comments from Samtec and Wild River Technology on correlation for advanced designs next, since these two points of view are wholly focused on correlation accuracy rather than on product design or design methodology.

Al Neves, Founder & Chief Technology Officer, Wild River Technology

Al has over 39 years of experience in the design and application development of semiconductor products and capital equipment, focused on jitter and signal integrity analysis. He has been successfully involved with numerous business development and startup activities for the last 17 years. Al focuses on measurement-based model development, ultra-high signal integrity serial link characterization test fixtures, high-speed test fixture design, and platforms for material identification and measurement-simulation correlation to 110 GHz.

Scott McMorrow, Strategic Technologist, Samtec

Scott currently serves as a Strategic Technologist for Samtec, Inc. As a consultant for many years, Scott has helped many companies develop high performance products, while training signal integrity engineers. He is a frequent author and spokesperson for Samtec.

Gary Lytle, Product Management Director, Cadence

Gary leads product strategy, positioning, sales enablement and demand generation for Cadence electromagnetic simulation technologies. He has held many positions in the RF and simulation industry, including Technical Director with ANSYS, Inc., Lead Antenna Design Engineer with Dielectric Communications, Combat Systems Engineer with General Dynamics, and Engineering Manager with Amphenol.

Cathy Liu, Distinguished Engineer, Broadcom

Cathy Ye Liu currently heads up Broadcom’s SerDes architecture and modeling group. Since 2002, she has been working on high-speed transceiver solutions. Previously, she developed read channel and mobile digital TV receiver solutions.

 

Jim Weaver, Senior Design & Signal Integrity Engineer, Arista Networks

Jim is responsible for design and analysis of large switches for cloud computing and high bit rate serial links. Jim has over 40 years of experience in system design, including 20 years of signal integrity experience, and is heavily involved with IEEE802.3dj electrical specification work.

Todd Westerhoff, High-Speed Design Product Marketing at Siemens EDA

Todd Westerhoff moderated the panel. He has over 42 years of experience in electronic system modeling and simulation, including 25 years of signal integrity experience. Prior to joining Siemens EDA, he held senior technical and management positions at SiSoft, Cisco and Cadence. He also worked as an independent signal integrity consultant developing analysis methodologies for major systems and IC manufacturers.

The focus of the panel was defined this way:

What’s the point in running detailed simulations if the PCB test vehicle you fabricate and assemble performs differently than you had predicted? This panel will discuss issues associated with achieving tight and repeatable correlation between simulation and measurement for structures such as vias, connector launches, transmission lines, etc. and the channels that contain them.

This correlation allows us to perform what we call “Extreme Confidence Simulation”. A wide set of simulation topics will be addressed that are focused on the epic signal integrity challenges presented by 400-800G communication.

Key Takeaways – Samtec

Scott provided his views and experience on correlation for advanced designs, beginning with the observation that, in order to correlate measurements to simulation, it is necessary to understand the limits of the methods. We assume our simulations are correct given correct modeling inputs. Further, we assume our measurements are correct given the best measurement methods. But are they?

Scott pointed out that there is a statistical probability of error in both the simulations and the measurements that has nothing to do with correct modeling of materials. Therefore, we need to understand these to improve our measurement to model correlation.

Scott then dove into significant detail to discuss HFSS simulation maximum delta S criteria, HFSS simulation convergence criteria, high frequency phase accuracy, transmission uncertainty, Mcal insertion loss error, and Mcal delay error.

Scott concluded his talk with a summary of what’s needed to understand the limits of measurement. For simulation modeling, understanding the convergence controls needed to achieve the necessary level of correlation is mandatory. He pointed out that for all but metrology-grade VNA measurements, phase (delay) error is low enough to be accurate to within several hundred femtoseconds, which is fortunate for material identification problems.

But below 10 GHz, he warned of incorrect phase creeping in, altering the starting point for material identification, and creating time domain causality issues. At low frequencies, he suggested using a separate method to validate the low frequency and DC characteristics of the material, where the accuracy is higher.

A final comment from Scott: Separate correlation to individual structures so that accuracy can be preserved in both simulation and measurement.

Key Takeaways – Wild River Technology

Al took a direct approach to the topic, pointing out that EDA tools are not standards: “There is nothing ‘golden’ about them (sorry). Believing EDA tools are standards can corrupt the path to high-speed design confidence.” He went on to explain that the path to simulation-to-measurement confidence is a hard road that takes a lot of work, and it’s uncompromising.

The hard work is EDA calibration/benchmarking and building systematic approaches using advanced test fixtures (material ID, verification of models, etc.). The bottom line is that all EDA tools have issues, and it is our job to identify and work around them.

Al then spent some time on the importance of calibration and metrics. He explained that better calibration is required for simulation-measurement correlation. For example, sliding load cal performance is required for good sim-measure correspondence. He felt the industry was over-reliant on easy-to-use ECAL and has neglected good mechanical cals. Al coined the phrase “EDA Metrics Matter.” His concluding points were:

  • Mindset matters
  • You cannot ignore Maxwell
  • The world of >70GHz is not in good shape for signal integrity
  • Metrics will be very useful

Summary, and Next Steps

There were similar messages from Scott and Al at this panel: understanding how to calibrate results and factor in all sources of error, including an understanding of the materials being used, is important.

Samtec offers a vast library of information on calibration and measurement accuracy. You can explore Samtec’s technical library here. I’m a fan of the gEEk spEEk webinars. You can explore the extreme signal integrity products and services offered by Wild River Technology here. So, can correlation between simulation and measurement be achieved for advanced designs? With the right approach and the right partners, I believe it can.



Measuring Local EUV Resist Blur with Machine Learning
by Fred Chen on 03-17-2024 at 10:00 am


Resist blur remains a topic that is relatively unexplored in lithography. Blur has the effect of reducing the difference between the maximum and minimum doses in the local region containing the feature. Blur is particularly important for EUV lithography, since EUV is prone to stochastic fluctuations and is also driven by secondary electron migration, which is itself a significant source of blur [1].

While optical sources of blur, such as defocus, flare, and EUV dipole image fading [2], can be considered as independent of wafer location, non-optical sources, such as from electron migration or acid diffusion, can have a locally varying behavior. It is therefore important to have some way to characterize and/or monitor the local blur in a patterned EUV resist.

The most straightforward way is to have a resist pattern that covers the whole exposure field with adequate resolution-scale sampling. A practical choice for a 0.33 NA EUV system would be a 20 nm half-pitch hole or pillar array, which gives equal sampling in the x and y directions. It is also practically at the resolution limit for contact/via patterning due to stochastic variations [3,4]. As shown in the example of Figure 1, a large enough blur, e.g., 20 nm, is enough to make the contact go missing. Such a large blur may result from local resist inhomogeneities as well as an occasionally large electron range.

Figure 1. 20 nm half-pitch via pattern, at 20 mJ/cm2 absorbed dose (averaged over 40 nm x 40 nm cell), with different values of blur. Quadrupole illumination is used with a darkfield mask. Secondary electron quantum yield = 2. A Gaussian was fit to the half-pitch via.
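For a rough feel of why a 20 nm blur is fatal at this pitch, recall that convolving an image with a Gaussian of width σ attenuates the modulation of a sinusoidal component of pitch p by exp(-2π²σ²/p²). The quick numeric check below uses that simplified 1-D relation only for intuition; Figure 1 itself comes from a fuller stochastic imaging calculation.

# Modulation attenuation of a sinusoidal image component of pitch p by a Gaussian blur
# of width sigma: attenuation = exp(-2*pi^2*sigma^2/p^2) (Fourier transform of a Gaussian).
# Simplified 1-D intuition only.
import math

pitch_nm = 40.0  # 20 nm half-pitch via array

for sigma_nm in (5.0, 10.0, 20.0):
    atten = math.exp(-2 * math.pi**2 * sigma_nm**2 / pitch_nm**2)
    print(f"blur sigma = {sigma_nm:4.1f} nm -> remaining modulation = {atten:.3f}")
# sigma = 20 nm leaves less than 1% of the modulation, so the via effectively vanishes.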

One can envisage that machine learning methods [5] can be used to match via appearance to the most likely blur at a given location, allowing a blur map to be generated for the whole exposure field. It should also be noted that the rare large local blur scenario is consistent with the rare occurrence of stochastic defects [6]. Thus, studying local blur is important for a basic understanding not just of the resist but also of the origin of stochastic defects.
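To make the idea concrete, here is a deliberately tiny, synthetic stand-in for such an approach: build a library of simulated 1-D via profiles at known blur values, then assign a noisy “measured” profile to its nearest library entry. The grid, noise level, and nearest-neighbor matching are all assumptions chosen for illustration; a practical implementation would work on SEM images with a trained model [5].

# Toy illustration of inferring local blur from feature appearance.
# Synthetic 1-D profiles and nearest-neighbor matching stand in for the real,
# image-based machine learning approach; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-60, 61, 1.0)                      # nm grid around one via

def blurred_via(sigma_nm, cd_nm=20.0):
    """Ideal via opening of width cd_nm convolved with a Gaussian blur of width sigma_nm."""
    ideal = (np.abs(x) <= cd_nm / 2).astype(float)
    kernel = np.exp(-x**2 / (2 * sigma_nm**2))
    return np.convolve(ideal, kernel / kernel.sum(), mode="same")

sigmas = np.arange(2.0, 22.0, 2.0)               # candidate blur values, nm
library = np.stack([blurred_via(s) for s in sigmas])

# Pretend this came from a measurement: true blur 12 nm plus some noise
measured = blurred_via(12.0) + rng.normal(0.0, 0.02, size=x.size)

errors = ((library - measured) ** 2).sum(axis=1)  # nearest neighbor in profile space
print(f"estimated local blur: {sigmas[np.argmin(errors)]:.0f} nm")  # expect ~12 nm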

References

[1] P. Theofanis et al., Proc. SPIE 11323, 113230I (2020).

[2] J-H. Franke, T. A. Brunner, and E. Hendrickx, J. Micro/Nanopattern. Mater. Metrol. 21, 030501 (2022).

[3] W. Gao et al., Proc. SPIE 11323, 113231L (2020).

[4] F. Chen, “Via Shape Stochastic Variation in EUV Lithography,” https://www.youtube.com/watch?v=Cj1gfDV7-GE

[5] C. Bishop, Pattern Recognition and Machine Learning, https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/

[6] F. Chen, “EUV Stochastic Defects from Secondary Electron Blur Increasing with Dose,” https://www.youtube.com/watch?v=Q169SHHRvXE, “Modeling EUV Stochastic Defects with Secondary Electron Blur,” https://www.linkedin.com/pulse/modeling-euv-stochastic-defects-secondary-electron-blur-chen

This article first appeared in LinkedIn Pulse: Measuring Local EUV Resist Blur with Machine Learning

Also Read:

Pinning Down an EUV Resist’s Resolution vs. Throughput

Application-Specific Lithography: Avoiding

Non-EUV Exposures in EUV Lithography Systems Provide the Floor for Stochastic Defects in EUV Lithography

Stochastic Defects and Image Imbalance in 6-Track Cells



Podcast EP212: A View of the RISC-V Landscape with Synopsys’ Matt Gutierrez
by Daniel Nenni on 03-15-2024 at 10:00 am

Dan is joined by Matt Gutierrez. Matt joined Synopsys in 2000 and is currently Sr. Director of Marketing for Processor & Security IP and Tools. His current responsibilities include the worldwide marketing of ARC Processors and Subsystems, Security IP, and tools for the development of application-specific instruction set processors. Prior to joining Synopsys, Matt held various technical and management positions with companies such as Cypress Semiconductor, Fujitsu Limited, and The Silicon Group. Matt has over 25 years of experience in the semiconductor, computer systems, and EDA industries.

Matt provides an overview of what’s happening in custom processors and the impact of the RISC-V ISA. Matt also discusses what Synopsys is doing to enable application-specific processor design, including the recent announcement of its ARC-V processor IP.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.



CEO Interview: Patrick T. Bowen of Neurophos
by Daniel Nenni on 03-15-2024 at 6:00 am

Patrick T. Bowen

Patrick is an entrepreneur with a background in physics and metamaterials. Patrick sets the vision for the future of the Neurophos architecture and directs his team in research and development, particularly in metamaterials design. He has a Master’s degree in Micro-Nano systems from ETH Zurich and PhD in Electrical Engineering from Duke University, under Prof. David Smith. After graduation, Patrick cofounded Metacept with Prof. Smith; Metacept is the world’s foremost metamaterials commercialization center and consulting firm.

Tell us about Neurophos. What problems are you solving?
We say we exist to bring the computational power of the human brain to artificial intelligence. Back in 2009 it was discovered that GPUs are much better at recognizing cats on the internet than CPUs are, but GPUs are not the answer to the future of AI workloads. Just as GPUs were better than CPUs for neural networks, there could be architectures that are better than GPUs by orders of magnitude. Neurophos is what comes next for AI after GPUs.

AI large language models in general have been limited because we haven’t had enough compute power to fully realize their potential. People have focused primarily on the training side of it, just because you had to train something useful before you could even think about deploying it. Those efforts have highlighted the incredible power of large AI models, and with that proof people are starting to focus on how to deploy AI at scale. The power of those AI models means we have millions of users who will use them every day. How much energy does it cost per user? How much does the compute cost per inference? If it’s not cheap enough per inference, that can be a very limiting thing for businesses  that want to deploy AI.

Energy efficiency is also a big problem to solve. If you have a server that burns, say, 6 kilowatts, and you want to go 100 times faster but do nothing about the fundamental energy efficiency, then that 6 kilowatt server suddenly becomes a 600 kilowatt server. At some point you hit a wall; you’re simply burning too much power and you can’t suck the heat out of the chips fast enough. And of course there are climate-change issues layered on top of that. How much energy is being consumed by AI? How much additional energy are we wasting just trying to keep data centers cool? So, someone needs to first solve the energy efficiency problem, and then you can go fast enough for the demands of the applications.

People have proposed using optical compute for AI for nearly as long as AI has existed. There are a lot of ideas that we work on today that are also old ideas from the 80s. For example, the original equations for the famous “metamaterials invisibility cloak”, and other things like the negative index of refraction, can be traced back to Russian physicists in the 60s and 80s. Even though it was sort of thought of, it was really reinvented by David Smith and Sir John Pendry.

Similarly, systolic arrays, which are typically what people mean when they say “tensor processor”, are an old idea from the late 70s. Quantum computing is an old idea from the 80s that we resurrected today. Optical processing is also an old idea from the 80s, but at that time we didn’t have the technology to implement it. So with Neurophos, we went back to reinventing the optical transistor, creating from the ground up the underlying hardware that’s necessary to implement the fancy optical computing ideas from long ago.

What will make customers switch from using a GPU from Nvidia, to using your technology?
So, the number one thing that I think most customers care about really is that dollars per inference metric, because that’s the thing that really makes or breaks their business model. We are addressing that metric with a solution that truly can increase the speed of compute by 100x relative to a state of the art GPU, all within the same power envelope.

The environmental concern is also something that people care about, and we are providing a very real solution to significantly mitigate energy consumption directly at one of its most significant sources: datacenters.

If you sit back and think about how this scales… someone has to deliver a solution here, whether it’s us or someone else. Bandwidth in chip packaging is roughly proportional to the square root of the area and power consumption in chip packaging is generally proportional to the area. This has led to all sorts of contorted ways in which we’re trying to create and package systems.

Packaging is one of the things that’s really been revolutionary for AI in general. Initially it was about cost and being able to mix chiplets from different technology nodes, and most of all, about memory access speed and bandwidth because you could integrate with DRAM chips. But now you’re just putting more and more chips in there!

Using the analog compute approach restores power consumption for compute down to the square root of area instead of proportional to area. So now the way in which your compute and power consumption scales goes the same way; you are bringing them into balance.

We believe we’ve developed the only approach to date for analog in-memory compute that can actually scale to high enough compute densities to bring these scaling laws into play.

How can customers engage with Neurophos today? 
We are creating a development partner program and providing a software model of our hardware that allows people to directly load PyTorch code and compile that. That provides throughput and latency metrics and how many instances per second etc. to the customer. It also provides data back to us on any bottlenecks for throughput in the system, so we can make sure we’re architecting the overall system in a way that really matters for the workloads of customers.

What new features/technology are you working on?
Academics have for a long time sort of dreamt about what they might do if they had a metasurface like we’re building at Neurophos, and there are a lot of theoretical papers out there… but no one’s ever actually built one. We’re the first ones to do it. In my mind most of the interesting applications are really for dynamic surfaces, not for static ones, and there is other work going on at Metacept, Duke, and at sister companies like Lumotive that I, and I think the world, will be pretty excited about.

Why have you joined the SC Incubator and what are Neurophos’ goals in working with their organization over the next 24 months?

Silicon Catalyst has become a prestigious accelerator for semiconductor startups, with a high bar for admission.  We are excited to have them as a partner.  Hardware startups have a big disadvantage relative to software startups because of their higher demo/prototype cost and engineering cycle time, and this is even more true in semiconductor startups where the EDA tools and mask costs and the sheer scale of the engineering teams can be prohibitively expensive for a seed stage company.  Silicon Catalyst has formed a pretty incredible ecosystem of partners that provide significant help in reducing their development cost and accelerating their time to market.

Also Read:

A Candid Chat with Sean Redmond About ChipStart in the UK

CEO Interview: Jay Dawani of Lemurian Labs

Seven Silicon Catalyst Companies to Exhibit at CES, the Most Powerful Tech Event in the World



Checking and Fixing Antenna Effects in IC Layouts
by Daniel Payne on 03-14-2024 at 10:00 am

Planar CMOS cross-section – antenna DRC

IC layouts go through extensive design rule checking to ensure correctness before being accepted for fabrication at a foundry or IDM. There’s something called the antenna effect that happens during chip manufacturing, where plasma-induced damage (PID) can lower the reliability of MOSFET devices. Layout designers run Design Rule Checks (DRC) to find areas at risk of PID and then make edits to pass all checks.

A traditional antenna design rule measures the ratio of metal (or via) layer area to connected MOSFET gate area, and if that area ratio is too large, the layout must be fixed by adding a protection diode.

Planar CMOS cross-section – antenna DRC
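Conceptually, the traditional check boils down to bookkeeping an area ratio per gate-connected node. The sketch below is a highly simplified illustration with a hypothetical threshold and made-up node data; it is not an actual foundry rule deck and not how a signoff DRC tool is implemented.

# Highly simplified antenna-ratio check: compare collected metal area on a node to the
# connected gate area and flag nodes over a limit unless a protection diode is present.
# Threshold and node data are hypothetical; real rules are per-layer, cumulative,
# and foundry specific.

ANTENNA_RATIO_LIMIT = 400.0   # hypothetical maximum allowed metal-area / gate-area ratio

def check_node(name, metal_area_um2, gate_area_um2, has_protection_diode):
    ratio = metal_area_um2 / gate_area_um2
    violation = ratio > ANTENNA_RATIO_LIMIT and not has_protection_diode
    print(f"{name:16s} ratio = {ratio:7.1f}  {'VIOLATION' if violation else 'ok'}")
    return violation

check_node("net_short",      metal_area_um2=1200.0, gate_area_um2=5.0, has_protection_diode=False)
check_node("net_long_bus",   metal_area_um2=9000.0, gate_area_um2=2.0, has_protection_diode=False)
check_node("net_long_fixed", metal_area_um2=9000.0, gate_area_um2=2.0, has_protection_diode=True)

It is exactly this kind of simple per-node arithmetic that breaks down in the multi-well, multi-power-domain scenarios described next.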

One IC layout scenario that a traditional DRC for antenna effects cannot handle is for AMS designs that have multiple power domains, using multiple isolated P-type wells as shown below. A new approach called path-based verification is required for the following four scenarios.

  • Risk connection with a PID issue
  • Imbalanced area ratios between metal layers and well layers from two isolated wells
  • Complex connectivity
  • Unintentional protection diodes

These four layout scenarios can only be detected by an EDA tool that knows about devices, connectivity, and electrical paths during the area calculations for metal and MOSFET gate layers. This is where the Calibre PERC tool from Siemens EDA comes in, as it can perform the complex path-based checks to identify PID areas, find electrostatic discharge (ESD) issues, and locate other paths that your design group is looking for. Here’s the PID flow for using Calibre PERC:

PID flow using Calibre PERC

Using this flow on an IC layout and viewing the results in the Calibre RVE results viewer showed that a PID violation was found, because a risk connection was established at the metal1 level, but the protection connection didn’t happen until the metal2 level.

PID violation at metal2 layer

The next PID violation was identified from imbalanced area ratios of metal layer and the N-buried layer (nbl). The area highlighted in purple (rve) is the victim device.

Imbalanced area PID issue

To get complete PID coverage, your design team will have to use both the traditional DRC-based antenna checks and the path-based checks. Run DRC-type checks early in the design stages as a preventative step. As more metal connections in a layout are completed and paths across isolated P-type wells are formed, it’s time to add path-based verification, providing complete coverage.

In this early IC layout it’s time to run traditional DRC-based antenna checks to confirm the layout passes PID validation.

Prevent PID issues before all metal connections completed

As more metal paths are added to the IC layout, it’s time to use the path-based tool, because it properly understands both the risk connection and the protection connection.

Run Calibre PERC path-based checks

Summary

IC layouts must meet rigorous design rules to pass the reliability and yield requirements set by the foundry or fab process being used. Traditional DRC-based antenna design rules can still be used for early-stage layout, but as more metal layers are added to complete the interconnects, path-based checking with Calibre PERC becomes necessary.

As the paths across isolated P-wells are established, the path-based flow of Calibre PERC can be used to check the IC layouts at IP, block/module and even full-chip levels for signoff. So it’s recommended to use both flows together to meet the reliability and yield goals.

Read the Technical Paper at Siemens online.

Related Blogs



Arteris is Unleashing Innovation by Breaking Down the Memory Wall
by Mike Gianfagna on 03-14-2024 at 6:00 am

Arteris is Unleashing Innovation by Breaking Down the Memory Wall
(courtesy of Arteris)

There is a lot of discussion about removing barriers to innovation these days. Semiconductor systems are at the heart of unlocking many forms of technical innovation, if only we could address issues such as the slowing of Moore’s Law, reduction of power consumption, enhancement of security and reliability and so on. But there is another rather substantial barrier that is the topic of this post. It is the dramatic difference between processor and memory performance. While systems of CPUs and GPUs are delivering incredible levels of performance, the memories that manage critical data for these systems are lagging substantially. This is the memory wall problem, and I would like to examine how Arteris is unleashing innovation by breaking down the memory wall.

What is the Memory Wall?

The graphic at the top of this post illustrates the memory wall problem. You can see the steady increase in performance of single-threaded CPUs depicted by the blue line. The green line shows the exponential increase in performance being added by clusters of GPUs. The performance increase of GPUs vs. CPUs is estimated to be 100X in 10 years – a mind-boggling statistic. As a side note, you can see that the transistor counts for both CPUs and GPUs cluster around a similar straight line. GPU performance is delivered by doing fewer tasks much faster as opposed to throwing more transistors at the problem.

Many systems today are a combination of a number of CPUs doing broad management tasks with large numbers of GPUs doing specific tasks, often related to AI. The combination delivers the amazing throughput we see in many products. There is a dark side to this harmonious architecture that is depicted at the bottom of the chart. Here, we see the performance data for the various memory technologies that deliver all the information for these systems to process. As you can see, delivered performance is substantially lower than the CPUs and GPUs that rely on these memory systems.

This is the memory wall problem. Let’s explore the unique way Arteris is solving this problem.

The Arteris Approach – A Highly Configurable Cache Coherent NoC

 A well-accepted approach to dealing with slower memory access speed is to pre-fetch the required data and store it in a local cache. Accessing data this way is far faster – a few CPU cycles vs. over 100 CPU cycles. It’s a great approach, but it can be daunting to implement all the software and hardware required to access memory from the cache and ensure the right data is in the right place at the right time, and consistent across all caches. Systems that effectively deliver this solution are called cache coherent, and achieving this goal is not easy. A software-only coherency implementation, for example, can consume as much as ~25% of all CPU cycles in the system, and is very hard to debug. SoC designers often choose cache coherent NoC hardware solutions instead, which are transparent to the software running on the system.
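To put the “few cycles versus over 100 cycles” point in perspective, the textbook average memory access time (AMAT) arithmetic looks like this; the cycle counts below are illustrative, chosen to line up with the figures just mentioned:

# Average memory access time: AMAT = hit_time + miss_rate * miss_penalty.
# Cycle counts are illustrative, in line with "a few cycles vs. over 100 cycles" above.

def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    return hit_cycles + miss_rate * miss_penalty_cycles

dram_only  = amat(hit_cycles=0, miss_rate=1.00, miss_penalty_cycles=120)  # every access goes to memory
with_cache = amat(hit_cycles=4, miss_rate=0.05, miss_penalty_cycles=120)  # 95% of accesses hit locally

print(f"no local cache : {dram_only:.0f} cycles per access")
print(f"95% cache hits : {with_cache:.0f} cycles per access")  # 10 cycles -- an order of magnitude better

The hard part, of course, is keeping those local caches consistent across all the agents that share the data, which is exactly the coherency burden that cache coherent NoC hardware takes off the software’s plate.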

Andy Nightingale

Recently, I had an opportunity to speak with Andy Nightingale, vice president product management & marketing at Arteris. Andy did a great job explaining the challenges of implementing cache coherent systems and the unique solution Arteris has developed to cope with these challenges.

It turns out development of a reliable and power efficient cache coherent architecture touches many hardware and software aspects of system design. Getting it all to work reliably, efficiently and hit the required PPA goals can be quite difficult. Andy estimated that all this work could require 50 engineering years per project. That’s a lot of time and cost.

The good news is that Arteris has substantial skills in this area and has built a complete cache coherent architecture into one of its network-on-chip (NoC) products. Andy described Ncore, a complete cache coherent NoC offered by Arteris. Management of memory access fits well into the overall network-on-chip architecture that Arteris is known for. Ncore manages the cache coherent part of the SoC transparently to software – freeing the system designer to focus on the higher-level challenges associated with getting the CPU and all those GPUs to perform the task at hand.

Andy ran down a list of Ncore capabilities that was substantial:

  • Productive: Connect multiple processing elements, including Arm and RISC-V, for maximum engineering productivity and time-to-market acceleration, saving 50+ person-years per project.
  • Configurable: Scalable from heterogeneous to mesh topologies, supporting CHI-E, CHI-B, and ACE coherent, as well as ACE-Lite IO coherent interfaces. Ncore also enables AXI non-coherent agents to act as IO coherent agents.
  • Ecosystem Integration: Pre-validated with the latest Arm v9 automotive cores, delivering on a previously announced partnership with Arm.
  • Safe: Supports ASIL B to ASIL D requirements for automotive safety applications and is ISO 26262 certified.
  • Efficient: Smaller die area, lower power, and higher performance by design, compared with other commercial alternatives.
  • Markets: Suitable for Automotive, Industrial, Enterprise Computing, Consumer and IoT SoC solutions.

Andy detailed some of the benefits achieved on a consumer SoC design. These included streamlined chip floorplanning thanks to the highly distributed architecture, promoting efficient resource utilization. The Arteris high-performance interconnect with a high-bandwidth, low-latency fabric ensured seamless data transfer and boosted overall system performance.

Digging a bit deeper, Ncore also provides real-time visibility into the interconnect fabric with transaction-level tracing, performance monitoring, and error detection and correction. All these features facilitate easy debugging and superior product quality. The comprehensive ecosystem support and compatibility with industry-standard interfaces like AMBA, also facilitate easier integration with third-party components and EDA tools.

This was a very useful discussion. It appears that Arteris has dramatically reduced the overhead for implementation of cache coherent architectures.

To Learn More

I mentioned some specifics about the work Arteris is doing with Arm. Don’t think that’s the only partner the company is working with. Arteris has been called the Switzerland of system IP. The company also has significant work with the RISC-V community as detailed in the SemiWiki post here.

Arteris recently announced expansion of its Ncore product. You can read how Arteris expands Ncore cache coherent interconnect IP to accelerate leading-edge electronics designs here. In the release, Leonid Smolyansky, Ph.D., SVP of SoC Architecture, Security & Safety at Mobileye, offered these comments:

“We have worked with Arteris network-on-chip technology since 2010, using it in our advanced autonomous driving and driver-assistance technologies. We are excited that Arteris has brought its significant engineering prowess to help solve the problems of fault tolerance and reliable SoC design.”

There is also a short (a little over one-minute) video that explains the challenges that Ncore addresses. I found the video quite informative. 

If you need improved performance for your next design, you should definitely take a close look at the cache coherent solutions offered by Arteris. You can learn more about Ncore here. And that’s how Arteris is unleashing innovation by breaking down the memory wall.



2024 Outlook with Elad Alon of Blue Cheetah Analog Design
by Daniel Nenni on 03-13-2024 at 10:00 am


We have been working with Blue Cheetah Analog Design for three years now with great success. With new process nodes coming faster than ever before and with chiplets being pushed to the forefront of technology, the die-to-die interconnect traffic on SemiWiki has never been greater and chiplets is one of our top search terms.

Tell us a little bit about yourself and your company. 
I am the CEO and co-founder of Blue Cheetah Analog Design. I am also an Adjunct Professor of Electrical Engineering and Computer Sciences at UC Berkeley, where I was previously a Professor and co-director of the Berkeley Wireless Research Center (BWRC). I’ve held founding, consulting, or visiting positions at Locix, Lion Semiconductor (acquired by Cirrus Logic), Wilocity (acquired by Qualcomm), Cadence, Xilinx, Sun Labs, Intel, AMD, Rambus, Hewlett Packard, and IBM Research, where I worked on digital, analog, and mixed-signal integrated circuits for computing, high-speed communications, and test and measurement. According to Lance Leventhal at the Chiplet Summit, I have 280 published articles and 75+ patents. I have to admit I’m not sure about those numbers, but I do have a lot of experience with integrated circuit design – and particularly in analog / mixed-signal circuits – which is proving invaluable in the era of chiplets.  This is the last I’ll say about myself directly – in the rest of this interview, I’ll be telling the story of Blue Cheetah and our vision for chiplets and the overall semiconductor market.

What was the most exciting high point of 2023 for your company?
We announced silicon success on our die-to-die interconnect IP and picked up many exciting design wins. We’ve publicly disclosed DreamBig, Ventana, and FLC as our customers, and most recently, we announced our design win with Tenstorrent. We will announce more design wins soon. To our knowledge, most of the emerging chiplet product companies are using Blue Cheetah die-to-die interconnect, as are a number of large corporations.

What was the biggest challenge your company faced in 2023?
Thanks to the amazing support (not only financially) from our investors – particularly from our founding investors Sehat Sutardja and Weili Dai, as well as NEA (which led our Series B round in 2022) – along with our unique product offering (customized die-to-die interconnect IP), I’m happy to say that funding and filling the sales funnel have not been our biggest challenges. Keeping up with demand, on the other hand, is definitely keeping us on our toes; I always like to tell the members of my team that this is a very good challenge to have the opportunity to address.  The tremendous momentum building around chiplets drives the demand for Blue Cheetah’s solutions, so in some senses, the challenge is in scaling up along with that ongoing revolution.

How is your company’s work addressing this biggest challenge?
In the bigger picture, hardware and silicon designers look to chiplets as a key enabler for ever more capable and cost-efficient systems. Chiplets are well established amongst large players that control all components/aspects of a design (i.e., single vendor), and the allure of a “plug and play” chiplet market has garnered significant attention and investment from the industry.  Although a number of technical and business hurdles need to be overcome before that vision fully comes to fruition, the large majority of the benefits of that vision can be realized immediately.  Specifically, small groups of companies with aligned product strategies and (typically) complementary expertise are forming multi-vendor ecosystems.  Within these ecosystems, the companies can coordinate on the functionality, requirements, and interfaces of each chiplet (and, of course, the die-to-die interconnects that glue them together) to meet the needs of a specific product and/or product family. Blue Cheetah’s solutions support all three of these use cases (single-vendor, multi-vendor ecosystem, and plug-and-play), and many of our customers/partners are pioneers of the multi-vendor ecosystem approach.

What do you think the biggest growth area for 2024 will be, and why?
Indeed, the semiconductor market is in the middle of a major resurgence in recognition, investment, and (averaged over the last ~3 years) growth.  AI has played an enormous role in this resurgence. Still, the basis is broader than that – consider, for example, that today, 7 out of the top 10 companies as ranked by market capitalization design, incorporate, and/or sell their own semiconductors. (If you look at the top 10 tech companies by market cap, it goes to 9 out of 10, with the 10th being a semi manufacturing equipment supplier.)  The capabilities/cost structure of a company’s chips directly drives the user experience/value of the company’s products/services, and the companies delivering those products are in the best position to know what silicon capabilities/cost structure have the highest impact.  This hopefully makes it clear why specialization and customization are major themes; they have been for ~5+ years already and will continue to be in 2024 (and beyond).

How is your company’s work addressing this growth?
Chiplets are, in principle, the ideal vehicle to achieve the goals of specialization and customization with favorable manufacturing and design cost structures.  Ideally, a company can focus on its differentiating technologies while incorporating leading solutions to the remaining components of the product via (possibly other vendors’) chiplets and IP. At the same time, each chiplet can be targeted to the specific manufacturing technology / die size with the best cost/yield characteristics for that function.  Of course, all of these chiplets need to communicate with each other, and that is where Blue Cheetah is focused.  Blue Cheetah is unique in offering die-to-die interconnect solutions with the extensive customizability and configurability needed to meet the needs of the full range of chiplet products.  We also support the most comprehensive set of process technologies – we have already implemented our IP in 7 different nodes, including 5nm and below.

What conferences did you attend in 2023, and how was the traffic?
2023 was an action-packed year for us in terms of conferences – I believe someone from the Blue Cheetah team was at a conference once every month or at most two – and in-person attendance is definitely up (approaching or exceeding pre-COVID levels).  For example, we were at the Chiplet Summit, ISSCC, DAC, OCP Global Summit, and several foundry events. Our silicon demo generated a lot of interest, and we were very happy with the engagement from the people and partners who visited our booth.

Will you attend conferences in 2024? Same or more?
2024 is looking to be even more action-packed – both in terms of conferences (we’ve already been at CES and the Chiplet Summit) and more broadly.  With the global drive to establish and rejuvenate local semiconductor capabilities, we plan to expand to additional international venues this year to further foster relationships across a broad industry base.

Also Read:

Chiplet ecosystems enable multi-vendor designs

Die-to-Die Interconnects using Bunch of Wires (BoW)

Analog Design Acceleration for Chiplet Interface IP

Blue Cheetah Technology Catalyzes Chiplet Ecosystem