The Fallacy of Operator Fallback and the Future of Machine Learning Accelerators
by Kalar Rajendiran on 05-30-2024 at 6:00 am

Chimera GPNPU Block Diagram

As artificial intelligence (AI) and machine learning (ML) models continue to evolve at a breathtaking pace, the demands on hardware for inference and real-time processing grow increasingly complex. Traditional acceleration architectures are proving inadequate to keep up with these rapid advancements in ML models. Steve Roddy, Chief Marketing Officer at Quadric Inc., presented on this topic at the IPSoC Conference in Silicon Valley last month. His talk elaborated on why and where traditional architectures fall short and how Quadric’s innovative Chimera General Purpose NPU (GPNPU) offers a superior, future-proof solution.

The Limitations of Traditional Architectures

Traditional heterogeneous architectures typically employ a combination of NPUs, DSPs, and CPUs to handle various aspects of ML inference and real-time processing. Each component brings its strengths to the solution. NPUs are optimized for matrix operations, DSPs for math kernel performance, and CPUs for general-purpose tasks. However, these strengths come with significant limitations. Managing the interplay between NPU, DSP, and CPU requires complex data transfers and synchronization, leading to increased system complexity and power consumption. Developers must contend with different programming environments and extensive porting efforts, making debugging across multiple cores even more challenging and reducing productivity. Moreover, fixed-function accelerators, like traditional NPUs, are designed to handle a limited set of operations.

The Evolution of AI/ML Models

In the early days of machine learning, hardware accelerators were designed to handle relatively simple and highly regular operations. State-of-the-art (SOTA) networks primarily consisted of matrix-style operations, which suited hardwired, non-programmable accelerators like NPUs. These NPUs provided efficient ML inference by focusing on matrix multiplications, pooling, and activation functions. However, this specialization limited their flexibility and adaptability as AI models evolved.

The introduction of transformer models, such as Vision Transformers (ViTs) and Large Language Models (LLMs), marked a significant shift in AI/ML complexity. Modern algorithms now incorporate a wide variety of operator types, far beyond the scope of traditional matrix operations. Today’s SOTA models, like transformers, utilize a diverse set of graph operators—ResNets may use around 8, while transformers can use up to 24. This diversity in operations challenges hardwired NPUs, which are not designed to handle such a broad range of tasks efficiently, highlighting the limitations of traditional NPU architectures. As ML models evolve, these accelerators quickly become obsolete, unable to support new operators and network topologies.

What About Operator Fallback?

To mitigate the limitations of fixed-function NPUs, traditional systems use a mechanism called “Operator Fallback.” This approach offloads the most common ML computation operators to the NPU, while the CPU or DSP handles the less common or more complex operations. The assumption is that fallback operations are rare and non-performance critical. However, this is a flawed assumption for several reasons. When fallback occurs, the CPU or DSP handles operations at significantly lower speeds compared to the NPU. This results in performance bottlenecks, where the slow execution of fallback operators dominates the overall inference time. Fallback requires seamless data transfer and control between the NPU and the programmable cores, adding to system complexity and power consumption. As ML models grow in complexity, the frequency and criticality of fallback operations increase, further degrading performance.
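To see why the “rare and non-critical” assumption breaks down, consider a back-of-the-envelope model (the numbers below are illustrative assumptions, not vendor benchmarks): even a small fraction of operators falling back to a much slower core quickly dominates total inference time, in Amdahl’s-law fashion.

```cpp
#include <cstdio>

// Illustrative model (assumed numbers, not vendor benchmarks): even a small
// fraction of operators falling back to a much slower core can dominate
// total inference time, an Amdahl's-law-style bottleneck.
int main() {
    const double npu_time = 10.0;  // ms if ALL ops ran on the NPU
    const double slowdown = 20.0;  // assumed CPU/DSP fallback penalty (~20x)
    for (double frac : {0.0, 0.05, 0.20}) {
        // The fallback fraction runs 'slowdown' times slower than on the NPU.
        double total = npu_time * ((1.0 - frac) + frac * slowdown);
        std::printf("fallback %4.0f%% -> %6.1f ms (%.1fx slower)\n",
                    frac * 100, total, total / npu_time);
    }
}
```

With just 5% of operators falling back at an assumed 20x slowdown, inference takes nearly twice as long; at 20% fallback it is almost 5x slower, which is why fallback-heavy networks rarely reach their NPU’s headline TOPS.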

Quadric’s Chimera GPNPU

Quadric addresses these challenges with its Chimera GPNPU, an architecture designed to be as flexible and programmable as CPUs or DSPs, but with the performance of specialized accelerators.

Chimera consolidates the processing capabilities into a unified, single core architecture that runs all kernels as a C++ application. This simplifies SoC design by reducing the need for multiple specialized cores, easing integration and debugging. The GPNPU is purpose-built for matrix math and convolutions, maintaining high utilization similar to systolic arrays. This ensures excellent inference performance on ML tasks without relying on fallback. With the Chimera Graph Compiler, developers can auto-compile hundreds of networks and write/debug graph code and C++ code on one core, streamlining the development process and enhancing productivity. Chimera’s C++ programmability allows engineers to quickly add new ML operators, ensuring that the hardware can adapt to future ML models. This eliminates the risk of obsolescence associated with fixed-function accelerators.
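As a concrete illustration of what adding a new ML operator in C++ might look like, here is a minimal sketch of a GELU activation kernel, an operator common in transformers. This is a generic, hypothetical example, not Quadric’s actual kernel API:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical sketch (not Quadric's actual API): on a fully C++-programmable
// core, a new graph operator is just another kernel, so supporting a future
// model need not require a hardware respin or operator fallback.
void gelu_kernel(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float x = in[i];
        // tanh approximation of GELU used by many transformer implementations
        out[i] = 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
    }
}

int main() {
    float in[4] = {-2.0f, -0.5f, 0.5f, 2.0f}, out[4];
    gelu_kernel(in, out, 4);
    for (float v : out) std::printf("%.4f ", v); // ~ -0.0454 -0.1543 0.3457 1.9546
    std::printf("\n");
}
```

The point is not the specific code but the workflow: a missing operator becomes a small C++ function compiled for the same core, rather than a reason to fall back to a CPU or respin silicon.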

By reducing the need for complex data transfers and synchronization between multiple cores, Chimera operates more efficiently, consuming less power. Available in 1 TOPS, 4 TOPS, and 16 TOPS variants, Chimera can scale to meet the demands of various applications, from low-power devices to high-performance systems.

View Chimera performance benchmarks on various neural networks here.

Summary

As ML models continue to evolve, the need for flexible, high-performance hardware becomes increasingly critical. Traditional architectures, relying on a combination of NPUs, DSPs, and CPUs, fall short due to their complexity, inefficiency, and risk of obsolescence. The fallback operator mechanism further exacerbates these issues, leading to significant performance bottlenecks.

Quadric’s Chimera GPNPU offers a compelling alternative, providing a unified, programmable architecture that eliminates the need for fallback. By addressing the inherent flaws of Operator Fallback, Quadric is setting a new standard for performance, flexibility, and future-readiness in ML computing. By simplifying SoC design, enhancing programming productivity, and ensuring future-proof flexibility, Chimera delivers a significant acceleration in ML inferencing and real-time processing.

Learn more at Quadric.

Also Read:

2024 Outlook with Steve Roddy of Quadric

Fast Path to Baby Llama BringUp at the Edge

Vision Transformers Challenge Accelerator Architectures


Secure-IC Presents AI-Powered Cybersecurity
by Mike Gianfagna on 05-29-2024 at 10:00 am

Design & Reuse held its IP-SoC Silicon Valley 24 event on April 25th, 2024, at the Hyatt Regency Santa Clara. The agenda was packed with many relevant and compelling presentations from companies large and small. I attended one presentation on security that stood out for me. Secure-IC presented “AI-powered cybersecurity: Securyzr™ Intrusion Detection System (IDS)”. The presentation discussed a comprehensive approach to system security that includes both hardware and software. The addition of AI makes it even more potent. Security is a growing issue in our industry, and we need more focus on the problem. I’ll review one example of this needed focus as Secure-IC presents AI-powered cybersecurity.

Presentation Overview

Yathiendra Vunnam

Yathiendra Vunnam gave the presentation. He is a Field Application Engineer at Secure-IC, a role in which he supports Secure-IC’s development in the United States across verticals such as automotive, semiconductor, defense, space, and IoT. He holds an MS degree in Cybersecurity from the Georgia Institute of Technology, so security is a subject he is passionate about.

His presentation began with a description of the problem and the mission of Secure-IC:

With IoT devices interconnected, each and every object can be a threat to the whole network. Therefore, the security of the objects or the devices, along with their lifecycle management, is key, and so is their data. To ensure the integrity of this data, the whole system must be secured and managed. Trusted devices enable trusted data.

Secure-IC partners with its clients to provide them with the best end-to-end cybersecurity solutions for embedded systems and connected objects, from Chip to Cloud.

The “punch line” of this statement in the presentation is the graphic at the top of this post. Secure-IC is a unique company that provides a wide array of security solutions. You can get a feeling for the breadth of the company’s impact here.

Intrusion Detection

Next, Yathiendra discussed Secure-IC’s Securyzr™ intrusion detection system (IDS). The goal of this technology is to maintain trust throughout the whole device lifecycle. Features of this technology include:

  • Threat Detection: Monitors CPU and memory activity, network traffic (CAN bus, Ethernet) and more
  • Threat Analysis: Rule-based or AI methodology to discriminate alerts and eliminate false positives
  • Threat Response: Immediate local threat response based on pre-defined rules and leveraging edge AI computing
  • Life Cycle Management: The acquired data from IDS are sent to the Securyzr Server in the cloud, facilitating device life cycle management for the entire fleet of devices
  • Securyzr iSSP Integration: IDS can be employed in the monitoring service of iSSP, providing a comprehensive solution for fleet-wide security management

Next, Yathiendra discussed the Securyzr IDS software offering, a pure software solution that provides:

  • Answer to new cybersecurity regulations
    • Real-time monitoring of cybersecurity threats on a fleet of edge devices with alarms
    • Cloud-based dashboards
    • Cloud-based fleet data aggregation and processing for global updates based on rich data sets
    • Edge processing for fast mitigation
  • Minimal cost of implementation and integration in existing hardware solutions
  • Integration with Securyzr Server services
  • Leveraging of Secure-IC Securyzr integrated Secure Elements features

Next, AI-powered cybersecurity was discussed. This technology was summarized as follows:

Data collection

  • Collecting information from the host CPU (buses, sensors, and security components such as the Securyzr iSE integrated secure element, which may be included in the system SoC)

Threat Detection

  • Real-time anomaly detection based on advanced techniques, using rule-based and AI/ML methodologies with reduced false-negative/false-positive (FN/FP) rates
    • Novelty detection against a model of nominal behavior (see the sketch after this list)
  • Where Secure-IC Digital Sensors are implemented in the SoC, an AI/ML-based software smart monitor analyzes the responses from these sensors and reaches a conclusion on the detected threat
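To make the novelty-detection idea concrete, here is a deliberately minimal sketch (a generic illustration, not Secure-IC’s algorithm): learn a statistical model of nominal behavior from attack-free data, then flag readings that deviate too far from it.

```cpp
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical illustration (not Secure-IC code): flag a sensor reading as
// anomalous when it deviates from learned nominal behavior by more than
// k standard deviations.
struct NoveltyDetector {
    double mean = 0.0, stddev = 1.0, k = 4.0; // k trades false negatives vs. false positives

    // Learn nominal behavior from attack-free training data.
    void fit(const std::vector<double>& nominal) {
        mean = std::accumulate(nominal.begin(), nominal.end(), 0.0) / nominal.size();
        double var = 0.0;
        for (double x : nominal) var += (x - mean) * (x - mean);
        stddev = std::sqrt(var / nominal.size());
    }

    // True if the reading is "novel", i.e., outside nominal behavior.
    bool is_anomalous(double reading) const {
        return std::abs(reading - mean) > k * stddev;
    }
};

int main() {
    NoveltyDetector det;
    det.fit({10.1, 9.9, 10.0, 10.2, 9.8}); // e.g., bus transactions per ms
    std::cout << det.is_anomalous(10.1) << ' '   // 0: nominal
              << det.is_anomalous(42.0) << '\n'; // 1: flagged for response
}
```

Real IDS models are far richer than a single z-score threshold, but the structure is the same: the threshold is tuned to trade false negatives against false positives, echoing the FN/FP point above.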

Threat Response

  • Immediate threat response based on pre-defined alert detection
  • A full-software AI model stored on the host enables a fast, simple response
  • More complex situations may be managed by a software AI module on the server side

Automotive Use Case

Yathiendra then described an automotive use case for the technology. He went into a lot of detail regarding where and how the technology is deployed. The diagram below summarizes how IDS is part of an end-to-end detection and response solution.

End to end detection and response solution

Final Thoughts and to Learn More

Yathiendra concluded his talk with the following points:

  • The Intrusion Detection System is included in an edge device and can monitor network buses, sensors,…
  • The system verifies in real time whether each of your connected devices is under attack and sends information to your supervising infrastructure
  • The system is hardware agnostic – it runs on a CPU with an OS
  • The system can easily interface with the Securyzr iSE security component and take advantage of its features (sensors)

This was a very relevant and useful presentation on a holistic approach to security. You can learn more about the breadth of technology solutions offered by Secure-IC here. And that’s how Secure-IC presents AI-powered cybersecurity.

WEBINAR: Redefining Security – The challenges of implementing Post-Quantum Cryptography (PQC)

Also Read:

How Secure-IC is Making the Cyber World a Safer Place

2024 Outlook with Hassan Triqui CEO of Secure-IC

Rugged Security Solutions For Evolving Cybersecurity Threats


Mastering Copper TSV Fill Part 2 of 3
by John Ghekiere on 05-29-2024 at 8:00 am

Establishing void-free fill of high aspect ratio TSVs, capped by a thin and uniform bulk layer optimized for removal by CMP, means fully optimizing each of a series of critical phases. As we will see in this 3-part series, the conditions governing outcomes for each phase vary greatly, and the complexity of interacting factors means that starting from scratch is an expensive and lengthy empirical pursuit.

Robust and void-free filling of TSVs with copper progresses through six phases as laid out below:

  1. Feature wetting and wafer entry (previous article)
  2. Feature polarization
  3. Nucleation
  4. Fill propagation
  5. Accelerator ejection
  6. Bulk layer plating
  7. (Rinsing and drying, which we won’t cover in this series)

Feature Polarization

Before we talk about features specifically, let’s briefly review electrolyte formulation. In general, copper TSV plating chemistries are formulated of certain inorganic components and certain organic components. The inorganics are: deionized water, copper sulfate, sulfuric acid and hydrochloric acid. And the organics are commonly referred to as accelerator, suppressor and leveler. We could get very deep into the specifics here, and they are truly fascinating. However, we are not attempting to invent a TSV chemistry, but rather to put an existing one to use.

We ended the previous article having described wafer entry. In most cases, this entry step is followed directly by a brief “dwell step” during which the wafer simply sits in the electrolyte, ideally spinning at moderate speed, with no potential applied (thus no current flowing). During this step, the chemical affinity of the suppressor for the copper will cause the suppressor (and leveler as well) to adsorb to the surface. Complete coverage of the surface is critical as any location that is under-suppressed will experience unwanted copper growth. A former colleague of mine used to refer to the effectiveness of this suppressor coverage as forming either a “blanket” of suppressor or an “afghan” of suppressor.

After the brief dwell step, the recipe moves into its first plating step. Here is where proper polarization plays out (or doesn’t!). Suppressors and accelerators operate in a sort of competition with each other in copper fill. Thus the specific formulations and relative concentrations matter very much in terms of the effectiveness of a given plating chemistry. Suppressors have an advantage over accelerators in that they adsorb more readily. But accelerators have an advantage in that they diffuse more rapidly.

Given this, we can quickly understand how polarization plays out. All the components of the chemistry have equal access to the surface of the wafer as soon as it enters the bath. But suppressors dominate adsorption on the surface because they adsorb more readily, even in the presence of ample accelerator. However, all components need to travel down the via hole in order to adsorb in there. And this is where the advantage goes to the accelerator. Yes, it is a slower adsorber (spell check says that’s not a word), but because it gets to the bottom of the via before the other additives, it has the time it needs. A distribution thus forms wherein suppressor adsorption manifests at a very high concentration at the top of the via and accelerator adsorption manifests at a very high concentration at the bottom.

The blanket of suppressor behaves as a thin insulator on the surface of the copper; and its coverage thins and disappears down the wall of the TSV. This is the effect we are calling polarization.

So what happens if polarization does not work out? And how do you know whether it worked or didn’t? And what do you do about it?

What happens? Sadly, there is a hard truth in polarization. You either achieved it, or you didn’t. There really are no second chances. What I mean is that, in setting up the process, if the initial adsorption distribution is not favorable to bottom-up fill, no subsequent recipe can recover the deposition distribution needed for a good fill.

How do you know? We will cover this in more detail in the next article because it has a great deal to do with evaluating fill propagation. But suffice to say for now that you will be staring at FIB/SEM images.

What do you do? The two most likely causes for a failure in polarization are:

  1. Dwell time duration is wrong. A dwell step that is too long or too short can lead to a non-optimal adsorption profile. Too short may mean the accelerator did not have quite enough time to collect in the via bottom at high concentration. Too long may mean suppressor or leveler molecules had time to get down there too. The height of the via is going to be a factor here (see the sketch after this list).
  2. Non-optimal mixture ratio of the organic components. Remember, suppressor and accelerator are in competition here. Too much of one and too little of the other and we don’t get the gradient we were aiming for. It’s important to note here that levelers are, in fact, a specific form of suppressor. And, in the case of the more advanced TSV fill chemistries on the market, the leveler is more active than the conventional suppressor. So if you are using a more highly formulated TSV chemistry (you can tell by the price), adjusting leveler-to-accelerator ratios may be necessary.
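To see why via height matters for dwell time, a quick order-of-magnitude sketch helps. The classic 1-D diffusion time scale is t ~ L²/(2D); the diffusivity below is an assumed, textbook-style value for a small organic additive in aqueous electrolyte, not a vendor number:

```cpp
#include <cstdio>

// Back-of-the-envelope sketch (assumed values, not vendor data): estimate how
// long an additive needs to diffuse to the bottom of a via using the 1-D
// diffusion time scale t ~ L^2 / (2D).
int main() {
    const double D = 5e-10;            // m^2/s, assumed diffusivity of a small organic additive
    const double depths_um[] = {25, 50, 100};
    for (double d : depths_um) {
        double L = d * 1e-6;           // via depth in meters
        double t = L * L / (2.0 * D);  // characteristic diffusion time, seconds
        std::printf("via depth %5.0f um -> t ~ %.2f s\n", d, t);
    }
}
```

Doubling the via depth quadruples the characteristic diffusion time, which is why a dwell step tuned for a 50 µm via can be badly wrong for a 100 µm one.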

Nucleation

As I hinted in the previous article, nucleation may or may not be necessary to good fill of TSVs. In my compelling cliff hanger on the topic, I also let drop that optimal nucleation can make up for some non-optimal outcomes from upstream processes.

What is nucleation? Nucleation has to do with the nature by which deposition of the copper initiates. And now we are talking about the molecular level, particularly way down at the via bottom. When the seed copper is rough, whether because of roughness in the seed copper itself or else roughness of the underlying layers translated through the seed, the texture will create hyper-local areas of increased current density. If you imagine zooming well in on the roughness, you will observe peaks and valleys in the surface. The peaks are the areas that generate the higher current density.

That higher current density more strongly attracts copper ions in the chemistry. The result is that copper ions can migrate to and preferentially deposit onto these peaks instead of distributing evenly. The peak thus increases in size, which makes that location even more favorable for deposition. And a bit of a runaway occurs. The observable result of this behavior is the formation of copper nodules. These nodules continue to grow faster than surrounding areas until the nodules begin to expand into each other. Guess what that causes. Yes. Voids. Large scale nodule formation will trap numerous tiny voids around the bottom side wall as the nodules grow into each other.

If such voids are observed, then better control of nucleation is likely necessary. The key here is that we not allow lazy copper ions to meander to whatever location they prefer, but rather to force them to lay down at the nearest location. We accomplish this by doing something that is very bad.

But it’s ok because we don’t do it for long. Agreed?

The bad thing we need to do is to put the system into a state approaching the “Limiting Current Density”. The limiting current density is the current density at which the ions available to react at the surface are consumed faster than they can be replenished. We do this by setting the current in the initial plating step to effect a much higher current density than normal. Perhaps 4 times higher. What happens is that we increase the deposition rate so much that copper ions do not have the chance to migrate but rather deposit as quickly as possible in the immediate vicinity, peak or no peak.

Again, this is a very bad thing we are doing and going on too long would cause a number of deleterious outcomes including…electrolysis of water.

I’ll bet you thought I was going to say voids.

Actually, the answer is also voids. It will cause voids.

So we would do this nucleation step for a short period of time, say 500 milliseconds. The result should be a “flattening” of the surface and the avoidance of nodule formation.
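For a feel of the numbers, the limiting current density can be estimated from the Nernst diffusion-layer model, i_lim = nFDC/δ. Everything below is an illustrative assumption (bath concentration, diffusivity, boundary-layer thickness), not a qualified recipe:

```cpp
#include <cstdio>

// Hedged sketch: estimate the limiting current density from the Nernst
// diffusion-layer model, i_lim = n*F*D*C / delta. All numbers below are
// illustrative assumptions, not a qualified process recipe.
int main() {
    const double n     = 2.0;      // electrons per Cu2+ reduction
    const double F     = 96485.0;  // Faraday constant, C/mol
    const double D     = 5e-10;    // m^2/s, assumed Cu2+ diffusivity
    const double C     = 600.0;    // mol/m^3 (0.6 M copper sulfate, assumed)
    const double delta = 20e-6;    // m, assumed diffusion layer on a spinning wafer

    double i_lim = n * F * D * C / delta;  // A/m^2
    std::printf("i_lim ~ %.0f A/m^2 (%.0f mA/cm^2)\n", i_lim, i_lim / 10.0);
    // A nucleation step pushed toward i_lim (e.g., ~4x the normal current
    // density) forces ions to deposit locally instead of migrating to peaks.
}
```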

Maybe time to regroup:

  1. We wetted the features. Best way to do this is by using a vacuum prewet chamber, especially for TSVs with an aspect ratio of 5 or greater.
  2. We transferred the wafer to the plating reactor (ideally a fountain type that spins the wafer during plating) and performed an optimal entry which avoids trapping any air against the wafer surface.
  3. We allowed the wafer to dwell in the chemistry for a short period of time, allowing suppressor to coat the wafer surface and accelerator to race down into the vias ahead of everyone else. Then we initiated a potential on the system causing current to flow. A lovely gradient of adsorbed organic molecules formed on the via surfaces. Our vias were polarized.
  4. We had previously noted that roughness on the surface of the lower via was prone to causing nodules so we deployed the initial plating step as a high current nucleation step for half a second before returning to a normal current density.
  5. And now we are ready to look at propagation of fill.

Looking ahead to our final post on the topic, we have a lot of ground left to cover: fill propagation, accelerator ejection and bulk layer plating. It all matters, so no slowing down now. In fact, things are going to get quite fast now. Well, I mean in terms of plating steps that take an hour or so.

If you enjoyed this post, be sure to Like and to follow me so you don’t miss a single thing. Meanwhile, if you suspect polarization didn’t work and you don’t want to wait until next week to get it under control, get hold of us and let’s see how we can help your product or your fab retain technology sovereignty in a highly competitive marketplace.

Also Read:

Mastering Copper TSV Fill Part 1 of 3

 


Using LLMs for Fault Localization. Innovation in Verification
by Bernard Murphy on 05-29-2024 at 6:00 am

We have talked about fault localization (root cause analysis) in several reviews. This early-release paper looks at applying LLM technology to the task. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is A Preliminary Evaluation of LLM-Based Fault Localization. This article was published on arXiv.org in August 2023. The authors are from KAIST in South Korea.

It had to happen. LLMs are being applied everywhere so why not in fault localization? More seriously there is an intriguing spin in this paper, enabled by the LLM approach – explainability. Not only does this paper produce a root cause; it also explains why it chose that root cause. For me this might add a real jump to success rates for localization. Not because the top candidate will necessarily be more accurate but because a human verifier (or designer) can judge whether they think the explanation is worth following further. If an explanation comes with each of the top 3 or 5 candidates, perhaps augmented by spectrum-based localization scores, intuitively this might increase localization accuracies significantly.

Paul’s view

A very timely blog this month: using LLMs to root cause bugs. No question we’re going to see a lot more innovations published here over the next few years!

In 2022 we reviewed DeepFL which used an RNN to rank methods based on suspiciousness features (complexity, mutation, spectrum, text). In 2023 we reviewed TRANSFER-FL which used another RNN to improve ranking by pre-classifying bugs into one of 15 types based on training across a much larger dataset of bugs from GitHub.

This paper implements the entire “fault localization” problem using prompt engineering on OpenAI’s GPT-3.5. Two cutting-edge LLM-based techniques are leveraged: chain-of-thought prompting and function calling. The former is where the question to the LLM includes an example not only of how to answer the question but also of the suggested thought process the LLM should follow to obtain the answer. The latter is where the LLM is given the ability to ask for additional information automatically by calling on user-provided functions.

The authors’ LLM prompt includes the error message and a few relevant lines of source code referenced by the error message. The LLM is given functions that enable it to query whether the test covered a particular method and to query the source code or comments for a method.

As is typical for fault localization papers, results are benchmarked on Defects4J, an open source database of Java code bugs. Somewhat amazingly, despite no pre-training on the code being debugged or prior history of passing and failing test results, the buggy method is ranked in the top-5 by the LLM in 55% of the cases benchmarked! This compares to 77% for DeepFL, but DeepFL required extensive pre-training using Defects4J data (i.e., leave-one-out cross-validation). TRANSFER-FL is hard to compare since it is a more precise ranker (statement-level accurate not method-level). Most likely, a combination of LLM-based and non-LLM based methods will be the long term optimal approach here.

Raúl’s view

This paper is the first to use LLMs for fault localization (FL) and was published in August 2023. A search for “Use of LLM in Fault Localization” reveals another paper from CMU, published in April 2024, but it employs a different methodology.

The main idea in this paper is to overcome the LLM limit of 32,000 tokens (in this case ChatGPT), which is insufficient if the prompt includes, for example, 96,000 lines of code. Instead, to navigate the source code, the LLM can call functions, in particular (the names are self-explanatory) get_class_covered, get_method_covered, get_code_snippet and get_comments.

The actual technique used, called AutoFL, requires only a single failing test. It works by first prompting the LLM to provide a step-by-step explanation on how the bug occurred, with some prompt engineering required (Listing 1). The LLM goes through the code with the functions and gives an explanation. Using this, AutoFL then prompts ChatGPT to find the fault location (Listing 3) assuming the LLM has implicitly identified it in the previous phase. To improve the technique, the authors restrict the function calls to 9, and do the whole process 5 times using all 5 results to rank the possible locations.
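The run-aggregation step is easy to picture in code. Here is a toy sketch of the idea (an illustration, not the authors’ implementation; in particular the rank weighting is an assumption): collect each run’s predicted fault locations and rank locations by how often, and how prominently, they appear:

```cpp
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch (not the authors' code): rank candidate fault
// locations by how often they appear across R independent LLM runs,
// mirroring AutoFL's "run 5 times and vote" aggregation step.
int main() {
    std::vector<std::vector<std::string>> runs = {
        {"Math.sqrt", "Parser.read"},  // run 1 predictions
        {"Math.sqrt"},                 // run 2
        {"Parser.read", "Math.sqrt"},  // run 3
        {"Math.sqrt"},                 // run 4
        {"Lexer.next"},                // run 5
    };

    std::map<std::string, double> score;
    for (const auto& run : runs)
        for (size_t i = 0; i < run.size(); ++i)
            score[run[i]] += 1.0 / (i + 1);  // assumed: earlier mentions weigh more

    std::vector<std::pair<std::string, double>> ranked(score.begin(), score.end());
    std::sort(ranked.begin(), ranked.end(),
              [](auto& a, auto& b) { return a.second > b.second; });

    for (const auto& [method, s] : ranked)
        std::cout << method << " : " << s << '\n';
}
```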

The paper compares AutoFL with seven other methods (reference [41]) on a benchmark with 353 cases. AutoFL finds the right bug location more often than the next best (Spectrum Based FL) when using one suggestion: 149 vs. 125. But it does worse when using 3 or 5 suggestions: 180 vs. 195 and 194 vs. 218. The authors also note that 1) AutoFL needs to call the functions to explore the code, otherwise the result gets much worse; 2) more than 5 runs still improves the results; and 3) “One possible threat to validity is that the Defects4J bug data was part of the LLM training data by OpenAI”.

The approach is experimental with sufficient details to be replicated and enhanced. The method is simple to apply and use for debugging. The main idea of letting the LLM explore the code with some basic functions seems to work well.


Elevating Your SoC for Reconfigurable Computing – EFLX® eFPGA and InferX™ DSP and AI
by Kalar Rajendiran on 05-28-2024 at 10:00 am

Use Case eFPGA Complementing Signal Processing

Field-Programmable Gate Arrays (FPGAs) have long been celebrated for their unmatched flexibility and programmability compared to Application-Specific Integrated Circuits (ASICs). And the introduction of Embedded FPGAs (eFPGAs) took these advantages to new heights. eFPGAs offer on-the-fly reconfiguration capabilities, allowing system designers to adapt to evolving protocols and cryptographic standards without the need for costly hardware changes. This inherent flexibility not only reduces risk but also ensures longevity and scalability, essential factors in today’s fast-paced technological landscape.

Flex Logix is well known for its reconfigurable computing solutions, particularly embedded Field-Programmable Gate Arrays (eFPGAs). What may not be as well-known is the company’s signal processing IP. As Artificial Intelligence (AI) applications continue to proliferate across industries, the need to make decisions asynchronously has become increasingly imperative. These applications demand robust support for linear math operations, convolution, and transforms, all of which require a powerful signal processing engine. To address this need, Flex Logix offers signal processing IP as well. By combining eFPGAs with dedicated signal processing capabilities, designers can develop more efficient solutions that prevent the signal processing engine from being starved by memory bandwidth limitations. This integration not only enhances performance but also unlocks new possibilities for real-time processing and analysis in AI applications. This was the focus of a talk by Jayson Bethurem, VP of Marketing at Flex Logix, at the IPSoC 2024 Silicon Valley conference.

eFPGAs and Asynchronous Applications

eFPGAs present a versatile solution for asynchronous applications, operating without a global clock signal and relying on local timing mechanisms. One key advantage of eFPGAs in this domain is their ability to offer custom timing control, allowing designers to implement precise timing circuits and control mechanisms tailored to the specific requirements of asynchronous applications. This flexibility enables optimization of timing parameters such as delay, skew, and signal propagation independently for different parts of the circuit, ensuring efficient operation.

Moreover, eFPGAs facilitate the implementation of fine-grained synchronization techniques, such as handshake protocols and delay-insensitive circuits, commonly used in asynchronous design methodologies. These synchronization mechanisms ensure correct operation and data integrity in asynchronous systems, even in the presence of varying delays and timing uncertainties. Additionally, eFPGAs provide high-speed interconnect resources that can be customized to build efficient communication channels between asynchronous modules or data processing elements, enhancing the performance and scalability of asynchronous systems. With support for dynamic reconfiguration, power efficiency features, and fault tolerance mechanisms, eFPGAs serve as an attractive platform for developing efficient and reliable asynchronous systems across various domains.
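As a mental model of the handshake protocols mentioned above, here is a toy software sketch of a four-phase request/acknowledge channel (illustrative only; in an eFPGA this would be realized as asynchronous logic, not sequential software):

```cpp
#include <iostream>

// Toy sketch (illustrative only): a four-phase request/acknowledge
// handshake, the kind of delay-insensitive protocol asynchronous
// eFPGA modules can use to exchange data without a global clock.
struct Channel { bool req = false, ack = false; int data = 0; };

void sender(Channel& ch, int value) {
    ch.data = value;          // 1. drive data
    ch.req = true;            // 2. raise request
}

bool receiver(Channel& ch, int& out) {
    if (ch.req && !ch.ack) {  // 3. latch data, raise acknowledge
        out = ch.data;
        ch.ack = true;
        return true;
    }
    return false;
}

void complete(Channel& ch) {  // 4. return-to-zero phase
    if (ch.ack) { ch.req = false; ch.ack = false; }
}

int main() {
    Channel ch; int out = 0;
    sender(ch, 42);
    if (receiver(ch, out)) std::cout << "received " << out << '\n';
    complete(ch);             // channel idle again, ready for the next transfer
}
```

The correctness of the transfer depends only on the ordering of the req/ack events, not on any clock, which is why such protocols tolerate the varying delays the paragraph above describes.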

Signal Processing in eFPGA Use Cases

eFPGAs enable customizable accelerators for algorithms in machine learning and signal processing. Additionally, eFPGAs handle protocol offloading in communication systems, reconfigurable I/O interfaces in consumer electronics, and real-time data processing in storage and communication devices. They also support firmware upgrades in embedded systems and dynamic resource allocation in high performance computing (HPC). In automotive and industrial automation, eFPGAs facilitate sensor fusion and real-time image processing. They enable customizable networking protocols, enhance security features, and ensure fault tolerance in critical systems.

Summary

eFPGA integration streamlines product development by minimizing mask spins, reducing engineering costs, and accelerating time-to-market. Their adaptability ensures longevity by accommodating evolving protocols and facilitating periodic bug fixes through firmware updates, thus averting costly recalls. They enable product differentiation through unique features, attracting customers and supporting premium pricing. Moreover, they meet regional requirements, address security threats, and streamline testing and debugging processes, further enhancing efficiency. Lastly, eFPGA integration supports the integration of evolving IP cores (such as in the field of AI), ensuring products remain competitive with the latest technological advancements without requiring hardware upgrades.

For more details, visit https://flex-logix.com/

Also Read:

WEBINAR: Enabling Long Lasting Security for Semiconductors

LIVE WEBINAR: Accelerating Compute-Bound Algorithms with Andes Custom Extensions (ACE) and Flex Logix Embedded FPGA Array

Reconfigurable DSP and AI IP arrives in next-gen InferX


The 2024 Design Automation Conference and Certus Semiconductor
by Daniel Nenni on 05-28-2024 at 6:00 am

DAC is right around the corner and this could possibly be the last one in San Francisco for a while so do not miss it. The weather will be absolutely great and there are many things to do outside of the conference including sailing on the bay.

The Design Automation Conference (DAC) is a premier event that focuses on the design and automation of electronic systems. It is an annual conference that has been held since 1964, making it one of the longest-running and most established events in the field of electronic design automation (EDA).

DAC offers outstanding training, education, exhibits and superb networking opportunities for designers, researchers, tool developers and vendors. The conference is sponsored by the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE) and is supported by ACM’s Special Interest Group on Design Automation (SIGDA) and IEEE’s Council on Electronic Design Automation (CEDA).

We met with Certus Semiconductor last year at DAC and I must say that I am impressed with this customer centric company that is clearly in the right place at the right time.

Certus Semiconductor continues to drive innovation and methods to lower costs and improve performance in cutting edge I/O and ESD solutions. Certus Semiconductor will be attending DAC 2024, its fourth time at the conference, and will share the advantages of its high-performance I/O and ESD products.

The benefits of Certus’ IP have been experienced across the industry: leading-edge, off-the-shelf, and customizable I/O and ESD libraries for low power, small footprint, wide voltage ranges, and low RF capacitance. Certus continues to offer options for flexible interfaces, multi-function, and higher performance I/O libraries that target FPGA and high-performance computing markets. Certus’ flagship offerings are robust and offer users a myriad of options, and Certus is excited to highlight two new offerings in 22nm and 12nm technology nodes.

Certus’ ultra-low leakage library in 22nm technology slashes power usage with proprietary Certus technology. The library features a general-purpose I/O with nominal pad leakage current of 0.15nA and a pad leakage of 1nA at 85C. The total supply leakage is 16nA at 85C, or low picoamps under typical operating conditions. The library also features an open-drain I/O that draws a mere 35nA from the pad and 100nA from the supply. All auxiliary cells are similarly optimized for power saving. The power saving enables customers to lower overall power consumption, making a system less expensive to operate and increasing its reliable lifetime.

Certus Semiconductor also offers an extremely robust, radiation-hardened, silicon-proven ESD library in GlobalFoundries 12nm FinFET, 12LP+ technology. This library offers ESD protection for 0.8V, 1.8V, 2.5V, and 3.3V domains, with standard I/O ESD from 2kV to >8kV HBM, and 3A to >12A CDM. The library includes low capacitance RF ESD protection, featuring 30fF to 100fF solutions, with protection from 1kV HBM to >3kV HBM and 3A to >6A CDM. The radiation hardening has been validated in 64MeV proton testing with survivability at a flux of >1.3E+09. Power clamps within this library have low rush current for power supply ramps up to 1V/us. This ESD library will enhance any product that is being considered for operation in an extreme environment.

Certus is one of the many companies supporting this industry-leading event, and they invite you to meet with the Certus I/O and ESD experts on the exhibit floor. You can contact Certus here to schedule a meeting at booth #1328. I hope to see you there!

Also Read:

2024 Outlook with Stephen Fairbanks of Certus Semiconductor

Unique IO & ESD Solutions @ DAC 2023!

The Opportunity Costs of using foundry I/O vs. high-performance custom I/O Libraries

CEO Interview: Stephen Fairbanks of Certus Semiconductor


AI System Connectivity for UCIe and Chiplet Interfaces Demand Escalating Bandwidth Needs
by Kalar Rajendiran on 05-27-2024 at 10:00 am

Alphawave Semi UCIe PHY Support for All Package Types

Artificial Intelligence (AI) continues to revolutionize industries, from healthcare and finance to automotive and manufacturing. AI applications, such as machine learning, deep learning, and neural networks, rely on vast amounts of data for training, inference, and decision-making processes. As AI algorithms become more sophisticated and datasets grow larger, the demand for computational power and data throughput is escalating rapidly. With the proliferation of data-intensive tasks, AI systems require escalating bandwidth to support seamless communication between diverse components, including CPUs, GPUs, accelerators, memory modules, and specialized modules dedicated to AI tasks. To meet these demands, AI systems require robust connectivity solutions that can provide high bandwidth, low latency, scalability, and energy efficiency.

The Role of UCIe and Chiplet Interfaces

With disaggregation of resources for optimizing system architectures, semiconductor design and package optimizations are the future of advanced compute semiconductors. Chiplet interfaces offer a promising solution to the escalating bandwidth needs in AI systems by providing efficient connectivity between disparate components. For example, chiplet interfaces enable disaggregated architectures in cloud computing infrastructure, where CPU, GPU, and memory chiplets are interconnected via high-speed interfaces, allowing for efficient resource allocation and utilization in AI training and inference tasks. In autonomous vehicles, chiplet interfaces enable seamless integration of AI accelerators, sensor processing units, and communication modules, supporting real-time decision-making and sensor fusion tasks. In healthcare, chiplet interfaces facilitate the integration of AI accelerators with medical imaging devices, enabling faster image processing and analysis for diagnostic purposes.

UCIe, in particular, defines a standardized framework for chiplet-based interconnectivity, enabling seamless integration and communication between chiplets from different vendors.

Benefits of Standardized Interfaces for AI System Connectivity

High Bandwidth: UCIe and chiplet interfaces support high-speed data transfer rates, allowing for rapid exchange of information between chiplets. This high bandwidth is essential for handling large datasets and accelerating AI workloads.

Low Latency: With reduced signal propagation delays and optimized routing algorithms, UCIe and chiplet interfaces minimize latency, ensuring timely processing of data and real-time responsiveness in AI applications.

Scalability: AI systems often require flexible and scalable architectures to accommodate increasing computational demands. UCIe and chiplet interfaces enable modular designs, where chiplets can be added or removed dynamically, allowing for seamless scalability as workload requirements evolve.

Energy Efficiency: UCIe and chiplet interfaces are designed to optimize energy efficiency by minimizing power consumption during data transfer and communication. This is particularly important for AI systems deployed in edge computing and IoT devices with limited power budgets.

Addressing AI System Connectivity Needs

At the IPSoC 2024 conference last month, Sue Hung Fung, Principal Product Line Manager, and Soni Kapoor, Principal Product Marketing Manager, both from Alphawave Semi, presented the company’s offerings addressing these needs.

Alphawave Semi’s Complete UCIe Solution

Leveraging silicon-proven analog IP, the UCIe solution boasts a robust Physical Layer-Electrical PHY (Analog Front End) responsible for ensuring reliable and high-speed data transmission between chiplets. This includes critical functions such as clocking, link training, and sideband signal management, all integrated seamlessly to enable efficient communication across the UCIe interconnect. Additionally, the UCIe solution features a Die-to-Die Adapter component, facilitating link state management and parameter negotiations crucial for chiplet interoperability, while implementing error detection and correction mechanisms to ensure robust data transmission. With support for industry-standard protocols like PCIe and CXL, as well as a Streaming Protocol for enhanced system design flexibility, Alphawave Semi’s UCIe solution offers a comprehensive platform for interoperability testing, ensuring seamless integration into diverse computing systems.

Alphawave Semi’s UCIe Physical Layer (PHY) is designed to accommodate various package types, including standard x16 and x32 configurations commonly found in servers, workstations, and high-performance computing platforms, as well as advanced x32 and x64 packages ideal for data centers and AI accelerators. This support for multiple package types not only ensures seamless integration into existing and future computing systems but also provides system designers with the flexibility to tailor configurations to specific application needs. Leveraging advanced signaling and interface technologies, the UCIe PHY delivers high-speed data transmission and low-latency communication, ensuring optimal performance for demanding workloads.
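As a rough sense of scale, raw die-to-die bandwidth grows linearly with module width. The sketch below assumes the 32 GT/s top per-lane rate defined by the UCIe specification and treats the module widths as illustrative configurations, not product specifications:

```cpp
#include <cstdio>

// Hedged sketch: raw die-to-die bandwidth of a UCIe module as
// lanes x per-lane rate. 32 GT/s is the top per-lane rate in the UCIe
// spec; module widths below are illustrative configurations.
int main() {
    const double gbps_per_lane = 32.0;         // GT/s, ~1 bit per transfer (NRZ)
    const int widths[] = {16, 32, 64};         // x16 / x32 / x64 modules
    for (int lanes : widths) {
        double gbps = lanes * gbps_per_lane;   // raw Gb/s per direction
        double gBps = gbps / 8.0;              // GB/s per direction
        std::printf("x%-2d module: %6.0f Gb/s (%5.0f GB/s) per direction\n",
                    lanes, gbps, gBps);
    }
}
```

These are raw link rates per direction; protocol overhead, encoding, and adapter-layer retries reduce the usable payload bandwidth in practice.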

Summary

As AI computational demands are escalating, chiplets play a crucial role in enabling efficient and scalable solutions. Alphawave Semi’s D2D IP Subsystem Solutions, tailored for chiplet communication, empower AI systems to achieve unprecedented levels of performance and energy efficiency. Alphawave Semi’s comprehensive solutions and chiplet architectures cater to the evolving demands of System-in-Packages (SiPs). In addition to its UCIe interface solutions, Alphawave Semi offers many other high-performance connectivity silicon IP. To learn more, visit the company’s product page.

Also Read:

Alphawave Semi Bridges from Theory to Reality in Chiplet-Based AI

The Data Crisis is Unfolding – Are We Ready?

Accelerate AI Performance with 9G+ HBM3 System Solutions


Webinar – CHERI: Fine-Grained Memory Protection to Prevent Cyber Attacks
by Mike Gianfagna on 05-27-2024 at 6:00 am

Cyber attacks are top of mind for just about everyone these days. As massive AI data sets become more prevalent (and more valuable), data security is no longer “nice to have”. Rather, it becomes critical for continued online operation and success. The AI discussion is a double-edged sword as well. While AI enables many new and life-changing capabilities, it has also enabled very sophisticated data breaches. Codasip is presenting a webinar soon that provides a powerful new capability to significantly reduce the data security risks faced by advanced systems. If you worry about these topics, this webinar is a must-see event. A registration link is coming, but first let’s look at what you’ll learn about a technology called CHERI and how it delivers fine-grained memory protection to prevent cyber attacks.

Watch the Replay

About the Webinar

Capability Hardware Enhanced RISC Instructions (CHERI) technology was developed at the University of Cambridge as the result of research aimed at revisiting fundamental design choices in hardware and software to improve system security. CHERI has been covered on SemiWiki previously. You can find several posts on the technology here. The headline news in these posts is that Codasip is the first to deliver a production implementation of CHERI for the RISC-V ISA.

Carl Shaw

The implications of this are significant. The Codasip webinar does a great job explaining the history, details, and capabilities of CHERI. You will learn what this technology can do and how to use it on your next project. There are two webinar presenters who cover a lot of ground in a relatively short amount of time. The entire webinar, including a very informative Q&A session, is just over 30 minutes. Here is some background on the presenters:

Carl has over 30 years of experience developing software and securing embedded systems and processors. Carl now works as a Safety & Security Architect at Codasip, where he evaluates leading-edge security and safety technology and leads its adoption and implementation into Codasip’s products.

Andrew Lindsay

Andrew started his 20+ year career in security working on the IP and architectures for complex Pay-TV System-on-Chips. This paved the way to many years of consultancy for semiconductor and product manufacturers. He now also works as a Safety & Security Architect at Codasip, where he looks after the system aspects of security and helps with the ISO 26262 and ISO 21434 certification of products.

Let’s look at the topics these gentlemen cover in the upcoming webinar.

Webinar Details

Here are the main topics covered during the webinar. I’ll provide a taste of what you will learn.

Software Security Vulnerabilities

There is an incredible statistic about the root cause of cyber vulnerabilities. It turns out that for many years, about 70% of the attacks can be traced to exploitation of memory weaknesses. Carl and Andrew dive into this incredible statistic. You will learn a lot about the roots of memory weaknesses and how to address these issues at the architectural level. Some great history is also presented.

What is CHERI?

We already covered what the acronym stands for. CHERI is an extension of a processor ISA that enables robust memory access mechanisms with a software/hardware design paradigm. A core part of the technology is something called capability-based addressing. This approach has been around since the 1960s. What is new is the approach of adding capabilities to contemporary ISAs.

A capability is a token or “key” that grants the bearer the authority to access a specific resource or perform a specific operation. The webinar dives into the details of how this security approach can have significant impact.
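A small conceptual sketch may help (this is an illustration of the idea in ordinary C++, not the CHERI ISA encoding, where capabilities are architectural values with hardware-enforced, unforgeable bounds and tags):

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>

// Conceptual sketch only (not the CHERI ISA encoding): a capability bundles
// a pointer with the bounds and permissions that are checked on every
// dereference, so an out-of-bounds or unauthorized access traps instead of
// silently corrupting memory.
struct Capability {
    std::uint8_t* base;   // start of the authorized region
    std::size_t   length; // size of the authorized region
    bool          write;  // permission bit

    std::uint8_t load(std::size_t offset) const {
        if (offset >= length) throw std::runtime_error("bounds violation");
        return base[offset];
    }
    void store(std::size_t offset, std::uint8_t v) {
        if (!write) throw std::runtime_error("permission violation");
        if (offset >= length) throw std::runtime_error("bounds violation");
        base[offset] = v;
    }
};

int main() {
    std::uint8_t buf[16] = {};
    Capability cap{buf, sizeof buf, /*write=*/true};
    cap.store(3, 42);
    std::printf("%d\n", cap.load(3));
    try { cap.load(99); }  // out of bounds: trapped, not a silent overflow
    catch (const std::exception& e) { std::printf("trap: %s\n", e.what()); }
}
```

The crucial difference in real CHERI hardware is that these checks cost no extra instructions and the capability itself cannot be forged or widened by software, which is what defeats the memory-weakness exploits described earlier.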

How Can CHERI be Used?

Several examples are explored in this section that illustrate the application of CHERI to address real-world security challenges. The discussion begins with an illustration of protection of data in a stack. A very interesting discussion on compartmentalization then follows.

Software Impact

Here is where the webinar presenters dig into how CHERI works for real-world problems. It turns out there are no major re-writes required to enhance security with CHERI. Re-compiling the application with a CHERI-enabled compiler will produce a large impact with small effort.

More details of approaches to implement CHERI are also presented, along with a discussion of impact on code size and memory usage. A lot of detail is presented.

Codasip’s CHERI Implementation

In the final segment of the webinar, the history and focus of Codasip is presented, along with significant details about the CHERI-enabled technology available from Codasip. The work the company is doing with many partners across the ecosystem is also explained. The graphic at the top of this post is a depiction of the breadth of this work.

This is followed by an excellent Q&A session that covers many probing and some provocative topics. All-in-all, a great use of a half hour!

To Learn More

You can register for the webinar replay here. Don’t miss it! And that’s how CHERI delivers fine-grained memory protection to prevent cyber attacks.


Top three challenges for global semiconductor manufacturing in 2024
by Stephen Rothrock on 05-26-2024 at 8:00 am

Poised for recovery in 2024 and driving toward a historic $1 trillion in revenue, the global semiconductor industry has an incredibly promising future, backed by an unprecedented number of growth drivers, market opportunities, and technology advancements. Nevertheless, amid record greenfield capital investments and government-backed regional capacity expansion, global semiconductor manufacturing still needs to overcome perennial headwinds over the coming years.

Tracking the exchange of wafer fabs worldwide is an effective way to forecast where the global semiconductor industry is heading. As you can see from the chart below, the past four years have been particularly disruptive with multiple unforeseen world events, causing companies to adapt their long-term manufacturing strategies. This article focuses on what we see as the top three challenges semiconductor manufacturing will face in 2024 – geopolitical uncertainty, technological shifts, and capacity sourcing.

Challenge #1 – Geopolitical uncertainty

The U.S. decision to impose export controls on China’s advanced chip access in 2022 has shaken up the global semiconductor industry in ways we have not seen since Japan’s market correction in the 90s. Reshoring and de-risking have become common terms, incentivizing the creation of new fabs as a matter of economic and national security, supported by generous government subsidies. Amidst this rapidly changing geopolitical landscape, companies are often caught off guard. Increased scrutiny over fab ownership has been one of the most prominent themes we have had to navigate in this new geopolitical landscape. In the past couple of years, we have seen unprecedented government oversight of fabs that is having big impacts, as illustrated by ATREG, Inc.’s recent sales of the Elmos Dortmund, Germany fab to Littelfuse and the Nexperia Newport, UK fab to Vishay, both of which were producing mature 200mm technology.

Challenge #2 – Technological shifts

When it comes to chips, two technology revolutions rise above the rest in their impact on the global semiconductor industry and its future – electric vehicles (EVs) and artificial intelligence (AI). A significant number of chips going into EVs are still mature chips and EVs are driving the motivation to consider internalizing production and buying wafer fabs, particularly at 200mm. AI demands much more advanced chipmaking and this is a driver of significant greenfield investment.

Semiconductor companies that have placed their bets to get ahead of rising EV demand have by association bet on silicon carbide (SiC). Ever since Tesla announced the implementation of SiC into its EVs in 2017, the semiconductor industry has been preparing for the role of compound semiconductors to increase alongside growth in the EV market. Companies are also already making moves to prepare for accelerating gallium nitride (GaN) demand and thinking about where to implement GaN in facilities. Existing silicon fabs can be a great answer to this as they can be more easily converted for GaN production and typically need lower CapEx than converting for SiC.

Challenge #3 – Capacity sourcing

More chips will be needed to meet new demand, but where will they actually come from, and are companies thinking long term enough to avoid being intimidated in the short term by underutilization among their fabs? In this environment, we see companies exploring a variety of options to secure their future capacity – greenfield, brownfield, and foundry.

Greenfield is an option, especially with current active government subsidies, but it skews towards larger companies because of the huge investment required to build and operate. According to an article published by the Boston Consulting Group (BCG) in September 2023 (Navigating the Costly Economics of Chip Making), a wafer fab completed in 2026 would carry a 10-year total cost of ownership (TCO) of $35 to $43 billion – 33% to 66% higher than today’s costs. Wolfspeed is a company that did it right back in 2019 when it started the construction of its Mohawk Valley greenfield fab to position itself to capture rising SiC demand.

Bosch decided to do it differently by capitalizing on brownfield and acquiring TSI Semiconductors’ Roseville, CA fab. The company will boost the production of 200mm SiC chips on U.S. soil by 2026 with a $1.5 billion investment in the site. Brownfield fab demand has remained consistent throughout this downturn period. Why? Because brownfield fabs can offer existing infrastructure, equipment, intellectual property (IP), know-how, and an experienced workforce coupled with multi-year supply agreements and accelerated time-to-market. All 200mm fab transactions completed in 2023 carried values that demonstrate the strategic importance of brownfield fabs in prioritizing time-to-market and acquiring know-how.

Foundries remain an important part of the global semiconductor ecosystem, but there are serious concerns in the market about potential semiconductor factory overcapacity and excess chip inventory in the supply chain. According to DIGITIMES Research, wafer foundry services demand in 2024 is unstable and major wafer foundries have lowered their CapEx to regulate the pace of adding new production capacity. It is estimated that the combined capital expenditure of the top five wafer foundry operators will decrease by about 2% in 2024 down to $55 billion. After years of being at full capacity and having leverage over customers, foundries are responding to a new market where customers have more control. Chip demand may only stabilize in the latter half of 2024.

So what’s next for global semiconductor manufacturing? Amidst these challenges and disruptions, global wafer fab demand will continue to skyrocket, with the majority of transactions at 200mm and mature nodes remaining critical for chip makers. Manufacturing fabs are an incredibly unique asset that countries are prioritizing for national and economic security, and despite large greenfield investments and incentives being available through global chips acts, brownfield capacity often remains the preferred choice for chip manufacturers.

Also Read:

CEO Interview: Stephen Rothrock of ATREG

CEO Interview: Barry Paterson at Agile Analog

An open letter regarding Cyber Resilience of the UK’s Critical National Infrastructure


Podcast EP225: The Impact Semiconductor Technology is Having on the Automotive Industry with Chet Babla
by Daniel Nenni on 05-24-2024 at 10:00 am

Dan is joined by Chet Babla, indie Semiconductor’s Senior Vice President of Strategic Marketing, responsible for expanding the company’s tier 1 and automotive OEM customer base, as well as supporting product roadmap development. Chet has worked in the technology industry for over 25 years in a variety of technical and commercial roles, starting his career as an analog chip designer. He most recently served as Vice President of Arm’s Automotive Line of Business where he led a team focused on delivering the processing technology required for automotive applications including powertrain, digital cockpit, ADAS and autonomous driving. Prior to Arm, Chet has held multiple senior roles in the semiconductor industry and has also advised the UK government on its ICT trade and investment strategy.

Dan explores the impact semiconductors are having on the automotive industry with Chet. Three megatrends are discussed – driver safety and automation, in-cabin user experience, and electrification. Chet describes the significant advances that are being made in all these areas and details some of the innovation indie Semiconductor is bringing to the market.

Dan also discusses the potential timeline for deployment of fully autonomous vehicles with Chet and the hurdles that must be addressed.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.