
Electrical Rule Checking and Exhaustive Classification of Errors

Electrical Rule Checking and Exhaustive Classification of Errors
by Daniel Payne on 04-16-2024 at 10:00 am


The goal of SoC design teams is to tape out their project and receive working silicon on the first try, without discovering any bugs in silicon. Achieving this lofty goal requires many types of specialized checking and verification during the design phase to prevent bugs. There are checks at the system, RTL, gate, transistor and physical layout levels. One newer EDA company is Aniah, whose focus is checking the correctness of IC designs at the transistor level through Electrical Rule Checking (ERC), employing formal methods and smart clustering of errors.

During ERC a formal tool can mistakenly report “false positives”, which are false errors that shouldn’t have been reported. Real design errors that go undetected are called “false negatives”, so the ideal formal tool has zero false negatives while reporting as few false positives as possible. The Aniah formal ERC tool is called OneCheck, and I’ve just read their White Paper to get up to speed on how it works.
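
To make the terminology concrete, here is a minimal sketch (not how OneCheck works internally) that compares a tool’s reported violations against a known list of real errors and tallies true positives, false positives and false negatives. The net and error names are invented for illustration.

```python
# Minimal sketch: classifying ERC reports against a known reference.
# The error identifiers below are hypothetical, for illustration only.

reported = {"M2.gate_float", "M7.missing_level_shifter", "M9.eos"}   # what the tool flagged
actual   = {"M7.missing_level_shifter", "M11.floating_bulk"}         # real design errors

true_positives  = reported & actual          # real errors correctly flagged
false_positives = reported - actual          # flagged, but not real errors
false_negatives = actual - reported          # real errors the tool missed

print(f"TP={len(true_positives)}  FP={len(false_positives)}  FN={len(false_negatives)}")
# An ideal formal checker drives FN to zero; smart clustering then helps
# designers dismiss the remaining FP quickly.
```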

The Aniah OneCheck ERC can be run at several points in the IC design flow to verify both analog and digital circuitry:

Aniah Tool Flow

Some common design flaws caught by formal checkers include:

  • Missing Level Shifters
  • Floating Gates
  • High Impedance states
  • Floating Bulk
  • Diode Leakage
  • Electrical Overstress

False Errors

An ERC tool can be fooled by four typical classes of false errors; the following examples illustrate the challenges.

1. Topology Specific

The following circuit has two power domains, VDD and Vin, and a level shifter is expected between them. Here the false error flags transistors M2 and M3 because their gates are connected to net A and net 1, which are powered by Vin, not VDD. Transistors M0 and M1 actually control the “1” level.

False Error: Missing Level Shifter

2. Analog Path

A differential amplifier has devices M1 and M2 that are biased to act as an amplifier with current provided by M3, yet a false error reports an analog path issue.

False Error – analog path

3. Impossible Path Logically

An inverter formed by M1 and M2 is driven by a lower-range signal. When net 3 is ‘1’, M2 pulls output net 2 down to ‘0’, but the false error reports a logic path through M3 and M1.

False Error – Impossible path

4. Missing supply in setup

When a ring oscillator circuit requires a regulated 1.2V supply but the regulator itself has a 2.5V supply, a false error for electrical overstress can be reported.

False Error – Missing supply in setup

OneCheck

The good news is that OneCheck from Aniah has a smart clustering and root-cause analysis methodology to handle these four types of false errors. This formal circuit checker doesn’t use any vectors, because all circuit states are verified in just one run, including all power states of each circuit. Commercial circuits on both mature and latest-generation nodes have been run through OneCheck, so it’s a reliable tool.

Your circuit design team can start using OneCheck after the first schematic netlists are entered, even before any simulations have been run. The actual run times of OneCheck are quite fast, typically just a few seconds on a mixed-signal design with over 10 million transistors and more than 10,000 different power scenarios.

1. Topology Specific
OneCheck detects topology-related false errors like missing level shifters by performing pseudo-electrical analysis to model voltages and currents.

2. Analog Path
With Aniah OneCheck a user can identify and filter false errors with any current or voltage reference net.

3. Impossible path logically
The OneCheck tool finds all tree-like paths used by analog multiplexors, and the user can reject thousands of false errors quickly.

4. Missing supply in setup
All errors corresponding to a missing supply are clustered together, so users can easily update the power supply setup.
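
The value of clustering is easy to illustrate with a toy sketch: when many violations share one root cause, such as a supply net missing from the setup, grouping them lets a designer fix a single setup entry instead of triaging each report. The violation data and format below are hypothetical, not Aniah’s.

```python
from collections import defaultdict

# Hypothetical ERC violations: (error_type, root_cause_net, offending_device)
violations = [
    ("electrical_overstress", "VREG_1V2", "M12"),
    ("electrical_overstress", "VREG_1V2", "M47"),
    ("missing_level_shifter", "VIO_1V8",  "M3"),
    ("electrical_overstress", "VREG_1V2", "M88"),
]

clusters = defaultdict(list)
for err_type, root_net, device in violations:
    clusters[(err_type, root_net)].append(device)

for (err_type, root_net), devices in clusters.items():
    print(f"{len(devices)} x {err_type:<22} root cause: {root_net:<10} devices: {devices}")
# One missing supply definition (VREG_1V2) explains three reports at once,
# so the user fixes the power setup instead of waiving each error individually.
```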

Summary

Finding circuit bugs before manufacturing is the preferred way to ensure first-silicon success, so ERC is another tool for chip design teams to use. Other ERC tools report far too many false errors, which has limited their acceptance in the design community. Aniah has delivered new formal technology to combat this issue of false errors for ERC.

Why not give OneCheck a try on some of your biggest IC designs? The evaluation process is free and easy.

Read the full 11-page White Paper from Aniah online.



Early SoC Dynamic Power Analysis Needs Hardware Emulation

Early SoC Dynamic Power Analysis Needs Hardware Emulation
by Lauro Rizzatti on 04-16-2024 at 6:00 am

The relentless pursuit for maximizing performance in semiconductor development is now matched by the crucial need to minimize energy consumption.

Traditional simulation-based power analysis methods face insurmountable challenges in accurately capturing complex design activity in real-world scenarios. As the scale of modern SoC designs explodes, a new pre-silicon dynamic power analysis methodology is essential. This approach should center on executing representative real-world software workloads.

Power Consumption: Static vs Dynamic Power Analysis

Two primary factors contribute to energy dissipation in semiconductors: static power consumption and dynamic power dissipation. While both are grounded in physics concerning dimensions, voltages, currents, and parasitic elements (resistance and capacitance, or RC), static power consumption remains largely unaffected by the type and duration of the software workload, except for power management firmware that shuts down power islands. Conversely, dynamic power dissipation is heavily dependent on these workload attributes.

Understanding that the dynamic power dissipated by a circuit scales with the logical transitions occurring during its operation, it becomes crucial to accurately capture its switching activity in order to achieve precise power analysis and optimize power dissipation for a design.
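
As a reminder, the first-order relationship is P_dyn ≈ α·C·V²·f, where α is the switching activity factor. A quick back-of-the-envelope sketch, with hypothetical numbers, shows the scaling:

```python
# First-order dynamic power estimate: P_dyn ≈ alpha * C * V^2 * f
# All numbers below are hypothetical, purely to show the scaling.

alpha = 0.15          # average switching activity (toggles per clock per net)
C     = 2.0e-9        # total switched capacitance in farads (2 nF)
V     = 0.75          # supply voltage in volts
f     = 1.5e9         # clock frequency in hertz (1.5 GHz)

p_dyn = alpha * C * V**2 * f
print(f"Estimated dynamic power: {p_dyn:.3f} W")   # ~0.253 W

# Halving the activity (e.g. via clock gating) halves dynamic power:
print(f"With alpha/2:            {0.5 * alpha * C * V**2 * f:.3f} W")
```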

Average And Peak Power Analysis

Recording the switching activity as toggle count data, without correlating it with corresponding time intervals, restricts the analysis to average power consumption over the operational time window. Typically, the switching data is cumulatively recorded throughout an entire run in a file format called the switching activity interchange format (SAIF). The size of the SAIF file remains constant irrespective of the duration of the run but grows with the design complexity (i.e. the number of nets in the design).

Capturing time-based, cycle-by-cycle information, namely full activity waveforms, allows for calculating power consumption as a function of time during device operation. Signal transitions along with their associated timestamps are typically recorded for the entire run in a signal database, traditionally stored in the industry-standard FSDB (Fast Signal DataBase) format. Today this format is no longer adequate due to the considerable size of the switching file, which escalates with longer runs, potentially reaching terabytes for extended runs spanning billions of cycles. More efficient methods utilize the native output format directly provided by the emulator.

Accurate Power Analysis: Design Hierarchy Dependency

The accuracy of the switching activity is contingent upon the level of design details accessible during the recording session. As the design description evolves from high level of abstraction in the early stages of the development to the Register Transfer level (RTL), gate level and, eventually, down to the transistor level, increasingly detailed design information becomes accessible.

The accuracy of power estimation varies across different levels of abstraction in semiconductor design. At the transistor level, the accuracy is typically within 1% of the actual power dissipation of the silicon chip. This decreases to approximately 2 to 5% at the gate level, around 15 to 20% at the RTL (Register Transfer Level), and ranges from 20% to 30% at the architectural level. However, higher levels of abstraction offer faster turnaround time (TAT) and empower designers to make influential decisions that affect power consumption.

The accuracy vs. TAT tradeoff poses a challenge to designers. At the architectural level, designers enjoy the greatest flexibility to compare multiple architectures, explore various design scenarios, perform power trade-offs, and achieve optimal power optimizations. Instead, at the gate level where accuracy is higher, there is limited flexibility for significant optimizations beyond marginal improvement. The RTL strikes the optimal compromise, providing sufficient details for accurate power consumption analysis while retaining enough flexibility for substantial power optimizations. Moreover, it’s at the RTL where software and hardware converge in the design flow for the first time, enabling engineers to explore optimizations in both domains. Software drivers, in particular, can profoundly impact the power characteristics of the overall design.

Accurate Power Analysis: Design Activity Dependency

Dynamic power consumption depends heavily on the design activity, which can be stimulated using various techniques. These may include external stimuli applied to its primary inputs or the execution of software workloads by embedded processors within the device under test (DUT). Software workloads encompass booting an operating system, executing drivers, running entire applications such as computationally intensive industry benchmarks, and performing tests/diagnostics.

According to Tom’s Hardware, the improvements to idle power usage on Radeon RX 7800 XT and 7700 XT GPUs are massive – with the 7800 XT dropping from 33W to 12.9W and the 7700 XT dropping from 27.5W to 12W.[1]

Stimulus in the form of synthetic tests as used in functional verification testbenches fail to exercise the design to the extent necessary to toggle most of its fabric. This level of activation can only be achieved through the execution of realistic workloads.

Meeting Dynamic Power Analysis Challenges with Hardware Emulation

Verification engines such as software simulators, while effective for recording switching activity, are limited by execution speed, greatly dependent on design size and stimulus duration. Attempting to boot Android OS via an HDL simulator may take years, rendering it unfeasible.

To overcome these limitations and still capture detailed toggle data, hardware emulators emerge as the superior choice. They can complete such demanding tasks within a reasonable timeframe.

Hardware emulators operate six or more orders of magnitude faster than logic simulators. However, executing even a few seconds of real-time operation on an emulated design can amount to billions of cycles, taking several hours at emulation speeds of a few megahertz.

Rather than relying solely on sheer computational power, adopting a divide and conquer approach proves to be more effective and efficient. The primary objective remains ensuring that both the average and peak power consumption levels adhere to the specified power budget outlined in the design requirements. In the event of a breach of the power budget, it is essential to swiftly and easily identify the underlying cause.

Performing Power Analysis with a Three-Step Methodology

A best-in-class hardware emulator can accomplish the task in three steps. See figure 1.

Figure 1: Finding Power Issues in Billion Cycles Workloads (Source: Synopsys)
Step One

In step one, a power model based on the RTL design is generated and executed on the emulator for the entire run of multi-billion cycles. The emulator conducts activity-based calculations and produces a weighted activity profile (WAP), i.e., a time-based graph that serves as a proxy for power. See the example in figure 2.

Figure 2: Weighted-activity profile showing a power bug. (Source: Synopsys)

By visually inspecting the WAP, users can identify areas of interest for analysis, pinpointing time windows of a few million cycles with exceedingly high activity, which may indicate opportunities for optimization or reveal potential power bugs.
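
Conceptually, that triage step amounts to sliding a window over the activity profile and flagging windows whose average weight exceeds a threshold. Here is a small sketch against a synthetic profile; it is not the actual WAP format or any Synopsys API.

```python
import random

random.seed(1)
# Synthetic weighted activity profile: one value per (say) 10k-cycle bucket.
wap = [random.gauss(100, 10) for _ in range(10_000)]
wap[6_200:6_260] = [260] * 60           # inject a suspicious burst of activity

WINDOW = 50                              # buckets per inspection window
THRESHOLD = 150                          # flag windows whose mean exceeds this

for start in range(0, len(wap) - WINDOW, WINDOW):
    window = wap[start:start + WINDOW]
    mean = sum(window) / WINDOW
    if mean > THRESHOLD:
        print(f"High activity near buckets {start}-{start + WINDOW}: mean={mean:.0f}")
# The flagged bucket range maps back to a cycle window worth a detailed
# power computation in step two.
```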

Step Two

In step two, the emulator runs through that time window of a few million cycles and generates a signal activity database. Subsequently, a special-purpose, massively parallel power analysis engine is used to compute power and generate the power waveform. Worth mentioning, a “save & restore” capability may accelerate the process by resuming from the closest checkpoint to the time window under investigation. In this step, a fast power calculation engine is required to achieve turnaround times of less than a day for tens of millions of cycles. Its accuracy should fall within 3% to 5% of power signoff analysis to facilitate informed decision-making about actual power issues. Additionally, a secondary inspection of the power profile graph within this few-million-cycle window helps users pinpoint a narrower window of a few thousand cycles around the power issue.

Step Three

In the final step, the emulator processes the narrower time window of a few thousand cycles and generates an FSDB waveform database to be fed into a power sign-off tool, which outputs highly accurate average and peak power data.

In each successive step, users progressively zoom in by approximately a factor of a thousand, narrowing down from billions to millions, and finally to thousands of cycles.

The three-step process allows for the discovery of elusive power issues, akin to finding the proverbial needle in the haystack.

Taking it further: Power Regression

The fast execution speed of leading-edge hardware emulators and massively parallel power analysis engines enable efficient power regression testing with real-world workloads. This capability greatly enhances pre-silicon verification/validation by promptly identifying and removing power-related issues before they manifest in silicon.

Typically, each new netlist release of a DUT can undergo rapid assessment to certify compliance with power budgets. Running power regressions on a regular basis ensures consistent achievement of power targets.

Viewing inside: Virtual Power Scope

Performing post-silicon power testing on a lab testbench presents challenges because of limited visibility into the design. Although the chip operates at gigahertz speeds, test equipment typically samples power data at a much lower rate, often in the kilohertz range. This results in sparse power measurements, capturing only one power value per million cycles. Moreover, unless the chip was specifically designed with separate supply pins per block, obtaining block-by-block power data via silicon measurements proves exceedingly difficult. Frequently, only a chip-level power trace is available.
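
A short sketch shows why sparse sampling is a problem: downsampling a per-cycle power trace at a lab-equipment rate can miss a brief spike entirely. All numbers below are hypothetical.

```python
# Hypothetical per-cycle power trace (watts) with a brief 20-cycle spike.
trace = [1.0] * 1_000_000
trace[123_456:123_476] = [4.5] * 20          # short peak a lab scope would likely miss

# Lab-style sampling: roughly one measurement per 100,000 cycles.
lab_samples = trace[::100_000]
print("Lab view   - max power seen:", max(lab_samples), "W")   # 1.0 W, spike missed

# Per-cycle (emulation-based) view catches the peak.
print("Cycle view - max power seen:", max(trace), "W")          # 4.5 W
```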

Pre-silicon power validation conducted through hardware emulation and massively parallel power analysis acts as a virtual power scope. It enables tracing and measurement of power throughout the design hierarchy, ensuring adherence to target specifications. This analysis can delve down to the individual cell level, accurately evaluating the power consumption of each block and component within the design. Essentially, it functions akin to a silicon scope, providing insight into the distribution of power within the chip.

Expanding beyond lab analysis: IR Drop Testing

The ability to compute power on a per-cycle basis makes it possible to detect narrow windows, spanning 10 or 20 cycles, where sudden power spikes may occur. Such occurrences often elude detection in a lab environment.

These intervals can undergo analysis using IR (where I is current and R is resistance) drop tools. These tools assess IR drop across the entire SoC within a range typically spanning 10 to 50 cycles of switching activity data.

Achieving optimization sooner, with greater precision: SW Optimization

By aligning the software view of the code running on a processor core with a power graph, it becomes feasible to debug hardware and software concurrently using waveforms.

The connection between these tools is the C debugger operating on a post-emulation trace against a set of waveform dumps. Although these waveform dumps are generated by the emulator, they can encompass various types of waveforms, including those related to power.

Conclusion

Accurately analyzing dynamic power consumption in modern SoC chips at every development stage is crucial. This proactive approach ensures adherence to the power consumption standards of the intended target device, thereby averting costly re-spins.

To achieve realistic results and avoid potential power issues, the DUT, potentially encompassing billions of gates, must undergo testing with real-world software workloads that require billions of cycles. This formidable task is achievable solely through hardware emulation and massively parallel power analysis.

SIDEBAR

The methodology presented in this article has been successfully deployed by SiMa.ai, an IDC innovation startup for AI/ML at the edge. SiMa.ai used the Synopsys’ ZeBu emulation and ZeBu Empower power analysis solution.

Lauro Rizzatti has over three decades of experience within the Electronic Design Automation (EDA) and Automatic Test Equipment (ATE) industries on a global scale. His roles encompass product marketing, technical marketing, and engineering, including management positions. Presently, Rizzatti serves as a hardware-assisted verification (HAV) consultant. Rizzatti has published numerous articles and technical papers in industry publications. He holds a doctorate in Electronic Engineering from the Universita` degli Studi di Trieste in Italy.

[1] AMD’s latest GPU driver updates the UI for HYPR-RX and the new power-saving HYPR-RX Eco (tweaktown.com)

Also Read:

Synopsys Design IP for Modern SoCs and Multi-Die Systems

Synopsys Presents AI-Fueled Innovation at SNUG 2024

Scaling Data Center Infrastructure for the Terabit Era


Semidynamics Shakes Up Embedded World 2024 with All-In-One AI IP to Power Nextgen AI Chips

Semidynamics Shakes Up Embedded World 2024 with All-In-One AI IP to Power Nextgen AI Chips
by Mike Gianfagna on 04-15-2024 at 10:00 am


Semidynamics takes a non-traditional approach to design enablement. Not long ago, the company’s Founder and CEO, Roger Espasa, unveiled extreme customization at the RISC-V Summit. That announcement focused on a RISC-V Tensor Unit designed for ultra-fast AI solutions. Recently, at Embedded World 2024, the company took this strategy a step further with an All-In-One AI IP processing element. Let’s look at the challenges addressed by this new IP to understand how Semidynamics shakes up Embedded World 2024 with All-In-One AI IP to power nextgen AI chips.

The Problem

The current approach to AI chip design is to integrate separate IP blocks next to the system CPU to handle the ever-increasing demands of AI. As data volume and processing demands of AI increase, more individual functional blocks are integrated. The CPU distributes dedicated partial workloads to gpGPUs (general purpose Graphical Processor Units) and NPUs (Neural Processor Units). It also manages the communication between these units.

Moving data between the blocks this way causes high latency. Programming is also challenging since there are three different types of IP blocks with different instruction sets and tool chains. It is also worth noting that fixed-function NPU blocks can become obsolete quickly due to constant changes in AI algorithms. Software evolves faster than hardware.

The figure below illustrates what a typical AI-focused SoC looks like today.

Typical AI Focused SoC today

The Semidynamics Solution

Semidynamics has taken a completely different approach to AI chip design. The company has combined four of its IPs to form one fully integrated solution dubbed the All-In-One AI IP processing element. The approach delivers a fully customizable RISC-V 64-bit core, Vector Units (as the gpGPUs), and Tensor Units (as the NPUs). Semidynamics Gazzillion® technology ensures huge amounts of data can be handled without the issues of cache misses. You can learn more about Gazzillion here.

This approach delivers one IP supplier, one RISC-V instruction set and one tool chain making implementation easier and faster with lower risk. The approach is scalable, allowing as many new processing elements as required to be integrated on a single chip. The result is easier access to next generation, ultra-powerful AI chips.

The figure below illustrates this new approach of fusing CPU, gpGPU, and NPU.

Fusing CPU, gpGPU, and NPU

This approach goes well beyond what was announced at the RISC-V Summit. A powerful 64-bit out-of-order RISC-V CPU is combined with a 64-bit in-order RISC-V CPU, a vector unit and a tensor unit. This delivers powerful AI-capable compute building blocks. Hypervisor support is enabled for containerization, and crypto is enabled for security and privacy. And Gazzillion technology efficiently manages large data sets.

The result is a system that is easy to program with high-performance for parallel codes and zero communication latency.

The technology is available today with a straight-forward business model as shown below.

Flexible and Customizable Business Model

Comments from the CEO


 Recently, I was able to get a few questions answered by Roger Espasa, the founder and CEO of Semidynamics.

Q: It seems like integration is the innovation here. If it’s easy, why has it not been done before?

A:  It is a paradigm change – the starting RISC-V momentum was focussed solely on CPU, both in the RISC-V community and with the customers.  We have seen vector benefits way earlier than others and AI very recently demands more flexible response to things like transformers and LLMs.  In fact, it’s far from easy. That’s why it’s not been done before. Especially as there was no consistent instruction set in one environment until CPU+Vector and the Semidynamics Tensor from our prior announcement.

Q: What were the key innovations you needed to achieve to make this happen?

A: I’ll start with eliminating the horribly-difficult-to-program DMAs typical of other NPU solutions and substituting their function with normal loads and stores inside a RISC-V core that get the same sustained performance – actually better. That particular capability is only available in Semidynamics’ RISC-V cores with Gazzillion technology. Instead of a nasty DMA, with our solution the software only needs to do regular RISC-V instructions for moving data (vector loads and stores, to be precise) into the tensor unit.

Also, connecting the tensor unit to the existing vector unit, where the vector register storage is used to hold tensor data. This reduces area and data duplication, enables a lower power implementation, and, again, makes the solution easier to be programmed. Now, firing the tensor unit is very simple: instead of a complicated sequence of AXI commands, it’s just a vanilla RISC-V instruction (called vmxmacc, short for “matrix-multiply-accumulate“). Adding to this, AXI commands mean that the CPU has to read the NPU data and either slowly process it by itself or send it over AXI to, for example, a gpGPU to continue calculations there.

And adding specific vector load instructions that are well suited to the type of “tiled” data used in AI convolutions and can take advantage of our underlying Gazzillion technology.

I should mention that this result can only be done by an IP provider that happens to have (1) a high-bandwidth RISC-V core, (2) a very good vector unit and (3) a tensor unit and can propose new instructions to tie all three solutions together. And that IP provider is Semidynamics!

The resulting vision is a “unified compute element” that:

1) Can be scaled up by simple replication to reach the customer TOPS target – very much like multi cores are built now. I will offer an interesting observation here: nobody seems to have a concern to have a multicore system where each core is an FPU, but once there is more than one FPU, i.e. a Vector unit, nobody understands it anymore!

2) Keeps a good balance between “control” (the core), “activation performance” (the vector unit) and “convolution performance” (the tensor unit) as the system scales.

3) Is future proofed. By having a completely programmable vector unit within the solution, the customer gets a future-proofed IP. No matter what type of AI gets invented in the near future, the combination of the core+vector+tensor is guaranteed to be able to run it.

Q: What were the key challenges to get to this level of integration?

A: Two come to mind: (1) inventing the right instructions that are simple enough to be integrated into a RISC-V core and, yet provide sufficient performance, and (2) designing a tensor unit that works hand-in-hand with the vector unit. There are many more technical and architectural challenges we solved as well.

To recap: the challenge is that we change the paradigm: we do a modern AI solution that is future proof and based on an open source ISA.

To Learn More

The full text of the Semidynamics announcement can be found here.  You can learn more about the Semidynamics Configurator here. And that’s how Semidynamics shakes up Embedded World 2024 with All-In-One AI IP to power nextgen AI chips.


Managing Power at Datacenter Scale

Managing Power at Datacenter Scale
by Bernard Murphy on 04-15-2024 at 6:00 am


That datacenters are power hogs is not news, especially now that AI is further aggravating this challenge. I found a recent proteanTecs-hosted panel on power challenges in datacenter infrastructure quite educational, both in quantifying the scale of the problem and in understanding what steps are being taken to slow growth in power consumption. Panelists included Shesha Krishnapur (Intel fellow and IT CTO), Artour Levin (VP, AI silicon engineering at Microsoft), Eddie Ramirez (Arm VP for Go-to-Market in the infrastructure line of business), and Evelyn Landman (Co-founder and CTO at proteanTecs). Mark Potter (VC and previously CTO and Director of HP Labs) moderated. This is an expert group directly responsible for or closely partnered with some of the largest datacenters in the world. What follows is a condensation of key points from all speakers.

Understanding the scale and growth trends

In 2022 US datacenters accounted for 3.5% of total energy consumption in the country. Intel sees 20% compute growth year over year which through improved designs and process technologies is translating into a 10% year over year growth in power consumption.

But that’s for CPU-based workloads. Shesha expects demand from AI-based workloads to grow at twice that rate. One view is that a typical AI-accelerated server draws 4X the power of a conventional server. A telling example suggests that AI-based image generation consumes almost 10X the power of simply searching for images online. Not an apples-to-apples comparison of course, but if the AI option is easier and produces more intriguing results, are end-users going to worry about power? AI has the potential to turn an already serious power consumption problem into a crisis.

For cooling/thermal management the default today is still forced air cooling, itself a significant contributor to power consumption. There could be better options but re-engineering existing infrastructure for options like liquid/immersion cooling is a big investment for a large datacenter; changes will move slowly.

Getting back onto a sustainable path

Clearly this trend is not sustainable. There was consensus among panelists that there isn’t a silver bullet fix and that datacenter power usage effectiveness (PUE) must be optimized system-wide through an accumulation of individually small refinements, together adding up to major improvements.

Shesha provided an immediate and intriguing example of improvements he has been driving for years in Intel datacenters worldwide. The default approach, based on mainframe expectations, had required cooling to 64-68°F to maximize performance and reliability. Research from around 2010 suggested improvements in IT infrastructure would allow 78°F as a workable operating temperature. Since then the limit has been pushed up higher still, so that PUEs have dropped from 1.7/1.8 to 1.06 (at which level almost all the power entering the datacenter is used by the IT equipment rather than big cooling systems).
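
For reference, PUE is simply total facility power divided by the power delivered to the IT equipment, so a value near 1.0 means almost nothing is spent on cooling and distribution. A quick illustration with hypothetical numbers:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw

# Hypothetical datacenter with 10 MW of IT load:
print(f"Legacy datacenter:    PUE = {pue(17_500, 10_000):.2f}")  # ~1.75, cooling-heavy
print(f"Optimized datacenter: PUE = {pue(10_600, 10_000):.2f}")  # ~1.06, nearly all power to IT
```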

In semiconductor design everyone stressed that power optimization will need to be squeezed through an accumulation of many small improvements. For AI, datacenter inference usage is expected to dominate training usage if AI monetization is going to work. (Side note: this has nothing to do with edge-based inference. Business applications at minimum are likely to remain cloud based.) One way to reduce power in inference is through low-precision models. I wouldn’t be surprised to see other edge AI power optimizations such as sparse matrix handling making their way into datacenters.

Conversely AI can learn to optimize resource allocation and load balancing for varying workloads to reduce net power consumption. Aligning compute and data locations and packing workloads more effectively across servers will allow for more inactive servers which can be powered down at any given time.
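
The packing idea can be sketched with a toy first-fit-decreasing bin packer: consolidate workloads onto as few servers as possible and power down the rest. Real datacenter schedulers also weigh latency, data locality and redundancy, so this is only a conceptual illustration with invented numbers.

```python
def pack_workloads(loads, server_capacity):
    """First-fit decreasing bin packing: returns per-server load lists."""
    servers = []
    for load in sorted(loads, reverse=True):
        for srv in servers:
            if sum(srv) + load <= server_capacity:
                srv.append(load)
                break
        else:
            servers.append([load])           # open another server only when needed
    return servers

# Hypothetical workloads, expressed as fractions of one server's capacity.
workloads = [0.6, 0.3, 0.5, 0.2, 0.4, 0.1, 0.7]
total_servers = 10
packed = pack_workloads(workloads, server_capacity=1.0)
print(f"{len(packed)} servers busy:", packed)
print(f"{total_servers - len(packed)} of {total_servers} servers can be powered down")
```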

Naturally Eddie promoted performance/watt for scale-out workloads; Arm have been very successful in recognizing that one size does not fit all in general-purpose datacenters. Servers designed for high performance compute must coexist with servers for high traffic tasks like video-serving and network/storage traffic optimization. Each tuned for different performance/watt profiles.

Meanwhile immersion and other forms of liquid cooling, once limited to supercomputer systems, are now finding their way into regular datacenters. These methods don’t reduce IT systems power consumption, but they are believed to be more power-efficient in removing heat than traditional cooling methods, allowing for either partial or complete replacement of forced air systems over time.

Further opportunities for optimization

First, a reminder of why proteanTecs is involved in this discussion. They are a very interesting organization providing monitor/control “agent” IPs which can be embedded in a semiconductor design. In mission mode these can be used to supply in-field analytics and actionable insights on performance, power and reliability. Customers can, for example, use these agents to adaptively optimize voltages for power reduction while not compromising reliability. proteanTecs claims demonstrated 5% to 12% power savings across different applications when using this technology.
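
As a conceptual illustration only (not proteanTecs’ product or algorithm), an adaptive voltage loop might lower the supply while an embedded timing-margin monitor reports comfortable slack, and back off when the margin thins. All thresholds and readings below are invented.

```python
def adjust_voltage(v_now_mv: int, margin_ps: float,
                   margin_floor_ps: float = 30.0,
                   margin_ceiling_ps: float = 60.0,
                   step_mv: int = 5,
                   v_min_mv: int = 650) -> int:
    """Return an updated supply voltage based on a timing-margin reading."""
    if margin_ps > margin_ceiling_ps and v_now_mv - step_mv >= v_min_mv:
        return v_now_mv - step_mv        # plenty of slack: shave the supply
    if margin_ps < margin_floor_ps:
        return v_now_mv + step_mv        # margin thin: restore guard band
    return v_now_mv                      # in the comfort zone: hold

# Hypothetical sequence of margin readings from an embedded monitor (ps):
v = 750
for reading in [80, 75, 72, 65, 40, 25, 45]:
    v = adjust_voltage(v, reading)
    print(f"margin={reading:>3} ps -> VDD={v} mV")
```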

Evelyn stressed that such approaches are not only a chip-level technology. The information provided must be processed in datacenter software stacks so that workload optimization solutions can take account of on-chip metrics in balancing between resources and systems. Eddie echoed this point, adding that the more information you have and the more telemetry you can provide to the software stack, the better the stack can exploit AI-based power management.

Multi-die systems are another way to reduce power since they bring otherwise separate components closer together, avoiding power-hungry communication through board traces and device pins.

Takeaways

For semiconductor design teams, expect power envelopes to be squeezed more tightly. Since thermal mitigation requirements are closely coupled to power, expect even more work to reduce hotspots. Also expect to add telemetry to hardware and firmware to guide adaptive power adjustments. Anything that affects service level expectations and cooling costs will go under the microscope. Designers may also be borrowing more power-reducing design techniques from the edge. AI design teams will be squeezed extra hard 😀 Also expect a bigger emphasis on chiplet-based design.

In software stacks, power management is likely to become more sophisticated for adaptation to changing workloads in resource assignments and power down for systems not currently active.

In racks and the datacenter at large, expect more in-rack or on-chip liquid-based cooling, changing thermal management design and analysis at the package, board and rack level.

Lots to do! You can learn more HERE.

Also Read:

proteanTecs Addresses Growing Power Consumption Challenge with New Power Reduction Solution

Fail-Safe Electronics For Automotive

Building Reliability into Advanced Automotive Electronics

 


EP217: The Impact and Unique Business Model of Silicon Creations with Randy Caplan

EP217: The Impact and Unique Business Model of Silicon Creations with Randy Caplan
by Daniel Nenni on 04-12-2024 at 10:00 am

Dan is joined by Randy Caplan, co-founder and CEO of Silicon Creations, and a lifelong technology enthusiast. For almost two decades, he has helped grow Silicon Creations into a leading mixed-signal semiconductor IP company with nearly 500 customers spanning almost every major market segment.

Randy provides some background on Silicon Creations’ unique bootstrapped business model. Today, the company provides critical analog/mixed-signal IP to many customers across a wide variety of markets. Silicon Creations has delivered IP in approximately 85 process nodes.

Randy explores how the company has succeeded and assesses what its impact will be in the future.

The views, thoughts, and opinions expressed in these podcasts belong solely to the speaker, and not to the speaker’s employer, organization, committee or any other group or individual.


Silicon Catalyst partners with Arm to launch the Arm Flexible Access for Startups Contest!

Silicon Catalyst partners with Arm to launch the Arm Flexible Access for Startups Contest!
by Daniel Nenni on 04-12-2024 at 6:00 am


Winner and Runner-up to receive the contest’s largest ever technology credit for production tape-outs.

This is an example of why I enjoy working with Silicon Catalyst. They collaborate with our partners and do some really impressive things, all for the greater good of the semiconductor industry, absolutely. If you are not currently engaged with the Silicon Catalyst ecosystem you need to be.

With the overwhelming success of last year’s contest which resulted in $150,000 in Arm technology credit awarded to the winner, this year the bounty has been increased to $250,000 to the top startup.

The 2024 Arm Flexible Access for Startups Contest is open to privately owned startup companies in pre-seed, seed and Series A funding that have raised a maximum of $20 million in funding. The applicant companies need to either be using Arm or considering using Arm in their products. Arm technology credits of $250,000 and $150,000 will be awarded to the winner and runner-up, respectively, and can be used towards a commercial tape-out; the credit could cover IP fees for a complete embedded system or contribute to the cost of a higher performance system. Both the winner and runner-up will also receive additional benefits, including a pitch review session hosted by the Silicon Catalyst Angels investment group. All contest applicant organizations will also be considered for acceptance to the Silicon Catalyst Incubator/Accelerator.

Last year’s winner Equal1 is a pioneering silicon quantum computing company dedicated to making quantum computing affordable and accessible:

“We are thrilled to be announced as the winner of the 2023 ‘Silicon Startups Contest’. Arm’s support, partnership, and technology credit are invaluable to the development of our QSoC processors. Just as the evolution of classical computers was driven by advancements in silicon processors, we firmly believe quantum computing will follow the same silicon path. Like the majority of chips today, the new era of quantum computing will be powered by Arm, with a focus on power efficiency, performance, proven reliability, and a robust ecosystem.”

– Jason Lynch, CEO, Equal1 Labs

This year, the overall winner receives $250,000 Arm technology credit toward an Arm Flexible Access commercial tape-out. The runner-up receives $150,000 Arm Technology Credit towards an Arm Flexible Access commercial tape-out. The winner and runner-up will also receive:

  • A free Arm Design Review to enable Arm to review the customer’s design specification
  • A ticket to Arm’s invite-only ecosystem event for networking and a chance to be featured
  • A pitch review session hosted by the Silicon Catalyst Angels investment group

Additionally, Paul Williamson, Senior Vice President and General Manager, IoT Line of Business at Arm said:

“Arm technology is for everyone, and through this contest, we are recognizing and supporting the next wave of innovators to grow their business and accelerate their SoC designs. We know that time to product and access to the largest possible market are critical for startups, which is why we created Arm Flexible Access for Startups, providing $0 access to a wide portfolio of IP, tools and support, to maximize their chance of success.”

If you remember, we wrote the definitive book on Arm “Mobile Unleashed: The Origin and Evolution of the Arm Processor in our Devices” and we have written hundreds of related Arm articles. This contest is an incredible opportunity to work closely with the #1 processor IP company and the world’s only incubator focused exclusively on accelerating semiconductor solutions.

Also Read:

CEO Interview: Patrick T. Bowen of Neurophos

A Candid Chat with Sean Redmond About ChipStart in the UK

CEO Interview: Jay Dawani of Lemurian Labs

Seven Silicon Catalyst Companies to Exhibit at CES, the Most Powerful Tech Event in the World


Synopsys Design IP for Modern SoCs and Multi-Die Systems

Synopsys Design IP for Modern SoCs and Multi-Die Systems
by Kalar Rajendiran on 04-11-2024 at 10:00 am

Synopsys IP Scale, a Sustainable Advantage

Semiconductor intellectual property (IP) plays a critical role in modern system-on-chip (SoC) designs. That’s not surprising given that modern SoCs are highly complex designs that leverage already proven building blocks such as processors, interfaces, foundational IP, on-chip bus fabrics, security IP, and others. This is reflected by a flourishing third-party IP market segment that reached $7.05B in 2023 [Source: IP Nest Reports].

With ~$1.54B of Design IP revenue in 2023, Synopsys holds the #2 position in the third-party IP market segment worldwide and is the leader in interface IP and foundation IP. The company did not get to this position overnight. Synopsys has taken a deliberate and strategic approach to building its IP business over time. Over the course of 25 years, Synopsys has diligently cultivated the world’s broadest IP portfolio spanning building blocks/peripherals, interfaces, foundation IP (standard cells, memories), processors, security, AI accelerators (NPUs, DSP), sensors and more. It is interesting to note that while the third-party IP market grew a little over 6% between 2022 and 2023, Synopsys’ Design IP business grew at about 18%. The company reaffirmed and recommitted to a sustainable mid-teens growth rate for its Design IP business.

Customer-Centric Approach

At the heart of Synopsys’ success lies its unwavering commitment to customer satisfaction. Through unparalleled IP quality, exceptional support, and a reputation for reliability, Synopsys has earned the trust of semiconductor as well as systems companies worldwide. Testimonials from industry partners and customers underscore Synopsys’ reputation as the preferred choice for semiconductor IP solutions.

The following chart shows the results from a blind survey by an independent company.

Synopsys continues to reaffirm its commitment to excellence by prioritizing quality, innovation, and customer support. The company continues to demonstrate its investment commitment by adding both organically developed IP and acquired IP to its portfolio. A couple of recent examples are Synopsys’ Universal Chiplet Interconnect Express (UCIe) IP and its Physical Unclonable Function (PUF) IP through acquisition of Intrinsic ID. This kind of strategic expansion continues to position Synopsys as a trusted partner for semiconductor designs, empowering customers to realize their design goals with confidence.

UCIe IP for Heterogeneous Interoperability of Multi-Die Systems

With the rise of heterogeneous computing architectures and the proliferation of AI and machine learning workloads, designers must increasingly consider both silicon-level and system-level optimizations when designing their products. Multi-die systems are key to the next wave of systems innovations and enable the integration of heterogeneous dies in a single package. The Universal Chiplet Interconnect Express (UCIe) standard was introduced in 2022 to address this heterogeneous die-to-die interoperability need. By standardizing communication between chiplets, UCIe not only simplifies the integration process but also fosters a broader ecosystem where chiplets from different vendors can seamlessly be incorporated into a single design.

One of the things Synopsys’ CEO Sassine Ghazi emphasized during his keynote talk at the Synopsys User Group (SNUG) conference is the importance of multi-die solutions. He spotlighted Intel’s Pike Creek, the world’s first UCIe-enabled silicon, a result of collaboration between Intel, TSMC and Synopsys.

As an auxiliary point, with the evolution to heterogenous SoCs, Synopsys’ EDA tools are tightly integrated with its IP portfolio, allowing for seamless interoperability and faster time-to-market.

PUF IP for Security

Given the increasing sophistication of cyber threats these days, the integrity and security of semiconductor designs are of paramount importance. With the proliferation of connected devices, ensuring the confidentiality and integrity of sensitive data has become increasingly crucial for semiconductor manufacturers and system integrators alike.

Synopsys recently completed the acquisition of Intrinsic ID, a pioneer in PUF IP technology. PUF technology harnesses the inherent variations in silicon chips to generate unique identifiers, offering robust protection against a range of security threats including counterfeiting, tampering, and unauthorized access. By integrating Intrinsic ID’s PUF IP into its portfolio, Synopsys empowers chip designers to embed security features directly into their designs, expediting time-to-market and reducing costs. The acquisition not only expands Synopsys’ IP offerings but also enriches its talent pool with a team of experienced R&D engineers deeply knowledgeable in PUF technology. Synopsys intends to leverage Intrinsic ID’s presence in the Netherlands to establish a center of excellence for PUF technology in Eindhoven, enhancing its research and development capabilities in the critical area of security IP.
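
Conceptually, an SRAM-style PUF reads a device-specific but slightly noisy bit pattern at power-up and reproduces a stable key from it, without ever storing the key on the chip. The sketch below uses simple majority voting as a stand-in for the helper-data and error-correction machinery a production PUF relies on; it is illustrative only and not Intrinsic ID’s algorithm.

```python
import hashlib
import random

random.seed(7)
# Hypothetical "silicon fingerprint": power-up values of 64 SRAM cells.
true_fingerprint = [random.randint(0, 1) for _ in range(64)]

def noisy_readout(bits, flip_prob=0.05):
    """Each power-up, a few cells flip due to noise."""
    return [b ^ (random.random() < flip_prob) for b in bits]

def stable_response(readouts):
    """Majority vote per bit position (a stand-in for real error correction)."""
    return [int(sum(col) > len(readouts) / 2) for col in zip(*readouts)]

# Enrollment and later reconstruction both converge on the same bit string,
# so the derived key is reproducible across power cycles.
reads = [noisy_readout(true_fingerprint) for _ in range(9)]
response = stable_response(reads)
key = hashlib.sha256(bytes(response)).hexdigest()
print("Device key:", key[:32], "...")
```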

Summary

As technology continues to advance and new challenges emerge, Synopsys remains committed to delivering best-in-class solutions and driving the industry forward. With dedication to customer satisfaction and a sustainable advantage, Synopsys is positioned to lead the way in semiconductor IP for years to come. Its drive for innovation, and customer-centricity ensures its place as a trusted partner for semiconductor and systems companies worldwide.

Also Read:

Synopsys Presents AI-Fueled Innovation at SNUG 2024

Scaling Data Center Infrastructure for the Terabit Era

TSMC and Synopsys Bring Breakthrough NVIDIA Computational Lithography Platform to Production


Enhancing the RISC-V Ecosystem with S2C Prototyping Solution

Enhancing the RISC-V Ecosystem with S2C Prototyping Solution
by Daniel Nenni on 04-11-2024 at 6:00 am


RISC-V’s popularity stems from its open-source framework, enabling customization, scalability, and mitigating vendor lock-in. Supported by a robust community, its cost-effectiveness and global adoption make it attractive for hardware innovation across industries.

Despite its popularity, evolving RISC-V architectures pose design and verification challenges. A significant concern is the potential fragmentation in RISC-V system integration. Exploring RISC-V microarchitectures may result in variants incompatible with each other. Moreover, as the RISC-V ecosystem matures, design complexity escalates, necessitating enhanced verification procedures.

S2C plays a pivotal role in the RISC-V ecosystem as a member of RISC-V International. Let’s explore how S2C aids chip designers in optimizing and differentiating their RISC-V processor-based SoCs across diverse applications.

Key Benefits of the S2C FPGA Prototyping Solution for RISC-V

S2C offers an extensive array of FPGA prototyping systems, ranging from the desktop prototyping platform Prodigy Logic System to the high-performance enterprise prototyping solution Logic Matrix, catering to the diverse needs of RISC-V System Verification or Demonstration. Multiple options are available to meet the diversity of RISC-V, regardless of the scale of the design. In addition to traditional partitioning schemes, S2C also provides ChipLink IP, which ensures high-performance AXI chip-to-chip partitioning.

Robust bring-up and debugging methods enhance user efficiency, including FPGA download via Ethernet/USB/SD card, UART/Virtual UART, Ethernet-based AXI transactor, and a custom logic analyzer for Multi-FPGA (MDM).

S2C also provides a utility to download operating systems & applications from PC to FPGA’s DDR4. The high-bandwidth transmission enables a much faster boot-up of software, accelerating time to operation.

General Purpose Partitioning and ChipLink

S2C offers a General-Purpose TDM interconnect communication solution, which is applicable regardless of IP logic scale or bus interface type limitations. Configured at a 25Gbps line rate, S2C’s General-Purpose Serdes TDM IP can provide up to 20MHz of TDM partitioning performance for large IP design partitions. With a multiplexing ratio of up to 8K:1, it enables long-distance data communication via optical fiber cables, streamlining the networking process for large-scale SoC prototype designs with simplicity and efficiency.

ChipLink, an AXI-based partitioning solution, facilitates multi-core SoC verification. This low-latency AXI Chip to Chip IP connects RISC-V cores and peripherals across multiple FPGAs efficiently. S2C’s ChipLink AXI IP boasts high speed and low latency, supporting AXI DATA_WIDTH of up to 1024 bits. Each bank accommodates up to four sets of AXI protocols. With multiple Serdes line rates including 12.5G, 16.25G, 20.625G, and 25G, it enables communication at 100MHz between multi-core processors.

Strengthened by a Broad Set of Prototyping Tools

S2C offers a comprehensive suite of tools to facilitate and optimize RISC-V SoC design verification. Notably, Prototype Ready IP features over 90 readily deployable daughter cards, simplifying prototyping setup and significantly reducing initialization time and effort.

Additionally, S2C’s multidimensional prototyping software, Prodigy PlayerPro-RT, enables seamless FPGA/Die downloads via USB, Ethernet, and SD Card interfaces. Beyond downloads, PlayerPro-RT offers real-time hardware monitoring, remote system management, and extensive hardware self-testing functionalities, ensuring a smooth and efficient verification process.

S2C further enhances verification with the inclusion of the high-bandwidth AXI transactor, Prodigy ProtoBridge, facilitating swift and efficient data transmission between PC and FPGA prototypes at PCIe speeds of up to 4000MB/s. By offering high bandwidth and fast read/write capabilities, ProtoBridge significantly boosts design productivity.

In the competitive realm of RISC-V SoC development, differentiation is crucial. S2C Prototyping Solutions emerge as a trusted ally, offering a streamlined pathway for verification and demonstration, empowering developers to amplify the unique value propositions of their SoCs.

For more information: https://www.s2cinc.com/riscv.html

Also Read:

2024 Outlook with Toshio Nakama of S2C

Prototyping Chiplets from the Desktop!

S2C’s FPGA Prototyping Accelerates the Iteration of XiangShan RISC-V Processor


Intel is Bringing AI Everywhere

Intel is Bringing AI Everywhere
by Mike Gianfagna on 04-10-2024 at 10:00 am


On April 8 and 9 Intel held its Intel Vision event in Phoenix, Arizona. This is Intel’s premier event for business and technology executive leaders to come together and learn about the latest industry trends and solution advancements from client to edge to data center and cloud. The theme of this year’s event was Bringing AI Everywhere. The event was packed with impressive information from all over the industry. Intel provided a briefing before the event that dove into some of the announcements and advances that would be presented. I will dig into what was presented in this post, along with a summary of Pat Gelsinger’s keynote at the event. The content is compelling – indeed it appears that Intel is bringing AI everywhere.

Briefing Overview

Attending the briefing were three key members of the Intel team. Their combined experience is quite impressive. They are:

Sachin Katti, Senior Vice President & General Manager of Network and Edge Group. Prior to his current role, Sachin was CTO of the Network and Edge Group. Prior to Intel, he had a long career as an Associate Professor at Stanford University. He also founded or co-founded several companies as well. Sachin holds a Ph.D. in Computer Science from the Massachusetts Institute of Technology.


Das Kamhout, Vice President & Senior Principal Engineer in the Intel Data Center and AI Group. Das has worked at Intel for 27 years across many areas including AI, cloud, enterprise software, and storage. He has also been a Board member of the Cloud Native Computing Foundation.


Jeff McVeigh, Corporate Vice President & General Manager of Software Engineering Group. Jeff has also worked at Intel for 27 years. He has held leadership positions in the Software Engineering Group, Super Compute Group, Data Center XPU Products & Solutions, and Visual Computing Products. He holds a Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University.


The presentation began with some macro-observations. Enterprises have reached an AI inflection point, signified by swift adoption and supercharged by GenAI. Gartner estimates that 80% of enterprises will use GenAI by 2026 and at least 50% of edge computing deployments will involve machine learning. IDC expects the $40B enterprise spend on GenAI in 2024 to grow to $151B by 2027.

All this is ahead of us only if we’re able to unlock AI’s full potential. Intel reported that only 10% of organizations launched generative AI solutions to production in 2023. Furthermore, 46% of experts cited infrastructure as the biggest challenge in productionizing large language models. Barriers to adoption persist, openness and choice are limited and transparency, privacy and trust concerns are rising.

Against this backdrop Intel is making several announcements to take down the barriers to adoption, bringing AI everywhere. The five broad areas of focus were defined as follows:

  • A scalable systems strategy to address all segments of AI within the enterprise with an open ecosystem approach
  • Enterprise customer AI deployments, successes and wins
  • Open ecosystem approach to advance enterprise AI
  • Intel Gaudi® 3 AI accelerator to serve unmet demand for Generative AI solutions
  • Edge Platform and Ethernet-based networking connectivity products targeted for AI workloads

Let’s look at some of the details.

A Tour of the Announcements

Today, enterprise data and AI models live in two distinct worlds. Enterprise data is secure and confidential, rooted in specific locations, mature and predictable, and has a CPU-based processing model. AI models, on the other hand, are based on public data, are characterized by rapid change with varied degrees of security, and have an accelerator-based processing model.

Intel aims to unlock the enterprise AI model through the power of open ecosystems. Attributes of this approach include:

  • An Application Ecosystem that is easy and open, by working with industry leaders to provide end-to-end AI enterprise solutions at scale
  • A Software Ecosystem that is secure and responsible, by driving an open software ecosystem that bridges enterprise data and AI models
  • An Infrastructure Ecosystem that is scalable and reference based, by shaping the enterprise AI infrastructure through reference architectures, together with partners
  • A Compute Ecosystem that is accessible and confidential, by building safe and AI capable compute platforms from client to data center

The diagram below is a top-level view of how these pieces fit together. Many more details of the approach were presented, along with a description of the enterprise AI software stack and planned enhancements.

Intel Enterprise AI

The presentation also discussed the Intel Developer Cloud, which is used by leading AI companies. Intel explained that the platform provides everything you need to build and deploy AI at scale. The diagram below shows today’s processor lineup.

The newest version of the Intel Gaudi AI accelerator brings speedups of 2X – 4X for AI compute, 2X for network bandwidth and 1.5X for memory bandwidth. Benchmark data includes 40% faster time-to-train vs. H100 and 50% faster inferencing vs. H100. The launch partners for this accelerator are impressive: Dell Technologies, HP Enterprise, Lenovo, and Supermicro.

The Intel Xeon 6 processor with E-cores was also discussed, with a 2.4x performance-per-watt improvement and a 2.7x performance-per-rack improvement. Comparing the second-generation Intel Xeon processor to Xeon 6, there is over one megawatt of power reduction delivered. To put that number in perspective, it represents the energy savings of a full year’s worth of electricity use for over 1,300 homes.

The work Intel is doing with high-profile partners on confidential computing was also discussed. Work to deliver connectivity designs for AI was previewed as well. The AI PC era was also covered; here, Intel plans to ship 100 million AI accelerators by the end of 2025. The company’s footprint in this market is substantial. Comprehensive strategies and platforms to support AI processing at the edge were also detailed, with 90,000+ edge deployments and 200M+ processors sold.

Pat Gelsinger’s Keynote


Pat was introduced as Intel’s Chief Geek. He lived up to that description with a 90-minute technology tour-de-force describing Intel’s impact, announcements, and plans. AI was front and center for most of Pat’s presentation. He described Intel Foundry as the systems foundry for the AI era and Intel products as modular platforms for the AI era. A memorable quote from Pat is “every company becomes an AI company.”

Pat then described the major re-tooling that is underway to deploy AI PCs across the entire enterprise. He discussed products that enable AI across the enterprise while reducing power and increasing efficiency. There were several impressive live demos of new technology and its impact, including an AI PC demo livestreamed from inside an Intel fab.

Pat also invited many distinguished guests to join him on stage or via the Internet to describe what their organizations are doing with Intel technology. Among those organizations were Accenture, Supermicro, Arizona State University, Bosch, Naver Corporation, and Dell Technologies (with Michael Dell).

Pat also unveiled, for the first time, the Intel Gaudi 3 AI Accelerator. This is a short summary of a great keynote presentation.

To Learn More

The pre-brief presentation and Pat Gelsinger’s keynote covered a lot of detail across an open scalable system strategy, customer/partner momentum, and next-generation products/services. You can learn more about Intel Vision 2024 here  and you can watch a replay of Pat Gelsinger’s keynote here. You will see that Intel is bringing AI everywhere.

Also Read:

Intel is Bringing AI Everywhere

Intel Direct Connect Event

ISS 2024 – Logic 2034 – Technology, Economics, and Sustainability

 


Arteris Frames Network-On-Chip Topologies in the Car

Arteris Frames Network-On-Chip Topologies in the Car
by Bernard Murphy on 04-10-2024 at 6:00 am


On the heels of Arm’s 2024 automotive update, Arteris and Arm announced an update to their partnership. This has been extended to cover the latest AMBA5 protocol for coherent operation (CHI-E) in addition to already supported options such as CHI-B, ACE and others. There are a couple of noteworthy points here. First, Arm’s new Automotive Enhanced (AE) cores upgraded protocol support from CHI-B to CHI-E and Arm/Arteris have collaborated to validate the Arteris Ncore coherent NoC generator against the CHI-E standard. Second, Arteris has also done the work to certify Ncore-generated networks with the CHI-E protocol extension for ASIL B and ASIL D. (Ncore-generated networks are already certified for earlier protocols, as are FlexNoC-generated non-coherent NoC networks.) In short, Arteris coherent and non-coherent NoC generators are already aligned against the latest Arm AE releases and ASIL safety standards. Which prompts the question: where are coherent and non-coherent NoCs required in automotive systems? Frank Schirrmeister (VP Solutions and Business Development at Arteris) helped clarify my understanding.

Automotive, datacenter/HPC system contrasts

Multi-purpose datacenters are highly optimized for task throughput per watt per dollar. CPU and GPU designs exploit very homogeneous architectures for high levels of parallelism, connecting through coherent networks to maximize the advantages of that parallelism while ensuring that individual processors do not trip over each other on shared data. Data flows into and out of these systems through regular network connections, and power and safety are not primary concerns (though power has become more important).

Automotive systems architectures are more diverse. Most of the data comes from sensors – drivetrain monitoring and control, cameras, radars, lidars, etc. – streaming live into one or more signal processing stages, commonly implemented in DSPs or (non-AI) GPUs. Processing stages for object recognition, fusion and classification follow. These stages may be implemented through NPUs, GPUs, DSPs or CPUs. Eventually, processed data flows into central decision-making, typically a big AI system that might equally be at home in a datacenter. These long chains of processing must be distributed carefully through the car architecture to meet critical safety goals, low power goals and, of course, cost goals. As an example, it might be too slow to ship a whole frame from a camera through a busy car network to the central AI system and only then begin to recognize an imminent collision. In such cases, initial hazard detection might happen closer to the camera, reducing what the subsystem must send to the central controller to a much smaller packet of data.
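To make the data-volume argument concrete, here is a rough illustration with my own assumed numbers (frame resolution, bit depth and packet fields are not from Arteris or Arm): a single uncompressed 1080p RGB frame is several megabytes, while a packet describing a detected hazard is only tens of bytes.

```python
# Illustrative comparison of shipping a raw camera frame vs. a small detection packet.
# Resolution, bit depth and packet layout are assumptions for this sketch only.
frame_bytes = 1920 * 1080 * 3               # one uncompressed 8-bit RGB frame ~ 6.2 MB
detection_bytes = 4 * 4 + 2 + 4 + 8         # bounding box (4 floats) + class id + confidence + timestamp
print(frame_bytes / detection_bytes)        # roughly a 200,000x reduction in data sent upstream
```

Compressing the frame helps, of course, but even a compressed video stream is orders of magnitude larger than a stream of detection results.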

Key consequences of these requirements are that AI functions are distributed as subsystems through the car system architecture and that each subsystem is composed of a heterogeneous mix of functions: CPUs, DSPs, NPUs and GPUs, among others.

Why do we need coherence?

Coherence is important whenever multiple processors are working on common data like pixels in an image, where there is opportunity for at least one processor to write to a logical address in a local cache and another processor to read from the same logical address in a different cache. The problem is that the second processor doesn’t see the update made by the first processor. This danger is unavoidable in multiprocessor systems sharing data through hierarchical memory caches.

Coherent networks were invented to ensure disciplined behavior in such cases, through behind-the-scenes checking and control between caches. A popular example can be found in coherent mesh networks common in many-core processor servers. These networks are highly optimized for regular structures, to preserve the performance advantages of using shared cache memory while avoiding coherence conflicts.
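To see why that behind-the-scenes checking matters, here is a deliberately simplified toy model of the stale-read hazard and a naive invalidate-on-write remedy. This is my own sketch for intuition only; real coherent interconnects implement far more sophisticated protocols (MESI-style cache states carried over CHI or ACE, for example).

```python
# Toy model: two cores with private caches over a shared memory.
# Without invalidation, one core can keep reading a stale value after the other writes.
shared_memory = {"pixel_0": 10}

class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}                      # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:           # miss: fetch from shared memory
            self.cache[addr] = shared_memory[addr]
        return self.cache[addr]

    def write(self, addr, value, peers=()):
        self.cache[addr] = value
        shared_memory[addr] = value          # write-through, for simplicity
        for peer in peers:                   # "coherence": invalidate stale copies elsewhere
            peer.cache.pop(addr, None)

cpu, npu = Core("cpu"), Core("npu")
npu.read("pixel_0")                          # NPU caches the old value (10)

cpu.write("pixel_0", 99)                     # no invalidation of the NPU's copy...
print(npu.read("pixel_0"))                   # 10 -> the stale-read hazard

cpu.write("pixel_0", 123, peers=[npu])       # invalidate-on-write this time
print(npu.read("pixel_0"))                   # 123 -> a coherent view is restored
```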

Coherence needs are not limited to mesh networks threading through arrays of homogeneous processors. Most of the subsystems in a car are heterogeneous, connecting the multiple different types of functions already discussed. Some of these subsystems equally need coherence management when processing images through streaming operations. Conversely, some functions may not need that support if they can operate in separate logical memory regions, or if they do not need to operate concurrently. In these cases, non-coherent networks will meet the need.

A key consequence is that an automotive chip typically must manage both coherent and non-coherent NoCs on the same die for optimal performance.
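As a purely hypothetical illustration of that mix (the structure and names below are invented and are not Arteris Ncore or FlexNoC configuration syntax), an ADAS SoC might be described along these lines:

```python
# Hypothetical sketch of an ADAS SoC combining coherent and non-coherent on-chip networks.
# All names, fields and values are invented for illustration.
adas_soc = {
    "coherent_noc": {
        "protocol": "AMBA5 CHI-E",                       # agents sharing data through caches
        "agents": ["cpu_cluster", "npu", "shared_llc"],
        "safety_target": "ASIL D",
    },
    "non_coherent_nocs": [
        {"name": "sensor_fabric", "agents": ["camera_isp", "radar_dsp", "lidar_dsp"]},
        {"name": "io_fabric", "agents": ["can_interface", "ethernet_mac", "flash_ctrl"]},
    ],
    "bridges": [("sensor_fabric", "coherent_noc")],      # where processed streams enter the coherent domain
}
```

The point is not the syntax but the partitioning: only the agents that genuinely share data through caches pay the cost of coherence, while everything else rides on lighter non-coherent fabrics.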

Six in-car NoC topologies

Frank illustrated with Arm’s use cases from their recent AE announcement, overlaid with the Arteris view of NoC topologies on those use cases (see the opening figure in this blog).

Small microcontrollers at the edge (drivetrain and window controllers, for example) don’t need coherency support. That doesn’t mean they don’t use AI – predictive maintenance support is an active trend in MCUs – but there is no need for high-performance data sharing. Non-coherent NoCs are ideal for these applications. Since these MCUs must sit right next to whatever they measure/control, they are located far from central or zonal controllers and are implemented as standalone (monolithic) chips.

Per Frank, zonal controllers may be non-coherent or may support some coherent interconnect, I guess reflecting differences in OEM architecture choices. Maybe streaming image processing is handled in sensor subsystems, or some processing is handled in the zonal controller. Then again, he sees vision/radar/lidar processing typically needing mostly non-coherent networks with limited coherent network requirements. While streaming architectures will often demand coherence support, any given sensor may generate only one or a few streams, needing at most a limited coherent core for initial recognition. Zonal controllers, by definition, are distributed around the car so are also monolithic chip solutions.

Moving into the car cockpit, infotainment (IVI) is likely to need more of a mix of coherent and non-coherent support, say for imaging overlaid with object recognition. These systems may be monolithic but also lend themselves to chiplet implementations. Centralized ADAS control (fusing inputs from sensors for lane recognition, collision detection, etc.) for SAE level 2+ and beyond will require more coherence support, yet still with need for significant non-coherent networks. Such systems may be monolithic today but are trending to chiplet implementations.

Finally, as I suggested earlier, the central AI controller in a car is fast becoming a peer to big datacenter-like systems. Arm has already pointed to AE CSS-based Neoverse many-core platforms (2025) front-ending AI accelerators (as they already do in Grace-Hopper and, I’m guessing, Blackwell). Add to that big engine more specialized engines (DSPs, NPUs and other accelerators) in support of higher levels of autonomous driving, to centrally synthesize inputs from around the car and to take intelligent action on those inputs. Such a system will demand a mix of big coherent mesh networks wrapping processor arrays, a distributed coherent network to connect to some of those other accelerators, and non-coherent networks to connect elsewhere. These designs are also trending to chiplet-based systems.

In summary, while there is plenty of complexity in evolving car architectures and the consequent impact on subsystem chip/chiplet designs, connected on-chip through both coherent and non-coherent networks, the intent behind these systems is quite clear. We just need to start thinking in terms of total car system architecture rather than individual chip functions.

Listen to a podcast in which Dan Nenni interviews Frank on this topic. Also read more about Arteris coherent network generation HERE and non-coherent network generation HERE.