The Next Hurdle AI Systems Must Clear

by Bernard Murphy on 03-11-2026 at 6:00 am

AI isn’t having an easy ride. The media and Wall Street swing wildly between extremes on any hint of a shift in AI sentiment. Dickens saw this coming: “It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of light, it was the season of darkness, it was the spring of hope, it was the winter of despair”. Beneath these headlines lurks an important problem for AI inference scaling: a widening gap between theoretical peak performance and what system providers can guarantee. This gap proves to have significant implications for power demand and safety.

What is this gap?

Large semiconductor systems make heavy use of pre-designed subsystems, developed in-house for earlier-generation products or sourced externally. This is particularly true for the chiplet-based designs now common in datacenters and, increasingly, in our cars. Best-in-class chiplets are available from industry experts: CPU server subsystems, AI accelerator subsystems, and high-bandwidth memory (HBM) subsystems, while other chiplets are fashioned by the semiconductor system prime. Connections between chiplets are managed through industry-standard UCIe interfaces.

So we have a system built from these components, each independently rated for high performance and connected through industry-standard interfaces. Why wouldn't it deliver close to optimum throughput? Simple economics dictates that a big, expensive semiconductor product must handle multiple inference tasks simultaneously. Individually these chiplets have been designed to do just that, but none of them is responsible for managing traffic performance between chiplets. UCIe is designed to provide basic connectivity, not system-level traffic management. That management falls to the network subsystem linking the chiplets, a system layer not unlike the internet but optimized for in-chip/in-package performance.

Multi-tenant inference platforms face unique traffic challenges. Traffic is managed through a common network for cost and power efficiency, as in any modern electronic system. However, AI traffic between the CPU control plane, HBM, and the AI accelerator is very lumpy: some of it is bursty yet demands high bandwidth, some is very sensitive to latency, and some is critical to maintaining forward progress, especially control data (valid, ready, credits, etc.).

Lumpy traffic hogs bus bandwidth, not indefinitely but until a transaction completes. The massively parallel nature of AI processing creates a second problem. A step can't start until all the data needed for that step has arrived; until then, the step must stall. When multiple inferences are running at the same time it is not hard to imagine frequent stalls, with inferences sitting idle waiting for complete data before they can move on to the next step.

So far this may not sound too surprising: more traffic means lower per-inference performance. The shocker is that performance does not degrade gracefully. As contention between inferences rises, stalls build up, just as in rush-hour traffic. At some point performance drops off a cliff, and net utilization of the system plummets from 80% to 45%.
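To get a feel for how stalls eat into throughput, here is a minimal toy model (my own sketch, not Arteris data): a few tenants share one inter-chiplet link, and each inference step must receive a fixed burst of packets over that link before a fixed-length compute phase can run. The burst size, compute time, and tenant counts are invented purely for illustration.

```python
import random

def simulate(n_tenants, burst=32, compute=40, cycles=100_000, seed=0):
    """Toy model of tenants sharing one inter-chiplet link. Each inference
    step must receive `burst` packets over the link (1 packet per cycle)
    before it can run a `compute`-cycle processing phase."""
    rng = random.Random(seed)
    remaining = [burst] * n_tenants     # packets still needed before the next step
    compute_left = [0] * n_tenants      # cycles left in the current compute phase
    steps = [0] * n_tenants
    busy_cycles = 0
    for _ in range(cycles):
        # Tenants in their compute phase make progress independently.
        for t in range(n_tenants):
            if compute_left[t] > 0:
                compute_left[t] -= 1
                if compute_left[t] == 0:
                    remaining[t] = burst        # start fetching the next step's data
        # The shared link delivers one packet per cycle to a waiting tenant.
        waiting = [t for t in range(n_tenants) if remaining[t] > 0]
        if waiting:
            busy_cycles += 1
            t = rng.choice(waiting)
            remaining[t] -= 1
            if remaining[t] == 0:               # all data has arrived: the step can run
                compute_left[t] = compute
                steps[t] += 1
    return [1000 * s / cycles for s in steps], busy_cycles / cycles

if __name__ == "__main__":
    ideal = 1000 / (32 + 40)                    # steps per kilocycle with no contention
    for n in (1, 2, 4, 8, 16):
        rates, link_util = simulate(n)
        print(f"{n:>2} tenants: slowest tenant {min(rates):5.2f} steps/kcycle"
              f" (no-contention ideal {ideal:.2f}), link busy {link_util:.0%}")
```

Even this crude model shows per-tenant throughput sliding well below the no-contention value once the shared link saturates, with every added tenant spending more of its time stalled. What it does not capture is the starved control traffic and collapsing fairness described next, which is what turns a slide into a cliff.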

Why not just increase the network's bandwidth? Unfortunately, that alone isn't enough. Between lumpy traffic and synchronization stalls, the control information critical to managing fairness between inferences is progressively squeezed out, and fairness collapses. Effective multi-tenant management needs more than raw bandwidth; it needs to provide predictability.
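A small sketch can show why raw bandwidth alone doesn't buy predictability. In the toy model below (again my own illustration, with invented parameters), one-flit control messages share a non-preemptive link with long bulk bursts. Doubling the flit rate shrinks the worst-case control latency but leaves it dependent on how badly bursts happen to clump, whereas letting control traffic bypass queued bursts, as a dedicated virtual channel would, bounds it at roughly one burst's transfer time regardless of load.

```python
import random

def worst_control_latency(control_channel, burst_flits=64, flits_per_cycle=1,
                          bulk_prob=0.012, cycles=100_000, seed=1):
    """Toy model of one shared, non-preemptive link. Bulk bursts of
    `burst_flits` arrive at random; one control flit arrives every 10 cycles.
    If `control_channel` is True, control flits bypass queued bursts (as if
    they had their own virtual channel); otherwise everything shares one FIFO."""
    rng = random.Random(seed)
    queue = []          # pending transfers: (kind, flits, arrival_cycle)
    busy_until = 0      # cycle at which the transfer now on the link completes
    worst = 0
    for cycle in range(cycles):
        if rng.random() < bulk_prob:
            queue.append(("bulk", burst_flits, cycle))
        if cycle % 10 == 0:
            queue.append(("ctrl", 1, cycle))
        if cycle >= busy_until and queue:
            if control_channel:
                # Serve the oldest waiting control flit first, if there is one.
                idx = next((i for i, item in enumerate(queue) if item[0] == "ctrl"), 0)
            else:
                idx = 0                         # strict first-come-first-served
            kind, flits, arrived = queue.pop(idx)
            busy_until = cycle + (flits + flits_per_cycle - 1) // flits_per_cycle
            if kind == "ctrl":
                worst = max(worst, busy_until - arrived)
    return worst

if __name__ == "__main__":
    print("worst control latency, FIFO, 1 flit/cycle  :", worst_control_latency(False))
    print("worst control latency, FIFO, 2 flits/cycle :",
          worst_control_latency(False, flits_per_cycle=2))
    print("worst control latency, control channel     :", worst_control_latency(True))
```

The exact numbers depend entirely on the made-up parameters; the shape is the point. Extra bandwidth improves the average, but only an architectural guarantee puts a hard bound on how long the traffic that keeps everything else moving can be delayed.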

Fixing the gap

High performance AI accelerators, CPU subsystems, HBM, and UCIe interfaces are absolutely necessary for a chiplet-based AI product, but they are not sufficient. The product must also build on a traffic-management network able to meet the unique challenges of multi-tenant AI inferencing, requirements well beyond the scope of best-effort networks. Interconnect design must be re-conceived to deliver predictability for these workloads.

Andy Nightingale (VP Product Management and Marketing at Arteris) shared some must-haves to ensure predictability. The network must support isolation between traffic streams from different tenants so that one inference can't block another. Increasing load will naturally reduce throughput, but it should do so gracefully. Coherency guarantees must be maintained even under load, and behavior must be deterministic so that service level agreements can be honored. A network designer can then craft a fabric to meet their target use-case needs, building on network IP that can support those guarantees.
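To make isolation a little more concrete, here is a minimal sketch of one way a fabric can enforce it: a queue per tenant and a credited, weighted round-robin arbiter on the shared link. This is my own illustration of the general technique, not Arteris's implementation; the class, weights, and traffic pattern are invented.

```python
from collections import deque

class IsolatingArbiter:
    """Sketch of per-tenant isolation on a shared link: each tenant has its
    own queue and a guaranteed number of link cycles per arbitration round.
    A tenant that floods its own queue cannot take cycles promised to others."""

    def __init__(self, weights):
        self.weights = list(weights)              # guaranteed packets per round
        self.queues = [deque() for _ in weights]
        self.credits = list(weights)
        self.delivered = [0] * len(weights)

    def submit(self, tenant, packet):
        self.queues[tenant].append(packet)

    def cycle(self):
        """Issue at most one packet this cycle, honoring per-tenant credits."""
        # Start a new round when no tenant can currently use a credit,
        # so the link stays busy whenever any queue has work.
        if not any(c > 0 and q for c, q in zip(self.credits, self.queues)):
            self.credits = list(self.weights)
        for t, q in enumerate(self.queues):
            if self.credits[t] > 0 and q:
                self.credits[t] -= 1
                self.delivered[t] += 1
                return q.popleft()
        return None                               # nothing to send: link idle

if __name__ == "__main__":
    arb = IsolatingArbiter(weights=[1, 1, 1, 1])  # four tenants, equal guarantees
    for cycle in range(10_000):
        arb.submit(0, f"t0-{cycle}a")             # tenant 0 floods the fabric
        arb.submit(0, f"t0-{cycle}b")
        if cycle % 4 == 0:
            for t in (1, 2, 3):
                arb.submit(t, f"t{t}-{cycle}")    # modest, steady tenants
        arb.cycle()
    print("packets delivered per tenant:", arb.delivered)
```

Run it and the flooding tenant and the three well-behaved tenants each end up with roughly a quarter of the delivered packets; the excess piles up in tenant 0's own queue rather than on the shared link. Because the schedule is credit-based, each tenant's worst-case service interval is known in advance, which is the kind of determinism an SLA can be written against.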

Giant datacenters can't define pricing models against unpredictable performance. Without an inter-chiplet network architecture designed for the task, the only way to guarantee a service level agreement is to add more servers and more power stations. Clearly the better solution is to use AI systems whose network architectures are designed for the task, delivering dependable utilization from the servers and power stations already budgeted.

I mentioned safety at the outset of this article. Chiplet-based design is now very popular in automotive systems for a host of reasons. Predictable power is certainly a concern in that domain, but even more important is predictability for safety. In cars, trucks and other vehicles, predictable response is not a performance preference. It is a certification requirement. The same network traffic considerations apply.

You can learn more about Arteris HERE.

 
