We’re creatures of habit. As technologists, we want to move fast and break things, but only on our terms. Everything else should remain the same or improve with minimum disruption. No fair breaking the way we do our jobs as we plot a path to greatness. This is irrational, of course. Real progress often demands essential changes where we’d rather not see them. Still, we all do it, even if unintentionally. The habit becomes apparent when we evaluate an IP block or a tool change in our design flows; the case in point here is NoC (network-on-chip) interconnect as an alternative to crossbars. I talked to Matt Mangan and Kurt Shuler (respectively FAE manager and VP of marketing at Arteris IP) about their experiences in smoothing the path to NoC adoption.
How not to benchmark a NoC
You’re a strong design team with a successful first-generation product in production. Now the product plan calls for premium versions requiring x2 and x4 instantiations of some pretty bulky subsystems. You know you struggled to close timing and meet cost on the first generation. Congestion and area will be even worse on these new versions. You’ve heard good things about NoC interconnect, so you launch an evaluation.
What’s the first obvious thing to do? Start with the production RTL and simply replace each crossbar in the interconnect hierarchy with a NoC. That gives you a cascading structure of NoCs mirroring the crossbar structure you had. NoCs are supposed to be more efficient in area and congestion, so you should see this in the trial, right?
Wrong. Designing a NoC this way immediately throws away its area and congestion advantages. It might even look worse than the crossbar implementation. Matt puts it this way: when you design with crossbars, you build your design around the interconnect. When you design with a NoC, you build the NoC around the design. In simple terms, you floorplan the IPs the way you want them laid out, then you let the NoC flow through the gaps between those IPs. There will be some iterations to make room for NoC routes here and there, but the intent remains. Therefore, to get a meaningful measure of NoC impact on the design, you should rip out all the crossbar structures and build the NoC flat.
Matt mentioned one customer working on a mid-sized automotive application. A back-of-the-envelope calculation estimated that a crossbar interconnect would cost about 10 square millimeters of area, which was clearly unreasonable. When they prototyped a flat NoC implementation, they got down to 5% of that estimate, or about 0.5 square millimeters. Why the huge difference? The internal structure of the flat NoC dispenses with a vast number of redundant switches, wires and control logic.
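As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The 10 square millimeters and the 5% come from the customer example above; the function and variable names are purely illustrative and not part of any Arteris IP tool.

```python
def flat_noc_area_mm2(crossbar_area_mm2: float, noc_fraction: float) -> float:
    """Scale a crossbar area estimate by the fraction the NoC prototype achieved."""
    return crossbar_area_mm2 * noc_fraction


# Figures quoted in the customer example above.
crossbar_estimate_mm2 = 10.0  # back-of-the-envelope crossbar estimate
noc_fraction = 0.05           # flat NoC prototype came in at 5% of that estimate

noc_area_mm2 = flat_noc_area_mm2(crossbar_estimate_mm2, noc_fraction)
area_saved_mm2 = crossbar_estimate_mm2 - noc_area_mm2

print(f"Flat NoC area: {noc_area_mm2:.2f} mm^2")
print(f"Area saved:    {area_saved_mm2:.2f} mm^2")
```

In other words, the flat NoC frees up roughly 9.5 square millimeters relative to the crossbar estimate, which is the kind of gap a like-for-like crossbar replacement would never reveal.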
I don’t need a NoC. My design is tiny
Sometimes it doesn’t feel sensible to even consider a NoC. Think of a Bluetooth toothbrush. The SoC and the interconnect will be about as small as you can imagine, unchallenged by congestion or area. But this is a battery-operated consumer device. It’s very important to run a toothbrush for multiple days between recharges. Anything a design team could do to further reduce power would be a win.
In a low-power design, you’ll power-gate and clock-gate IPs everywhere you possibly can, even down inside the IPs where possible. But you don’t have that level of control over a crossbar. Either it’s all on or it’s all off. A NoC, by contrast, can be gated internally. In fact, NoCs can provide very fine-grained control over dynamic and static power.
This power management is completely configurable through the Arteris IP NoC generator and is managed just as intelligently as power management for endpoint IP: the NoC wakes up when needed and powers down when not. An interconnect that is effectively off 99% of the time is a real competitive advantage over a traditional interconnect. Arteris IP had such a customer, who used their NoC for precisely this reason.
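To see why being off 99% of the time matters, here is a minimal duty-cycle sketch in Python. Only the 99% figure comes from the text above. The active and gated power numbers are hypothetical placeholders chosen for illustration, not measurements of any real interconnect, and the function name is likewise an assumption.

```python
def average_power_mw(active_mw: float, gated_mw: float, active_fraction: float) -> float:
    """Time-weighted average power for a block that is active only part of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_mw + (1.0 - active_fraction) * gated_mw


# Hypothetical figures, for illustration only.
ACTIVE_MW = 2.0   # assumed interconnect power while carrying traffic
GATED_MW = 0.01   # assumed residual power while gated

# An interconnect that cannot be gated stays at its active power all the time.
always_on_mw = average_power_mw(ACTIVE_MW, GATED_MW, active_fraction=1.0)

# An interconnect that is off 99% of the time is active for the remaining 1%.
gated_mw = average_power_mw(ACTIVE_MW, GATED_MW, active_fraction=0.01)

print(f"Always on: {always_on_mw:.4f} mW")
print(f"Gated:     {gated_mw:.4f} mW")
print(f"Reduction: {always_on_mw / gated_mw:.1f}x")
```

Under these assumed numbers, the average drops from 2 mW to about 0.03 mW. The actual saving depends entirely on the real active and gated power of the design, but the shape of the calculation is the same.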
My design is a monster. I must have a custom network
Then again, maybe your design is so massive and latency-sensitive that the only way you can see to make it work is through hand-crafted communication. AI training accelerators for hyperscale datacenters are a good example. Often these are built as arrays of processing elements, but not uniform arrays, because you leave holes for caches, scratch memories and other goodies. You want to tweak network logic to shave every picosecond of latency. You also want to add special networking for direct broadcast and aggregation, bypassing the standard network.
AI teams all over the world are building such accelerators to gain competitive advantage. Arteris IP has been working with leaders in the field for many years and has been able to evolve what they offer in step with those evolving design needs. Now, AI designers can fine-tune their NoC networks without having to hand-craft RTL. They get all the advantages of customization while retaining the advantages of a generator solution.
NoCs have broader appeal than you may realize
This all sounds good, but who is really invested in these NoCs? The graphic at the head of this article will give you some idea. You can learn more about Arteris IP NoC solutions HERE.