There is already a long list of compelling reasons to use network on chip (NoC) in large SOC designs. However, this week Arteris introduced their PIANO 2.0 software, which provides an even more compelling reason to use their FlexNoC architecture. Let’s recap. Arteris FlexNoC gives SOC architects and designers a powerful tool for provisioning top level interconnect. SOCs have long since passed the days when connections between the blocks could be hardwired. Routing resources are too scarce, and flexibility for inter-block communication and data exchange has become paramount.
An NoC is added to a design as RTL blocks that manage data exchange between blocks over a high performance and reliable on-chip network. Arteris’ FlexNoC is even capable of supporting cache coherent memory interfaces. Now, to understand why PIANO 2.0 is important, it’s key to understand that significant variability in timing closure efficiency is introduced when moving from the front end to the back end. PIANO 2.0 delivers a strong connection between the RTL spec and the later physical timing closure steps. Until now, NoC implementation optimization was akin to being limited to wire load models instead of full parasitics.
PIANO 2.0 promotes intelligently moving interface elements away from their host or target blocks and into the routing channels. This works remarkably well for improving area and performance. The building blocks for an NoC are small and ideal for fitting in the ‘grout’ of the design. However, their placement and the provisioning of supporting pipeline stages can have a significant effect on area, power and timing.
Without any hints from the front end, placement tools will often cluster NoC logic blocks in ways that fail to meet timing, or that require the addition of pipeline stages. One contributing factor is that at 28nm and below, many interconnect paths between top level blocks are simply too long for the signal to arrive within one clock cycle. Attempting to fix this by adding more pipeline stages or relying on LVT cells can consume critical area and add to static and dynamic power consumption.
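To make the "too long for one clock cycle" point concrete, here is a minimal sketch of the arithmetic, not anything taken from PIANO itself. The function name, the 80% usable-cycle margin and the example path delays are all hypothetical, and the model assumes the path delay can be split evenly between registers, which real floorplans rarely allow.

```python
import math


def pipeline_stages_needed(path_delay_ns: float,
                           clock_mhz: float,
                           usable_fraction: float = 0.8) -> int:
    """Estimate how many pipeline registers a long path needs.

    Assumptions (illustrative only): the delay divides evenly across
    segments, and only `usable_fraction` of each clock period is
    available for wire delay (the rest covers setup, skew, etc.).
    """
    if path_delay_ns < 0 or clock_mhz <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("invalid delay, clock frequency or margin")

    period_ns = 1000.0 / clock_mhz           # e.g. 400 MHz -> 2.5 ns
    budget_ns = period_ns * usable_fraction  # delay allowed per segment

    # Number of segments the path must be cut into; at least one.
    segments = max(1, math.ceil(path_delay_ns / budget_ns))

    # N segments are separated by N - 1 pipeline registers.
    return segments - 1


if __name__ == "__main__":
    # Hypothetical 5 ns top-level path at two clock frequencies.
    for mhz in (100, 400):
        print(mhz, "MHz:", pipeline_stages_needed(5.0, mhz), "stage(s)")
```

Under these assumptions the same path that needs no pipelining at 100 MHz needs two register stages at 400 MHz, and each added stage costs area and power. That is why the placement of the NoC elements, which sets the path length, matters so much.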
Arteris has added feedback loops so that physical implementation tools from Cadence and Synopsys can create better placement for these interconnect IP blocks. It is axiomatic that better communication between front end and back end design teams will improve design results and reduce unnecessary iterations. PIANO 2.0 helps facilitate front to back data flow in a systematic fashion.
Arteris provides some benchmark results to support the effectiveness of PIANO 2.0. Their first example is a design with no pipeline stages. Starting with Design Compiler and using only wireload models, it is forecast to require 385K sq microns. When this same non-pipelined design is taken to DC Topographical, it fails timing by 1.26ns and the interconnect IP area grows to 830K sq microns. Making it meet timing with manual pipeline additions pushes the interconnect IP area to 1,008K sq microns. With PIANO 2.0, by contrast, the design meets timing with an interconnect IP area of 806K sq microns. This result also saves 46nW over the manually pipelined case.
In another example, Arteris compares manual pipeline insertion with Auto Pipeline in PIANO 2.0. There was an 11% reduction in interconnect IP area, from 1.77M sq microns to 1.58M sq microns. The time needed for pipeline insertion also dropped from 45 days to 1.5 days. This 28nm design has 20 power domains, 10 clocks running between 100 and 400 MHz and 160 NoC NIU sockets.
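The figures in these two examples are easy to sanity check. The short sketch below only redoes the arithmetic on the numbers quoted above; the helper name and variable names are my own, and nothing here comes from Arteris tooling.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


# First example: interconnect IP area in thousands of sq microns.
wireload_area_k = 385.0      # wireload-model forecast
physical_area_k = 830.0      # physically aware synthesis, fails timing
manual_area_k = 1008.0       # manual pipelining, meets timing
piano_area_k = 806.0         # PIANO 2.0, meets timing

# Second example: area in millions of sq microns, effort in days.
manual_area_m, auto_area_m = 1.77, 1.58
manual_days, auto_days = 45.0, 1.5

if __name__ == "__main__":
    print(f"PIANO vs manual pipelining: "
          f"{percent_reduction(manual_area_k, piano_area_k):.1f}% less area")
    print(f"Wireload forecast vs physical estimate: "
          f"{physical_area_k / wireload_area_k:.2f}x growth")
    print(f"Auto Pipeline vs manual: "
          f"{percent_reduction(manual_area_m, auto_area_m):.1f}% less area")
    print(f"Pipeline insertion effort: {manual_days / auto_days:.0f}x faster")
```

By this arithmetic the PIANO 2.0 result in the first example is roughly 20% smaller than the manually pipelined one, the quoted 11% in the second example checks out after rounding, and the schedule improvement is a factor of 30.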
Arteris is including endorsements from several major customers and EDA vendors in their product announcement. Among them are Horst Rieger, Manager at the Design Services Group in Renesas, and Dr. Antun Domic, CTO at Synopsys. Also, Senior Analyst Mike Demler of The Linley Group commented on the technology in the Arteris press release on PIANO 2.0.
Arteris PIANO 2.0 offers an effective solution for getting rapidly to timing closure with all the added benefits of an NoC architecture. This is not an incremental improvement either. It dramatically improves area, congestion, power and timing. Given that it works for coherent and non-coherent interconnect, it should be widely applicable to almost any design at 28nm or below.