The Linley Group is well known for its esteemed Microprocessor Report publication, now in its 28th year. Alongside its repertoire of industry reports, TLG also sponsors regular conferences highlighting the latest developments in processor architecture and implementation.
One of the highlights of the conference was the presentation from Benoit de Lescure, Director of Application Engineering at Arteris, and Marc Greenberg, Director of Product Marketing at Synopsys. Benoit provided an update on Network-on-Chip (NoC) architecture design, with an emphasis on optimizing transactions for the unique capabilities of LPDDR4 memory. Marc joined Benoit to describe how the Synopsys memory controller IP integrates with the new Arteris NoC memory transaction scheduling unit for LPDDR4.
As SoC designs integrate a greater number and diversity of processing units, traditional crossbar or hierarchical (multi-level) bus architectures do not scale. Routing congestion becomes a major issue for physical implementation, while satisfying Quality-of-Service (QoS) requirements becomes a difficult timing closure task.
Most users associate the term Quality-of-Service with the allocation and scheduling of resources to provide computation that meets critical deadline constraints, typically under the supervision of a real-time operating system. In the context of an SoC with many disparate processing blocks, similar QoS considerations apply. The heterogeneous sources of data traffic on a processor SoC differ widely in characteristics such as:
- clock frequency
- data width
- peak throughput
- traffic patterns – e.g., transaction length, address alignment
- reaction to latency and/or “back pressure” from pending requests
These system performance characteristics require specific focus on:
- throughput of multiple concurrent links (bandwidth)
- delay from a request initiated by a master through the interconnect to the target (latency)
- memory efficiency (% of maximum memory throughput realized, sharing the finite memory bandwidth across many command requests)
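The memory-efficiency metric in the last bullet can be made concrete with a small sketch. The figures below (a 3200 MT/s, 32-bit LPDDR4 interface) are illustrative assumptions, not numbers from the presentation:

```python
# Hypothetical illustration of the memory-efficiency metric: the fraction
# of a DRAM interface's theoretical peak bandwidth that traffic actually
# achieves once command overhead and wait states are accounted for.

def peak_bandwidth_gbps(data_rate_mtps: float, bus_width_bits: int) -> float:
    """Theoretical peak throughput in GB/s for a DRAM interface."""
    return data_rate_mtps * bus_width_bits / 8 / 1000

def memory_efficiency(achieved_gbps: float, peak_gbps: float) -> float:
    """Percent of the theoretical maximum throughput actually realized."""
    return 100.0 * achieved_gbps / peak_gbps

# Example: a 32-bit LPDDR4 interface at 3200 MT/s peaks at 12.8 GB/s.
peak = peak_bandwidth_gbps(3200, 32)   # 12.8 GB/s
print(memory_efficiency(9.6, peak))    # 75.0 -> 75% efficient
```

The scheduler's job, in these terms, is to push the realized percentage as close to 100 as the command-timing constraints allow.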
In essence, a Network-on-Chip implementation involves encapsulating data traffic between processing units into packets, and transporting those packets serially, in a pipelined manner. This enables scaling of SoC complexity, while managing physical routing congestion and satisfying QoS requirements.
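The encapsulation step can be sketched as follows. This is a minimal, illustrative model only (the names and flit format are invented for clarity, not Arteris data structures): a wide, protocol-specific transaction becomes a header flit plus a stream of narrow payload flits that traverse the fabric serially:

```python
# Illustrative-only model of NoC packetization: a protocol-level
# transaction is wrapped in a packet header and split into flits
# that are transported serially, in a pipelined manner.

from dataclasses import dataclass

@dataclass
class AxiTransaction:          # simplified AXI-style write request
    src_id: int                # initiating master
    dst_id: int                # target slave
    address: int
    data: bytes

def packetize(txn: AxiTransaction, flit_bytes: int = 4) -> list:
    """Encapsulate a transaction into a header flit plus payload flits."""
    header = bytes([txn.src_id, txn.dst_id]) + txn.address.to_bytes(4, "big")
    payload = [txn.data[i:i + flit_bytes]
               for i in range(0, len(txn.data), flit_bytes)]
    return [header] + payload   # flits stream through the fabric per hop

txn = AxiTransaction(src_id=1, dst_id=7, address=0x8000_0000, data=bytes(16))
print(len(packetize(txn)))      # 5 flits: 1 header + 4 payload
```

Because each hop forwards flits in a pipelined manner, fabric wiring stays narrow and regular even as the number of connected cores grows.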
As Marc put it, “The goal of the NoC and memory controller is to get the right data to the right master at the right time.”
QoS in a NoC Architecture
The Arteris NoC architecture consists of Network Interface Units (NIU), which communicate directly with each IP core. This logic converts traditional (AMBA AXI, AHB, OCP, or customer proprietary) protocol transactions into packets for transport across the NoC network fabric. At the receiving end, the NIU communicates with a core using an IP socket interface.
The Arteris FlexNoC solution provides synthesizable RTL modules for physical implementation of the NoC. The NIU logic is typically placed close to its related IP core in the chip floorplan, to optimize timing and minimize routing congestion. Pipeline register insertion is supported, to further assist timing closure. The FlexNoC package also includes a suite of SystemC TLM simulation and performance analysis tools.
The NoC fabric specifically addresses QoS bandwidth and latency requirements in several ways:
- packet priority assignment (by packet, or by all socket transactions)
- dynamic “pressure” relief (provide a low latency path to high-priority packets when traffic is high)
- communication between cascaded arbiters at each network switch (to avoid deadlocks)
and the main emphasis of Benoit’s and Marc’s presentation:
- optimization of the memory scheduler DDR commands to the memory controller IP block, for highest memory efficiency and fewest wait states
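The priority-based arbitration in the first bullet above can be sketched with a toy model. This is a hedged illustration of the general technique, not the Arteris arbiter: packets carry a QoS priority, and when several contend for one switch output, the highest priority is granted first, with ties resolved in arrival order:

```python
import heapq

# Illustrative-only model of priority arbitration at a NoC switch port:
# the highest-priority pending packet is granted first; packets of equal
# priority are served in arrival (FIFO) order.

class SwitchArbiter:
    def __init__(self):
        self._heap = []
        self._seq = 0            # preserves arrival order within a priority

    def request(self, priority: int, packet: str) -> None:
        # heapq is a min-heap, so negate priority: larger value wins first
        heapq.heappush(self._heap, (-priority, self._seq, packet))
        self._seq += 1

    def grant(self) -> str:
        return heapq.heappop(self._heap)[2]

arb = SwitchArbiter()
arb.request(0, "bulk DMA read")       # latency-tolerant traffic
arb.request(3, "CPU cache refill")    # latency-critical traffic
arb.request(0, "display scan-out")
print(arb.grant())                    # CPU cache refill wins the port
```

The dynamic "pressure" relief described above goes a step further: when congestion builds, high-priority packets are steered onto a low-latency path rather than merely winning the next arbitration round.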
NoC QoS with LPDDR4
The advent of LPDDR4 introduces new features in physical memory addressing and timing, and with these new features come opportunities for additional QoS optimizations. Benoit and Marc described how Arteris and Synopsys have collaborated to leverage these new capabilities.
The NoC memory request scheduler and memory controller optimize the sequence of LPDDR4 commands to the shared memory, managing:
- multiple, independent LPDDR4 channels
- memory interleaving (logical-to-physical mapping), to optimize addresses for low “locality of reference” packets
- coherent (and non-coherent) memory requirements for different IP cores
- power dissipation options, separating critical functions from active/standby memory areas
- per bank refresh scheduling
- PHY training/calibration cycles
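Two of the ideas listed above, channel interleaving and request reordering, can be sketched in simplified form. This is an assumption-laden illustration (the stride, bank model, and function names are invented), not the actual Arteris/Synopsys scheduler:

```python
# Simplified sketch of two memory-scheduling ideas: (1) interleaving
# physical addresses across independent channels on a fixed stride, and
# (2) reordering requests so accesses to already-open banks are issued
# first, avoiding precharge/activate wait states.

def interleave_channel(addr: int, channels: int = 2, stride: int = 256) -> int:
    """Map a physical address to a memory channel on a fixed stride."""
    return (addr // stride) % channels

def schedule(requests, open_banks):
    """Reorder (bank, addr) requests: hits to open banks go first."""
    hits = [r for r in requests if r[0] in open_banks]
    misses = [r for r in requests if r[0] not in open_banks]
    return hits + misses

reqs = [(2, 0x1000), (0, 0x2000), (2, 0x1100)]
print(schedule(reqs, open_banks={2}))  # the two bank-2 hits issue back to back
print(interleave_channel(0x100))       # address 0x100 maps to channel 1
```

A real scheduler must, of course, balance this reordering against the per-master latency and priority guarantees discussed earlier, as well as LPDDR4-specific constraints such as per-bank refresh windows.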
As SoC processors continue to integrate a greater number and complexity of IP cores, the scalability of the NoC architecture provides a key advantage. The QoS performance requirements of the various core functions will result in a wide set of characteristic “traffic” through the network switch fabric, with varying bandwidth and latency requirements. Specifically, the schedule of memory commands needs to be optimized to achieve QoS metrics, taking maximum opportunity to leverage new LPDDR4 features.
Arteris and Synopsys recently illustrated how a collaborative development partnership between NoC architecture and memory controller IP provider can achieve significantly improved power/performance.
For a great summary of the recent TLG Processor Conference, check out Tom Simon’s recent article: