A couple of weeks back I wrote an article about the use of machine learning and deep neural networks in self-driving cars. Now I find that machine learning is also being applied to help build advanced end-to-end QoS (quality of service) solutions for the automotive IC market. With the advent of self-driving cars comes the requirement to handle all of the data streams coming into the car. Many automotive system designers are turning to heterogeneous multi-core SoCs (systems-on-chip) to meet the requirements of increased performance, reduced power consumption and increased overall system reliability.
These new SoCs are not your typical homogeneous multi-core ICs. Instead, they are heterogeneous SoCs with a variety of different compute engines, each with widely varying requirements for QoS. Automotive SoCs may include CPU clusters, GPUs, communications cores (Wi-Fi, Bluetooth, USB, 4G modem, etc.), multimedia cores, GPS, DSPs, cameras, gesture processing, display/video and security modules, to name a few.
These advanced heterogeneous architectures bring many challenges. The different cores come with dynamic and differing workloads, a mixture of different QoS requirements and the added complexity of having to share memory and interact with each other. On-the-fly configurability is also desired, to keep power consumption down as the SoC adapts to the different workloads and QoS requirements.
Many applications in self-driving cars also require high-performance computing. At the hardware level this means that the multiple cores and modules require cache coherency. Designing coherent systems is hard enough when the data and the architectures are homogeneous, but in these new automotive SoCs it’s even harder because the data and architectures are heterogeneous. Additionally, since these applications are running in a car, they must be very robust and engineered to be secure and fault tolerant, which means designers must architect their systems (software and hardware) to be deadlock free at the application level.
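To make "deadlock free" a little more concrete, here is a minimal sketch of one common way to reason about it: describe which resources can wait on which, and look for a cycle in that dependency graph. This is a generic textbook check, not NetSpeed's method, and the block names (cpu, gpu, dsp, mem) are hypothetical examples of my own.

```python
from collections import defaultdict

def find_cycle(dependencies):
    """Return a list of nodes forming a cycle, or None if there is none.

    `dependencies` maps a resource name to the resources it may wait on.
    A cycle means a set of resources can all end up waiting on each other.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = defaultdict(int)          # every node starts as WHITE (0)
    path = []                         # nodes on the current DFS path

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in dependencies.get(node, ()):
            if color[nxt] == GREY:    # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(dependencies):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical example: three blocks that each wait on the next one.
risky = {"cpu": ["gpu"], "gpu": ["dsp"], "dsp": ["cpu"]}
# Hypothetical example: blocks that only wait on a shared memory controller.
safe = {"cpu": ["mem"], "gpu": ["mem"], "dsp": ["mem"]}

print("risky:", find_cycle(risky))
print("safe: ", find_cycle(safe))
```

A real design would apply this kind of check to far richer dependency information (protocol messages, buffers, virtual channels), but the principle of ruling out cyclic waits is the same.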
Traditionally, system designers have built their own proprietary buses or on-chip communication fabrics. This, however, has become more difficult due to the use of multi-vendor IP blocks, all of which have different speed, latency, I/O and QoS requirements. In some cases, system architects have turned to multiple networks or segregated subnetworks to avoid bottlenecks caused by these differences.
So what does machine learning have to do with QoS solutions, you ask? Enter NetSpeed Systems. NetSpeed offers a network-on-chip (NoC) synthesis capability. This tool set uses machine-learning algorithms to synthesize and optimize NoCs that are tuned for a user-defined combination of cores and modules with varying workloads and QoS requirements. One of the key benefits of machine learning is that it becomes possible to model the system as a whole, taking into account system interactions and understanding how they affect QoS. NetSpeed’s machine-learning technology is designed to optimize performance and power efficiency broadly across use models. The beauty of this approach is that the software has the freedom to build new hybrid network architectures from among different network topologies such as multi-drop bus, ring, tree, and mesh.
Alternatively, system designers can specify a particular topology, overriding the tool’s choices. While human designers are good, humans augmented with fast, automated machine-learning algorithms are even better. See the diagram for typical bandwidth performance of synthesized networks compared with those that were manually tuned without the aid of machine-learning algorithms.
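To give a feel for why the choice of topology matters, here is a small sketch that compares the average hop count between nodes on a ring and on a 2D mesh. It is purely illustrative and says nothing about how NetSpeed's tools actually evaluate topologies; the 16-node size, the 4x4 mesh shape and the function names are my own assumptions.

```python
from itertools import product

def ring_hops(n, a, b):
    """Shortest hop count between nodes a and b on a bidirectional ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)

def mesh_hops(cols, a, b):
    """Manhattan hop count between nodes a and b on a mesh with `cols` columns."""
    ax, ay = a % cols, a // cols
    bx, by = b % cols, b // cols
    return abs(ax - bx) + abs(ay - by)

def average_hops(n, hops):
    """Average hop count over all ordered pairs of distinct nodes."""
    pairs = [(a, b) for a, b in product(range(n), repeat=2) if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)

n = 16        # assumed node count
cols = 4      # assumed 4x4 mesh

ring_avg = average_hops(n, lambda a, b: ring_hops(n, a, b))
mesh_avg = average_hops(n, lambda a, b: mesh_hops(cols, a, b))

# Number of bidirectional links each topology needs.
ring_links = n
mesh_links = 2 * cols * (cols - 1)

print(f"ring: {ring_avg:.2f} average hops, {ring_links} links")
print(f"mesh: {mesh_avg:.2f} average hops, {mesh_links} links")
```

With these assumptions the mesh needs fewer hops on average but more links, which is the kind of trade-off an automated tool can weigh across many candidate topologies far faster than a person can.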
NetSpeed’s NocStudio software takes experience from the design of much larger-scale networks and applies it to the chip-level problem. Like other networks, a network-on-chip must ensure QoS for signals traveling from one point on the chip to another within a specified time and without delaying other signals. Because NetSpeed’s NoCs are intended mainly for ARM-based SoCs, they connect directly to IP blocks that support AMBA protocols such as AXI. Currently, NetSpeed supports protocols up to AMBA 5, but NetSpeed can also create gaskets for other protocols. At the network level, NetSpeed converts all traffic into a native format called the NetSpeed Streaming Interface Protocol (NSIP).
NocStudio automatically configures NetSpeed’s Orion (non-coherent) or Gemini (coherent) NoC architectures, allowing designers to integrate cores and modules from multiple vendors. As the design evolves, NocStudio updates system performance statistics, enabling designers to make trade-offs. Statistics include the link cost (the number of wires required for the interconnects) and the buffer cost (the number of flip-flops required to implement the necessary FIFO buffers). NocStudio can also automatically add pipeline stages to long wires to meet latency requirements and guarantee QoS. The QoS specifications may include such factors as the data-path bandwidth, transfer latency, service priority, and rate limits.
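As a rough illustration of the two statistics above, here is a back-of-the-envelope sketch of how link cost and buffer cost might be tallied for a handful of links. NocStudio's real cost model is not described here, so the `Link` fields, the formulas and the per-cycle wire reach are all simplifying assumptions of mine (control signals and the flip-flops in the pipeline stages themselves are ignored).

```python
import math
from dataclasses import dataclass

@dataclass
class Link:
    width_bits: int      # data-path width of the link (assumed field)
    length_mm: float     # estimated wire length from the floorplan (assumed field)
    fifo_depth: int      # entries in the FIFO at the receiving end (assumed field)

def link_cost(links):
    """Total number of wires, counting one wire per data bit of each link."""
    return sum(link.width_bits for link in links)

def buffer_cost(links):
    """Total flip-flops for FIFO storage: depth x width for each link."""
    return sum(link.fifo_depth * link.width_bits for link in links)

def pipeline_stages(link, max_mm_per_cycle):
    """Register stages needed so no wire segment exceeds the per-cycle reach."""
    return max(0, math.ceil(link.length_mm / max_mm_per_cycle) - 1)

# Hypothetical links between blocks on an SoC.
links = [
    Link(width_bits=128, length_mm=1.0, fifo_depth=4),
    Link(width_bits=64, length_mm=3.5, fifo_depth=2),
]

print("link cost (wires):        ", link_cost(links))
print("buffer cost (flip-flops): ", buffer_cost(links))
for link in links:
    stages = pipeline_stages(link, max_mm_per_cycle=1.5)
    print(f"{link.length_mm} mm link needs {stages} pipeline stage(s)")
```

Even a toy model like this shows the trade-off designers face: wider links and deeper FIFOs help bandwidth, but every extra bit shows up directly in the wire and flip-flop counts.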
The end result is synthesis-ready RTL code that implements a full-scale NoC, including all of the logic required to ensure cache coherency between modules sharing memory. Not only does NetSpeed enable the automatic synthesis of the network logic, but its solution also gives designers a first-pass feel for the floorplan of the SoC, which can be used as a guide for the IC layout team.
In the next week or so, look for part II on this subject, where I’ll go into more detail about NocStudio, how a NoC works and what it looks like. In the meantime, see also:
Gemini-3 press release
NetSpeed raises $10M to bring Machine Learning to SoC Design and Architecture
The Intel Common Platform Foundry Alliance