Let’s explain precisely what deadlock means in a modern, complex multi-core SoC. First, let’s take a look at the crash of Air France Flight 296, when a brand new Airbus A320 crashed during a demo flight on June 26, 1988. This A320, the first plane to be completely automated thanks to the FADEC flight system, was flying a demonstration. The pilot decided to mimic a landing just above the airport, without actually landing. The goal was to demonstrate how brilliant the plane was. Unfortunately, the FADEC did not recognize such a maneuver, and the flight system, being blocked, decided to reset itself. The problem was that in 1988 the reset/restart took 7 seconds, and during that time the pilot was trying to apply full throttle and climb (as he would have done with no problem on a plane from the previous generation). The result is shown in the picture below… At that time (1988) there was no SoC, just a 68020 processor, but what happened is the equivalent of deadlock in a modern SoC: the system becomes blocked, and there is no way to escape this state except a reset.
Let’s jump to 2016: electronic systems are now based on Systems-on-Chip integrating multiple, if not many, processor cores (CPU/GPU/DSP) and several dozen IP blocks. As soon as you architect such a complex multi-core SoC, you have to design and use an interconnect IP to exchange data between the multiple agents. It can be a bus-based or an internally developed interconnect IP, but the trend is to move to a commercial Network-on-Chip (NoC) IP, like the NetSpeed SoC interconnect IP generated by NocStudio.
Consider a simplified case study where two agents share the same interconnect, as pictured below. Intuitively, deadlock occurs when one agent (agent0) needs to read data to complete a task, and that data is a response generated by the other agent (agent1), which is itself waiting to read the result of the operation done by agent0. This situation creates a dependency, illustrated by the red arrows: a read request can complete only when a read response can be issued in the other direction. A deadlock occurs if the buffers in both directions are full of read requests and there is no way to send read responses. Chicken and egg is a pertinent illustration of such a dependency… When deadlock occurs, the system is stuck and there is no way to escape this state but to reset/restart it, assuming that the architects anticipated this type of event and integrated an adequate test structure. If you remember the beginning of this blog, resetting a system can be dramatic, even if, in most electronic systems, the risk is to lose data, not lives.
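To make the chicken-and-egg situation concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions: two agents, one bounded buffer per direction shared by requests and responses, and in-order service of each buffer. The names `step` and `is_deadlocked` are invented for this illustration and have nothing to do with any real interconnect or tool.

```python
from collections import deque

def step(inbox, outbox, capacity):
    """One agent tries to serve the packet at the head of its inbox.

    Returns True if it made progress, False if it is stalled.
    """
    if not inbox:
        return False
    if inbox[0] == "resp":
        inbox.popleft()              # a response is simply consumed
        return True
    # Head is a read request: serving it needs room for the response
    # in the opposite direction.
    if len(outbox) < capacity:
        inbox.popleft()
        outbox.append("resp")
        return True
    return False

def is_deadlocked(link_0_to_1, link_1_to_0, capacity):
    """Run both agents until the links drain or nobody can move."""
    to_agent1 = deque(link_0_to_1)   # agent1 reads here, writes to to_agent0
    to_agent0 = deque(link_1_to_0)   # agent0 reads here, writes to to_agent1
    while to_agent1 or to_agent0:
        progressed = step(to_agent1, to_agent0, capacity)
        progressed = step(to_agent0, to_agent1, capacity) or progressed
        if not progressed:
            return True              # packets remain but no agent can move
    return False

# Both buffers full of read requests: no room for any response.
print(is_deadlocked(["req", "req"], ["req", "req"], capacity=2))
# One free slot in each direction is enough to drain everything.
print(is_deadlocked(["req"], ["req"], capacity=2))
```

In the first call each agent waits for a free slot that only the other agent could create, which is exactly the dependency shown by the red arrows.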
Before NetSpeed’s NocStudio, SoC architects using home-made interconnects had to plan an SoC validation campaign to detect potential deadlocks. The problem with this strategy is the awfully long lead time associated with simulation. You have to run functional simulations, and it may take weeks, if not months, of compute-intensive validation to discover the first deadlock, even though it would show up within a few hours or days once the SoC is integrated into the real system. The entire process (identify deadlocks, design a fix, rerun the simulations) could take as long as six months… which seems unacceptable with respect to Time-To-Market (TTM) requirements.
NetSpeed’s NocStudio is used at the architecture definition level, when the architect specifies the communication protocol inside the SoC. NetSpeed IP achieves full deadlock detection and resolution by partitioning complex protocol transactions into simpler sub-flows from one endpoint to the next. The deadlock in Figure 2 can be avoided by providing separate resources for the read and read-response packets: adding a virtual channel to the network creates an alternative read-response path. Machine learning algorithms are used to automatically learn the correct order in which sub-flows are processed and mapped to virtual networks.
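The effect of a dedicated read-response path can be illustrated with a small Python sketch. This is a toy model, not NetSpeed’s algorithm: it assumes two agents, one request queue and one separate response queue (the extra virtual channel) per direction, and that a receiver can always consume a response. The function name `run_with_response_vc` and the channel labels are invented for this example.

```python
from collections import deque

def run_with_response_vc(reqs_0_to_1, reqs_1_to_0, capacity):
    """Drain read requests when responses travel on their own channel.

    Returns the number of responses delivered, or raises RuntimeError
    if no agent can make progress.
    """
    req = {"0to1": deque(["req"] * reqs_0_to_1),
           "1to0": deque(["req"] * reqs_1_to_0)}
    resp = {"0to1": deque(), "1to0": deque()}    # dedicated response channels
    delivered = 0
    while any(req.values()) or any(resp.values()):
        progressed = False
        for inbound, outbound in (("0to1", "1to0"), ("1to0", "0to1")):
            # Responses never wait on anything: the receiver just takes them.
            if resp[inbound]:
                resp[inbound].popleft()
                delivered += 1
                progressed = True
            # A request needs room only in the response channel, which
            # pending requests can no longer fill up.
            if req[inbound] and len(resp[outbound]) < capacity:
                req[inbound].popleft()
                resp[outbound].append("resp")
                progressed = True
        if not progressed:
            raise RuntimeError("deadlock")
    return delivered

# Request queues start full in both directions, yet everything drains.
print(run_with_response_vc(2, 2, capacity=2))
```

Because responses no longer queue behind requests, the response channel always empties eventually, so every request ends up being served.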
Detecting protocol deadlocks requires knowing the properties of all system components: how they produce and consume network packets, and how these packets relate to each other. Designers use a flexible formal language to capture the deadlock-relevant properties of the various system components. There are two ways to specify dependencies in NocStudio: they can be implied within a traffic description, or they can be specified explicitly. This information is then also used to construct the network-level deadlock-free NoC. That’s why NetSpeed uses the “Correct by Construction” slogan to describe the NoC generated by NocStudio.
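One generic way to reason about such dependencies, sketched here in Python, is to write them down as a directed graph ("this channel can only drain if that channel drains") and look for a cycle. This is a textbook depth-first search offered as an illustration only; it is not NocStudio’s formal language or its actual analysis, and the channel names and the `find_cycle` helper are hypothetical.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of nodes, or None if acyclic.

    depends_on maps a channel to the channels it must wait for.
    """
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in depends_on.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:        # back edge: we closed a loop
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(depends_on):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Shared buffers: each direction waits on the other, so there is a cycle.
shared = {"link_0_to_1": ["link_1_to_0"], "link_1_to_0": ["link_0_to_1"]}
print(find_cycle(shared))

# Separate response channels that always drain: no cycle left.
split = {"req_0_to_1": ["resp_1_to_0"], "req_1_to_0": ["resp_0_to_1"],
         "resp_0_to_1": [], "resp_1_to_0": []}
print(find_cycle(split))
```

A cycle in this graph marks a potential deadlock. An acyclic graph means every channel can eventually drain, which is the intuition behind a "correct by construction" claim.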
We have known for decades that state machines can lock up and put a chip in trouble, forcing a reset of the complete electronic system. Since architects are defining ever more complex and heterogeneous SoCs, the probability of introducing dependencies increases dramatically, with the direct consequence of putting the SoC in deadlock. Even if deadlock can be detected by running simulations and fixed by design, this iterative process is way too long to comply with TTM requirements. NetSpeed proposes a solution: design an interconnect IP that is “correct by construction”, using NocStudio to generate Orion, a deadlock-free NoC.
By Eric Esteve from IPNEST