Robots: we have all been waiting for them since we were young. We watched Star Wars, or in the case of the slightly longer-lived of us, we watched Forbidden Planet or Lost in Space. We knew that our future robot friends would be able to move around and interact with their environment. What we did not foresee was that, instead of robots moving among us, we would be riding inside the first widely produced robots, namely autonomous cars.
It is now pretty clear that cars are the perfect platform for a machine that autonomously interacts with its environment. Typically, they traverse a smooth, flat surface and have well-defined interactions: starting, stopping and turning. They have the room, cooling and power for the substantial computing requirements necessary for their operation. Automating driving will also provide huge benefits to people. Instead of needing to be fully engaged in operating a vehicle, “drivers” will ultimately be able to focus on other activities while in their cars. In the near term, autonomous vehicles will improve traffic safety.
The task of assisting or driving a vehicle requires creating a virtual 3D world inside the driving system that accurately reflects the outside physical world. A vast array of sensors is required to do this, and data from optical, radar, LIDAR, inertial and other sensors needs to be combined in real time. Then, of course, the system has to make decisions based on the projected future movements of itself and the surrounding objects.
Performing these operations in real time will require more than general-purpose processors. Neural networks are already being used for many of the tasks necessary for object recognition. These systems have to handle extremely high bandwidth, and low latency is essential. We are seeing commercial subsystems that target this market. NXP has introduced its “Blue Box”, which contains specialized processor chips: the S32V234 and the LS2085A. These powerful SOCs are specifically designed for the workloads seen in autonomous driving. They have multiple ARM cores with substantial caches and memory interfaces. They also have IO subsystems for communicating with each other and the sensors.
Nvidia has its own solution, called Drive PX 2, which is built with two Tegra processors, each having an integrated Pascal GPU along with quad A57 cores. There are also two discrete Pascal GPUs. During the 2016 Linley Processor Conference at the end of September, on-chip network IP provider Arteris presented on the topic of using cache coherent networking to improve the operation of the kinds of SOCs found in the processing units aimed at the autonomous driving market.
Going back to basics, we know that advanced driver assistance systems (ADAS) require low-latency, high-bandwidth computation. The SOCs being developed for this application have many processors and additional components such as accelerators, specialized processing units and interfaces to numerous sensors. In the world of CPUs, it is a long-standing practice to add custom hardwired memory caches to reduce time-consuming reads and writes to external RAM. With a handful of processors and long development cycles, it made sense to custom build memory cache systems for CPU chips.
Things have changed. Processor cores are now frequently used in larger numbers in SOCs. What’s more, there is a huge benefit in having the other blocks in the SOC share cache coherency with each other and with the processors. The performance and power benefits are immense. It’s no longer practical to build custom cache designs for SOCs, which have increasing complexity and shorter development cycles. What is needed is a flexible and systematic way to implement cache coherency interfaces for them.
Arteris already has a robust solution for replacing hardwired buses in SOCs with a configurable and flexible interconnect network. Just as we have moved away from using dedicated printer, keyboard and mouse cables, FlexNoC from Arteris lets designers quickly size and implement on-chip networks that move data with lower power and real estate requirements. Packets of data are transferred between blocks along a network topology of high-speed interconnect. It has built-in error correction and makes the best use of on-chip resources.
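To make the idea of packets travelling across an on-chip network topology a little more concrete, here is a minimal sketch in Python. It is not FlexNoC code and says nothing about how Arteris routes traffic: the block names, the two-switch topology and the `route` helper are all hypothetical, and shortest-path search simply stands in for whatever routing a real network-on-chip uses.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # name of the sending block (hypothetical)
    dst: str        # name of the receiving block (hypothetical)
    payload: bytes  # data carried by the packet


# Hypothetical topology: each entry lists the direct links of a block or switch.
TOPOLOGY = {
    "cpu": ["sw0"],
    "gpu": ["sw0"],
    "camera_if": ["sw1"],
    "ddr": ["sw1"],
    "sw0": ["cpu", "gpu", "sw1"],
    "sw1": ["camera_if", "ddr", "sw0"],
}


def route(topology, packet):
    """Return the hop-by-hop path for a packet using breadth-first search."""
    previous = {packet.src: None}
    queue = deque([packet.src])
    while queue:
        node = queue.popleft()
        if node == packet.dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    raise ValueError(f"no path from {packet.src} to {packet.dst}")


# A camera interface sending a frame fragment to memory crosses one switch;
# a CPU reaching the same memory crosses two.
print(route(TOPOLOGY, Packet("camera_if", "ddr", b"frame")))
print(route(TOPOLOGY, Packet("cpu", "ddr", b"result")))
```

The point of the sketch is only that blocks share links and switches instead of each pair needing its own dedicated wires, which is where the area and power savings described above come from.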
Arteris has used this as a foundation layer to implement its Ncore IP for providing cache coherent memory interfaces within an SOC. With the supercomputer level of performance needed in ADAS, a high-performance cache coherency solution is ideal. However, the feature that takes Ncore to the next level is its ability to take blocks that are not designed with cache capability and give them full cache coherency, even providing them with their own local proxy cache.
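As a rough illustration of what a local proxy cache can do for a block that has no cache of its own, here is a toy write-back cache in Python. It is a sketch under stated assumptions, not a description of Ncore: the `ProxyCache` class, the 64-byte line size and the dictionary standing in for external memory are all invented for the example, and the snoop handling that a real coherent cache needs is left out entirely.

```python
LINE_BYTES = 64  # assumed cache line size, chosen for illustration


class ProxyCache:
    """Toy write-back cache between a cache-less block and external memory."""

    def __init__(self, memory):
        self.memory = memory    # dict: line index -> bytes (stands in for RAM)
        self.lines = {}         # locally cached lines: line index -> bytearray
        self.dirty = set()      # line indices modified since the last flush
        self.memory_reads = 0   # number of line fetches from memory
        self.memory_writes = 0  # number of line write-backs to memory

    def _fill(self, line):
        # Fetch the whole line on first touch; later accesses hit locally.
        if line not in self.lines:
            self.memory_reads += 1
            self.lines[line] = bytearray(self.memory.get(line, bytes(LINE_BYTES)))
        return self.lines[line]

    def read(self, addr):
        line, offset = divmod(addr, LINE_BYTES)
        return self._fill(line)[offset]

    def write(self, addr, value):
        line, offset = divmod(addr, LINE_BYTES)
        self._fill(line)[offset] = value
        self.dirty.add(line)

    def flush(self):
        # Each dirty line goes back to memory once, however many bytes changed.
        for line in sorted(self.dirty):
            self.memory[line] = bytes(self.lines[line])
            self.memory_writes += 1
        self.dirty.clear()


memory = {}
cache = ProxyCache(memory)
for addr in range(LINE_BYTES):  # 64 single-byte writes from the block
    cache.write(addr, addr)
cache.flush()
print(cache.memory_reads, cache.memory_writes)  # prints "1 1"
```

Sixty-four single-byte writes turn into one line fetch and one line write-back. Reads that fall in an already fetched line are likewise served locally, so the block sees lower latency without being redesigned.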
Ncore allows the addition of its Non-Coherent Bridge blocks and Proxy Caches to turn IP blocks that had no cache capability into full-fledged members of the on-chip cache scheme. This comes with all the benefits, such as the pre-fetch effect, the write gathering effect and optimized coherent memory access. Arteris has also added a number of powerful optimizations to Ncore, such as multiple snoop filters, to ensure that the cache coherency uses the smallest amount of area and has the lowest possible latency.
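For readers who have not met snoop filters before, the general idea can be sketched in a few lines of Python. This is a generic, much simplified model and not the Ncore design: the `SnoopFilter` class and the agent names are hypothetical, it only tracks which agents may hold a line, and it only models invalidations on writes.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy directory recording which agents may hold a copy of each line."""

    def __init__(self, agents):
        self.agents = set(agents)
        self.sharers = defaultdict(set)  # line address -> agents holding it
        self.filtered_snoops = 0         # snoops sent when using the filter
        self.broadcast_snoops = 0        # snoops a plain broadcast would send

    def read(self, agent, line):
        # The reader now holds a copy, so remember it as a sharer.
        self.sharers[line].add(agent)

    def write(self, agent, line):
        # Only agents known to hold the line need to be invalidated.
        targets = self.sharers[line] - {agent}
        self.filtered_snoops += len(targets)
        # Without a filter, every other agent would have to be asked.
        self.broadcast_snoops += len(self.agents) - 1
        self.sharers[line] = {agent}
        return targets


sf = SnoopFilter(["cpu0", "cpu1", "gpu", "isp"])
sf.read("cpu0", 0x80)
sf.read("isp", 0x80)
print(sorted(sf.write("cpu1", 0x80)))  # prints "['cpu0', 'isp']"
sf.write("cpu1", 0x80)                 # second write: nobody else holds the line
print(sf.filtered_snoops, sf.broadcast_snoops)  # prints "2 6"
```

In this small run the filter sends two snoops where a broadcast scheme would have sent six, which is the kind of traffic and latency saving the optimization is after.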
We can expect to see a number of larger and more powerful SOCs for neural networks, image processing and autonomous vehicle control. Of course, infotainment will also drive chip complexity. These chips will probably lead the industry in complexity and in sheer processing power and speed. Their designers will look to use the most advanced technology to achieve the highest performance within the shortest development cycle. On-chip networking is already a necessity for these designs, as is cache coherency. For more information on how Arteris is working in this market, see the Arteris website.