When Intel created the OpenCV vision processing library, the idea was that algorithms could take advantage of the single-instruction, multiple-data (SIMD) capability of Intel architecture processors. (Intel’s ulterior motive is always to sell processors.) As the library has matured, its optimized functions take advantage of SSE or AVX.
If you have enough cores, memory, fans, and a wall plug, you can run some very sophisticated vision processing techniques on an Intel desktop processor. The problem with scaling SSE or AVX, or any add-on vector instruction set in a general-purpose CPU, is that you have to bring the rest of the scalar elements of the architecture along for the ride, burning die area and power. Intel is hoping to solve this with “Skylake”, shrinking everything until it all fits.
From another direction, the GPU guys got in on the act. GPUs are designed primarily to handle large numbers of polygons, plus shading and physics. They operate on threads, each of which usually renders an object. By shackling threads together in hardware and software, one can create cores that are, in essence, vector processing engines.
This is why Intel “Cherry Trail” is getting so much attention. Ditto for the NVIDIA Tegra X1, with its four ARM Cortex-A57 cores, four Cortex-A53 cores, and 256-core Maxwell GPU. In today’s multimedia tablet environments, a GPU is certainly along for the ride anyway, so slimming down the CPU and beefing up the GPU is a good tradeoff. All good, if you have something like 15W handy to power either of those chips.
Many embedded applications run on something more like 1.5W, or less. If you want to put vision processing in that kind of product, you need an entirely different approach. CEVA has announced the CEVA-XM4, their fourth-generation vision processing IP block.
What kinds of algorithms are we talking about, and why won’t a smartphone-class mobile GPU handle them? For those interested in computer vision, “Computer Vision Metrics” by Scott Krig (available as a free e-book) is a great resource to decipher the history of vision algorithms. He sets up an interesting taxonomy of vision processing:
Figure 2-6 from “Computer Vision Metrics”, Scott Krig, Apress, June 1, 2014.
Here’s the catch: mobile GPUs are built to render known objects, not to analyze images in order to identify and track an object across a scene. That takes horsepower. Some imaging algorithms do map well to a mobile GPU, but operating on point clouds is a good example of one type of operation that can tax a small GPU beyond its usefulness. Point clouds are becoming increasingly important for 3D object recognition in mobile robotics, and in embedded vision generally.
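To see why point-cloud processing is such a different workload from polygon rendering, consider one of its basic primitives: finding each point’s nearest neighbor. A minimal pure-Python sketch (brute force, O(N²) distance computations — real pipelines use spatial trees, and the sample coordinates are illustrative only):

```python
import math

def nearest_neighbors(cloud):
    """For each 3D point, find the index of and distance to its nearest
    neighbor by brute force -- O(N^2) distance computations, a data-parallel
    but irregular workload quite unlike rendering known geometry."""
    result = []
    for i, (x1, y1, z1) in enumerate(cloud):
        best_j, best_d2 = -1, float("inf")
        for j, (x2, y2, z2) in enumerate(cloud):
            if i == j:
                continue
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
            if d2 < best_d2:
                best_j, best_d2 = j, d2
        result.append((best_j, math.sqrt(best_d2)))
    return result

# Three sample points: the last two are close together.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.9, 0.1, 0.0)]
print(nearest_neighbors(cloud))
```

Even this toy version makes the scaling problem visible: a depth sensor producing hundreds of thousands of points per frame turns that inner loop into billions of multiply-accumulates, which is exactly where a wide vector engine earns its keep.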
The CEVA-XM4 vision IP is optimized for operations across the processing taxonomy. By stripping away extraneous logic and concentrating on a fast vector processing unit, it can operate on a 4096-bit-wide swath of data in a single cycle while keeping memory bandwidth under 512 bits. Compared to the NVIDIA Tegra K1 (similar in power, with about half the performance of the Tegra X1), CEVA says they can perform object detection and tracking at 1/10 the power using only 5% of the die area.
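A quick back-of-envelope shows what 4096 bits of vector width buys. The element sizes below are illustrative assumptions (CEVA does not break the figure down this way in the announcement), but they convey the scale:

```python
# Back-of-envelope: elements processed per 4096-bit vector operation.
# Element widths are illustrative assumptions, not CEVA-published figures.
VECTOR_BITS = 4096

def lanes(element_bits):
    """Number of elements covered by one full-width vector operation."""
    return VECTOR_BITS // element_bits

print(lanes(8))   # 8-bit pixels: 512 per operation
print(lanes(16))  # 16-bit pixels: 256 per operation
print(lanes(32))  # 32-bit values: 128 per operation
```

For comparison, a 256-bit AVX2 unit manages 32 8-bit elements per operation, which is why a purpose-built vector engine can undercut a general-purpose CPU so dramatically on per-pixel work.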
The IP block also includes support for user-defined accelerators, allowing further customization for specific applications. Its Harvard architecture separates instruction and data memory, keeping the flow of vision data smoother.
CEVA is also looking at more advanced algorithms for the CEVA-XM4, particularly a class of deep learning and convolutional neural network (CNN) algorithms that allow embedded vision to take on more intuitive operations. This could be important for automotive ADAS applications, where the object presenting a hazard could be just about anything in a very complex scene. Correctly identifying objects of interest, and preventing false positives, in real time is crucial.
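The workload underneath CNN inference is dominated by one operation: the 2D convolution. A minimal pure-Python sketch of a single-channel “valid” convolution (really a cross-correlation, as in most deep learning frameworks; image and kernel values are illustrative):

```python
def conv2d_valid(image, kernel):
    """Single-channel 2D 'valid' convolution -- the multiply-accumulate
    core a CNN engine must sustain. No kernel flipping (cross-correlation),
    matching common deep learning convention."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]
print(conv2d_valid(img, k))  # [[6, 8], [12, 14]]
```

Every layer of a CNN repeats this pattern across many channels and many kernels, so sustained multiply-accumulate throughput — precisely what a wide vector unit provides — is the figure of merit for embedded deep learning.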
The CEVA-XM4 outperforms its CEVA-MM3101 predecessor by up to 8x in computational speed, with 35% better energy efficiency. The CEVA-XM4 will be on display in the CEVA booth at Mobile World Congress 2015.