Machine Learning is at the peak of the hype, according to Gartner’s August 2016 Hype Cycle for Emerging Technologies. Demand for vision processor IP is strong in the smartphone, automotive and consumer electronics segments. ASSP-based solutions can do the job, but how can OEMs create differentiation and control their destiny and pricing if they select an ASSP? In the mobile segment, integrating a Mediatek or Qualcomm SoC supporting camera/vision will lead an OEM to build a ‘me too’ smartphone. OEMs developing ADAS or autonomous driving systems for automotive face a similar problem when integrating a MobilEye or NVIDIA ASSP, as they can’t add their own algorithms and differentiate.
It’s the right time to integrate a complete DSP-based vision processor IP solution, such as the new CEVA-XM6 DSP core together with hardware accelerators, a neural network software framework, software libraries and algorithms. It’s the right time because the performance of deep learning technology, measured by the error rate on image recognition, is this year, for the first time, better than human performance! It’s also the right time because adopting the CEVA Convolutional Deep Neural Network (CDNN) solution implemented on the XM6 DSP core will enable embedded neural networks for mass-market (low-cost) vision applications and allow delivery of deep learning solutions on (low-power) embedded devices. This low-cost, low-power solution has not emerged by chance. The CEVA-XM6 based vision platform is built on the strong foundation of the XM4, which counts 25 design wins, and on the vast experience accumulated across the multiple end markets and applications where neural networks are being deployed.
We explained the Convolutional Deep Neural Network (CDNN) theory and gave some examples of proprietary networks in a previous blog on Semiwiki. Today we will focus on how to generate a CDNN, using the S/W development tools and the CEVA Network Generator, and describe the H/W implementation.
Before running imaging and vision algorithms on the CEVA-XM6 DSP, you can create your own CDNN using the neural network software framework, which is made of real-time libraries, computer vision libraries (CEVA-CV, based on OpenCV), a vision processing API (OpenVX, the royalty-free open standard API from Khronos, integrated into CEVA-VX) and 3rd-party S/W. At this point, any customer can create differentiation by inserting proprietary algorithms. Instead of using one CDNN to fit all applications, the CEVA Network Generator allows creating a unique, customer- or application-specific CDNN.
CEVA-XM6 is the 5th generation of Imaging & Vision technology from CEVA, and the IP vendor is bringing major improvements compared with the previous generation, CEVA-XM4. If you look at the right part of the Hardware box, you can identify the hardware accelerators (HWA), namely CDNN, De-Warp and 3rd-party HWA. Implementing well-known, repetitive tasks in fixed hardware is a very good way to optimize performance: it frees the DSP to run other tasks, and it reduces power consumption, as dedicated H/W is more power efficient than a general-purpose processor running the same task. For those who remember the digital signal processing implemented to run the wireless phone baseband: the Viterbi decoding algorithm initially ran on a (TI) DSP, but the task very quickly moved to an HWA. The same principle is applied here to imaging and vision technology.
Scatter-gather capability: CEVA-XM6 can load and store vector elements from and to multiple memory locations in a single cycle. It is able to load values from 32 addresses per cycle. Scatter-gather not only boosts performance, it also minimizes the number of memory accesses, which are known to severely impact power consumption.
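To make the idea concrete, here is a minimal software sketch of gather and scatter semantics in plain Python. It only models what the operations do, not how the CEVA-XM6 implements them; the function names `gather` and `scatter`, the memory size and the stride of 7 are illustrative assumptions. The only figure taken from the text is the vector width of 32 addresses.

```python
# Software model of gather/scatter semantics (illustrative only, not CEVA code).

VECTOR_WIDTH = 32  # from the text: 32 addresses per cycle


def gather(memory, addresses):
    """Load one element from each (possibly non-contiguous) address into a vector."""
    return [memory[a] for a in addresses]


def scatter(memory, addresses, values):
    """Store each vector element back to its own address."""
    if len(addresses) != len(values):
        raise ValueError("addresses and values must have the same length")
    for a, v in zip(addresses, values):
        memory[a] = v


# Hypothetical 256-word memory and an arbitrary non-contiguous access pattern.
memory = list(range(1000, 1256))
addresses = [i * 7 for i in range(VECTOR_WIDTH)]  # 0, 7, 14, ..., 217

vector = gather(memory, addresses)     # one vector load instead of 32 scalar loads
doubled = [2 * v for v in vector]      # some vector operation
scatter(memory, addresses, doubled)    # one vector store instead of 32 scalar stores
```

In hardware, the point is that the whole `gather` or `scatter` call corresponds to a single vector access rather than a loop of scalar accesses.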
While scatter-gather is a performance booster whatever the application, the Sliding-Window data processing mechanism is completely dedicated to imaging. The principle is to take advantage of pixel overlap in image processing by reusing the same data to produce multiple outputs. Implementing the Sliding-Window mechanism significantly increases the processing capability of the DSP core, and it also reduces power consumption and saves external memory bandwidth. One of the challenges of machine learning on neural networks is to reduce the bandwidth consumed and to remove the computing bottleneck. That’s why implementing techniques like scatter-gather or sliding-window is crucial for bringing machine learning to mass-market applications, as these require low-cost, low-power solutions.
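The data-reuse argument can be illustrated with a small sketch that counts memory loads for a 1-D filter computed two ways. This is a generic illustration of the sliding-window principle, not the CEVA-XM6 mechanism itself; the function names, the pixel row and the 3-tap kernel are made up for the example.

```python
from collections import deque


def filter_naive(row, taps):
    """Reload every pixel under the kernel for each output position."""
    k = len(taps)
    loads = 0
    out = []
    for i in range(len(row) - k + 1):
        acc = 0
        for j in range(k):
            acc += row[i + j] * taps[j]
            loads += 1  # each pixel under the kernel is fetched again
        out.append(acc)
    return out, loads


def filter_sliding(row, taps):
    """Fetch each pixel once and keep the overlapping pixels in a local window."""
    k = len(taps)
    window = deque(maxlen=k)  # the oldest pixel drops out automatically
    loads = 0
    out = []
    for pixel in row:
        window.append(pixel)
        loads += 1  # only the new pixel is fetched
        if len(window) == k:
            out.append(sum(p * t for p, t in zip(window, taps)))
    return out, loads


row = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical pixel row
taps = [1, 2, 1]                 # hypothetical 3-tap kernel

naive_out, naive_loads = filter_naive(row, taps)
sliding_out, sliding_loads = filter_sliding(row, taps)
print(naive_out == sliding_out, naive_loads, sliding_loads)
```

Both versions produce the same outputs, but the naive one performs 18 loads where the sliding one performs 8. The gap widens with larger kernels and with 2-D windows, which is where the bandwidth and power savings come from.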
As of today, CEVA has implemented 512 MACs (16×16) as hardware accelerators, as well as many of the neural network layers (Normalization, Pooling, etc.) required by the CDNN, and plans to implement even more layers in the future. How efficient is the CEVA-XM6 architecture? Just consider that the MAC utilization is greater than 95%, and you realize how thoroughly the CEVA-XM6 has been optimized.
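As a back-of-the-envelope check of what this utilization means, the sketch below turns the two figures quoted above (512 MACs, 95% utilization taken as a lower bound) into an effective MAC rate. The clock frequency is not given in this article, so the 1 GHz value used here is purely a hypothetical placeholder, and the function names are illustrative.

```python
def effective_macs_per_cycle(num_macs, utilization):
    """Average number of MAC units doing useful work in each cycle."""
    return num_macs * utilization


def effective_macs_per_second(num_macs, utilization, clock_hz):
    """Sustained MAC rate for a given clock frequency."""
    return effective_macs_per_cycle(num_macs, utilization) * clock_hz


NUM_MACS = 512               # from the text
UTILIZATION = 0.95           # from the text, used as a lower bound
HYPOTHETICAL_CLOCK_HZ = 1e9  # assumption for illustration only, not a CEVA figure

per_cycle = effective_macs_per_cycle(NUM_MACS, UTILIZATION)
per_second = effective_macs_per_second(NUM_MACS, UTILIZATION, HYPOTHETICAL_CLOCK_HZ)
print(f"{per_cycle:.1f} effective MACs per cycle")
print(f"{per_second / 1e9:.1f} GMAC/s at the assumed clock")
```

At 95% utilization, roughly 486 of the 512 MACs do useful work on average in every cycle, so the sustained rate stays close to the theoretical peak whatever the actual clock turns out to be.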
To answer the initial question, we can say yes: machine learning technology has been made available to the mass market, targeting autonomous driving, sense-and-avoid drones, virtual and augmented reality, smart surveillance, smartphones, robotics and more.