The Convolutional Neural Network (CNN) recognition algorithm is generating very high interest in the semiconductor industry, first of all because CNN provides the best recognition quality when compared with alternative recognition algorithms. The CEVA Deep Neural Network (CDNN) software framework, implemented on the CEVA-XM4 imaging and vision DSP, accelerates machine learning deployment for embedded systems. Neural network algorithms, used for any kind of cognitive processing, visual or audio, are very similar to human brain processing and really deserve the Artificial Intelligence (AI) label.
In fact, networks develop over time, as data is collected and analyzed, and through the training phase, as convolutional networks learn new object types from examples. The CNN algorithm selected by CEVA is a Deep Learning Neural Network, one of a family of neural network methods that use a high number of layers (hence "deep") and focus on feature representations. With CDNN2, CEVA offers improved capabilities and performance for the latest network topologies and layers, including support for Caffe and TensorFlow, Google’s software library for machine learning.
To support emerging applications like surveillance or ADAS, the embedded system should be capable of running deep learning-based video analytics directly on any camera-enabled device, in real time. This capability is offered by CDNN2 running on the CEVA-XM4 DSP, enabling real-time classification. CDNN2 supports any given layer in any network topology, at any resolution, as trained by Caffe and TensorFlow. CDNN2 is also able to support the most demanding machine learning networks, from pre-trained network to embedded system, including GoogLeNet, VGG, SegNet, AlexNet, ResNet and Network-in-Network (NIN).
The development process for implementing machine learning in embedded systems is interactive: it involves offline training and the CEVA Network Generator, and enables real-time classification with pre-trained networks. The Network Generator is a push-button tool that converts pre-trained networks into real-time optimized ones. The process:
1. Receive the network model and weights as input from offline training (via Caffe or TensorFlow)
2. Automatically convert them into a real-time network model, via the CEVA Network Generator
3. Use the real-time network model in CNN applications on the CEVA-XM4
The CEVA Network Generator runs offline and converts the network information (model and weights) into a real-time network model. It optimizes the conversion for power efficiency, generates a fixed-point model from the floating-point model and adapts it to embedded constraints. The Network Generator preserves accuracy: the conversion result shows less than 1% deviation.
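To make the floating-point to fixed-point step more concrete, here is a minimal sketch of the general idea in Python. It is not CEVA's conversion method, which the article does not describe: the Q-format (16-bit words with 12 fractional bits), the function names and the sample values are all assumptions chosen for illustration. It quantizes weights and inputs to integers, computes a dot product with integer arithmetic only, and measures how far the result deviates from the floating-point reference.

```python
# Illustrative only: a generic fixed-point (Q-format) conversion,
# NOT the CEVA Network Generator algorithm. Bit widths are assumptions.

def quantize(values, frac_bits=12, total_bits=16):
    """Convert floats to signed fixed-point integers, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]

def dot_float(w, x):
    """Floating-point reference dot product."""
    return sum(a * b for a, b in zip(w, x))

def dot_fixed(qw, qx, frac_bits=12):
    """Integer multiply-accumulate; each product carries 2*frac_bits fractional bits."""
    acc = sum(a * b for a, b in zip(qw, qx))
    return acc / float(1 << (2 * frac_bits))

def relative_deviation(reference, approx):
    """Relative error of the fixed-point result versus the float reference."""
    return abs(approx - reference) / abs(reference)

# Hypothetical weights and inputs for a single neuron.
weights = [0.12, -0.53, 0.31, 0.07]
inputs = [0.9, -0.4, 0.25, 0.6]

reference = dot_float(weights, inputs)
fixed = dot_fixed(quantize(weights), quantize(inputs))
deviation = relative_deviation(reference, fixed)
print(f"float={reference:.6f} fixed={fixed:.6f} deviation={deviation:.4%}")
```

Note that the deviation measured here is that of one dot product, whereas the figure quoted above refers to the converted network as a whole; the sketch only shows why a well-chosen fixed-point format can stay close to the floating-point result.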
Deliverables include real-time example models for image classification, localization or object detection. The real-time neural network libraries have been optimized for the CEVA-XM4 vision DSP; they support various network structures and layers and accept fixed or variable input sizes.
CDNN2 is the industry’s first software framework for embedded systems to automatically support networks generated by TensorFlow™. Combined with the CEVA-XM4 imaging and vision processor, CDNN2 offers a highly power-efficient deep learning solution for any camera-enabled device and significant time-to-market advantages for implementing machine learning in embedded systems. Compared to a leading GPU-based system, the CEVA solution significantly improves on power consumption and memory bandwidth.
In CDNN, the last “N” stands for Network, so let’s take a look at the various convolutional deep neural networks supported by CEVA. This is a partial list, as additional proprietary networks are also supported:
- AlexNet, linear topology, 24 layers, 224×224 RoI
- SegNet, Multiple-input-Multiple-output topology, 90 layers, 480×360 RoI
- GoogLeNet, Multiple-input (concatenation layer), Multiple-layers-per-level topology, 23 layers + 9 inceptions, 220×220 RoI
- VGG-19, linear topology, 19 layers, 224×224 RoI
Some of these terms deserve explanation. RoI stands for Region of Interest. Searching the web, I have found this clarification: “An input image and multiple regions of interest (RoIs) are input into a fully convolutional network. Each RoI is pooled into a fixed-size feature map and then mapped to a feature vector by fully connected layers (FCs). The network has two output vectors per RoI: softmax probabilities and per-class bounding-box regression offsets. The architecture is trained end-to-end with a multi-task loss.”
For AlexNet, this means a fixed-size feature map of 224×224 pixels.
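The phrase "each RoI is pooled into a fixed-size feature map" can be illustrated with a small sketch. The Python below is an assumption-laden toy, not code from CDNN2 or from the quoted source: the function name `roi_max_pool`, the `(top, left, bottom, right)` RoI convention and the tiny 4×6 feature map are all made up for the example. It splits a region into a fixed grid of bins and keeps the maximum of each bin, so regions of different sizes produce outputs of the same size.

```python
# Toy RoI max pooling on a single-channel feature map (hypothetical helper).

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Pool the region roi=(top, left, bottom, right), end-exclusive,
    into an out_h x out_w grid by taking the max of each bin."""
    top, left, bottom, right = roi
    roi_h, roi_w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Bin boundaries: floor for the start, ceiling for the end,
        # so every bin contains at least one cell.
        r0 = top + (i * roi_h) // out_h
        r1 = top + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c0 = left + (j * roi_w) // out_w
            c1 = left + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]

whole = roi_max_pool(feature_map, (0, 0, 4, 6), 2, 2)   # 4x6 region
small = roi_max_pool(feature_map, (1, 1, 4, 4), 2, 2)   # 3x3 region
print(whole)  # [[9, 12], [21, 24]]
print(small)  # [[15, 16], [21, 22]]
```

Both calls return a 2×2 grid even though the regions differ in size, which is what lets fully connected layers with a fixed input length follow the pooling step.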
Inception appears in the GoogLeNet description. Inception means “a Network in a Network in a Network…”, illustrated below:
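In code terms, the "network in a network" idea, together with the concatenation layer listed for GoogLeNet above, can be sketched as several small branches that process the same input in parallel and whose output channels are then stacked. The Python below is only a structural toy: the branch functions are stand-ins invented for this example, not GoogLeNet's actual filters, and real inception modules use learned convolutions of several kernel sizes.

```python
# Structural toy of an inception-style block: parallel branches, then
# channel-wise concatenation. Branches are hypothetical stand-ins.

def branch_scale(channels, factor):
    """Stand-in for a convolution branch: scales every value."""
    return [[[v * factor for v in row] for row in ch] for ch in channels]

def branch_channel_sum(channels):
    """Equivalent to a 1x1 convolution with all-ones weights and one
    output channel: sums the input channels at each position."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[[sum(ch[r][c] for ch in channels) for c in range(w)]
             for r in range(h)]]

def branch_identity(channels):
    """Pass-through branch."""
    return channels

def inception_block(channels, branches):
    """Run every branch on the same input and concatenate the results
    along the channel axis."""
    out = []
    for branch in branches:
        out.extend(branch(channels))
    return out

# Two input channels of size 2x2.
x = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]

y = inception_block(x, [
    lambda c: branch_scale(c, 2),  # 2 output channels
    branch_channel_sum,            # 1 output channel
    branch_identity,               # 2 output channels
])
print(len(y))  # 5 channels: 2 + 1 + 2
```

Because each block's output is just another stack of channels, blocks of this kind can be chained, which is where the "network in a network" description comes from.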
GoogLeNet is the only network in this list that integrates inception, and CEVA claims to be the first DSP IP vendor to support GoogLeNet. Considering the very high interest in CNN algorithms across the industry, and in automotive (ADAS) in particular, there is no doubt that the CDNN2 framework, paired with a low-power DSP like the CEVA-XM4, will see high adoption in the near future in applications such as smartphones, surveillance, Augmented Reality (AR)/Virtual Reality (VR), drones and, obviously, ADAS.
You will find a complete description of CDNN2, including a PDF presentation and a video, here.