Ambiq Micro has built a family of voice processing MCUs dedicated to battery-powered, energy-sensitive systems, supporting mobile applications such as wearables. The company faces two strong challenges: supporting computationally intensive NN-based far-field processing and speech recognition algorithms, while offering “ultra-low power” devices. When Ambiq claims to build ultra-low-power devices, it really is the case: the company has developed a unique and proprietary technology, the Subthreshold Power Optimized Technology (SPOT™) platform, which uses transistors biased in the subthreshold region of operation.
These two challenges clearly pull in opposite directions: fitting intensive processing into energy-sensitive devices sounds like the perfect definition of energy efficiency! According to Aaron Grassian, VP of marketing, Ambiq Micro, “Porting the HiFi 5 DSP to Ambiq Micro’s SPOT platform enables product designers, ODMs and OEMs to take the most advantage of technology from audio software leaders like DSP Concepts and Sensory by adding voice assistant integration, command and control, and conversational UIs to portable, mobile products without sacrificing quality or battery life.”
The Tensilica HiFi 5 DSP core is the new generation of voice-dedicated DSP, so it is interesting to look at the main changes from the previous HiFi 4 DSP core. MAC capability has doubled, giving 2X the audio pre- and post-processing performance. For NN processing, the HiFi 5 offers 4X the MAC capability of the HiFi 4, including 32 16×8 or 16×4 MACs per cycle. Moreover, the new HiFi NN library offers a highly optimized set of functions commonly used in NN processing (especially speech). Software backward compatibility with the complete HiFi product line is guaranteed, covering over 300 HiFi-optimized audio and voice codecs and audio enhancement software packages.
Such a voice-dedicated DSP has to support voice pre-processing functions, like beamforming (or spatial filtering), noise reduction and acoustic echo cancellation (AEC), as well as speech recognition: feature extraction, NN processing layers and language decoding. As a side note, the Cadence Tensilica HiFi DSP should be a pretty good IP core, as the company claims 95 licensees for HiFi DSP worldwide and ships 1 billion cores annually (probably somewhat fewer ICs, since several cores can be integrated in the same chip).
There has clearly been a dramatic rise in the popularity of digital home assistants (Alexa and the like) that feature voice UI experiences, leading to a new wave of innovation in far-field processing algorithms and in neural network-based speech recognition. It’s now clear that the processing power has to be in the edge device and not in the cloud, and there are good reasons to support this architecture. Consumers demand lower latency, increased privacy and more natural voice UI interactions, so the on-device processing workload has to increase rapidly to keep the end user happy.
For OEMs too, voice-controlled user interfaces are becoming more important. For example, many of today’s in-car voice UI infotainment platforms end up training the driver (as opposed to the other way around). Consumer adoption of voice assistant technology in the home is also encouraging car manufacturers to embrace voice. Moreover, automotive voice assistants require local voice recognition, pushing again for more processing power in the edge device: the cloud is not always available and, again, latency is a concern for the consumer experience.
If we agree that speech recognition should be done locally, how can this trend be enabled? First, you need more advanced NN algorithm techniques and high-performance DSP cores available at the edge. But you also need lower-precision NN weights to reduce memory size and bandwidth requirements, in order to build an economically viable and energy-efficient edge device. If you can meet these conditions, you can address privacy concerns, meet the demand for low latency and enable on-device speech recognition.
For example, to improve power and memory bandwidth efficiency, the HiFi 5 natively supports lower-precision weights (8-, 4-, 2- and even 1-bit). It also offers Viterbi decode support and 8-bit SIMD element support for sorting and searching in string processing.
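To make the memory argument concrete, here is a minimal Python sketch of how lower-precision weights shrink a model's footprint. It is only an illustration: the function names (`weight_bytes`, `quantize_symmetric`, `pack_4bit`, `unpack_4bit`) are hypothetical, the symmetric uniform quantization scheme is an assumption, and none of this reflects the HiFi NN library API or how the HiFi 5 hardware actually stores weights.

```python
def weight_bytes(num_weights, bits):
    """Bytes needed to store num_weights densely packed at `bits` bits each."""
    return (num_weights * bits + 7) // 8


def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] (assumed scheme, bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]


def pack_4bit(q):
    """Pack signed 4-bit values two per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_4bit(data, count):
    """Inverse of pack_4bit: return the first `count` signed 4-bit values."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)
    return vals[:count]


if __name__ == "__main__":
    # Footprint of a hypothetical 1,000,000-weight model at each precision.
    for bits in (8, 4, 2, 1):
        print(bits, "bit:", weight_bytes(1_000_000, bits), "bytes")

    weights = [0.9, -1.0, 0.25, 0.0]
    q, scale = quantize_symmetric(weights, bits=4)
    packed = pack_4bit(q)
    print(q, len(packed), dequantize(unpack_4bit(packed, len(q)), scale))
```

For a hypothetical model with one million weights, the footprint goes from 1,000,000 bytes at 8 bits to 125,000 bytes at 1 bit, and memory traffic per inference shrinks in the same proportion. The price is quantization error: with this scheme, each recovered weight can differ from the original by up to half a quantization step.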
I don’t know if your kitchen looks like the one pictured above, but if it does, you will probably count several Tensilica HiFi 5 DSPs at home, all located at the edge!
By Eric Esteve from IPnest