Voice-based control is arguably becoming another killer app, or killer-app enabler, in the very significant shifts we are seeing in automation. After a bumpy start in car feature control (for navigation, phone calls, etc.) and early smartphone “intelligent” assistants, voice-based interfaces now seem to be maturing into a genuinely useful capability. As I have said before, it’s about time. Buttons and keyboards, real or virtual, and byzantine menu hierarchies are clumsy, distracting and sometimes dangerous, and in any case they are a poor substitute for the way we humans would ideally like to communicate with our machines.
AI tends to dominate our thinking in this area (speech recognition and natural language processing being obvious examples), but there’s a rather important step before that: picking up the voice (or voices) and passing clean signals to those algorithms. This isn’t just a question of using a high-fidelity analog path from microphones to the eventual digitized output. Voice-based systems these days frequently use multiple microphones for direction discrimination; you have to deal with (acoustic) reflections from walls and other surfaces; you need to be able to capture commands in noisy environments; and increasingly there is a trend toward identifying who is speaking among multiple potential speakers. All of this is handled by the front end of voice processing, also known as voice pickup.
CEVA is already very active in this space with their CEVA-TeakLite-4 (ultra-low-power) and CEVA-X2 (high-performance) embedded DSP platforms. Now they have introduced a software suite they call ClearVox, bundling the input voice-processing algorithms optimized to run on these hardware platforms and rounding out a complete voice-pickup solution package for voice-enabled system builders. The suite spans voice activation (e.g. “Hello Google”), speaker tracking, beamforming, echo cancellation and noise suppression.
I already talked about voice activation in an earlier blog (Active Voice), based on some very neat technology supporting always-on listening at ultra-low power, allowing everything else to be powered down until the trigger word/phrase is detected.
ClearVox also includes beamforming software which is absolutely essential for directional discrimination and speaker tracking in far-field applications (smart speakers like Amazon Echo and Google Home for example). Multiple omnidirectional microphones, from as few as 2 to as many as 13, provide inputs to the (DSP) software which can then use weighting, filtering and summing to extract a strongly directed signal, which could also be used to track a speaker (whose direction may change).
Another benefit of beamforming is significant noise reduction in the signal. By focusing only on the speaker, input from other directions is largely suppressed. ClearVox algorithms support circular microphone arrays, familiar in smart speakers, and also linear arrays, expected to become more popular in smart TVs.
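To make the “weighting, filtering and summing” idea a little more concrete, here is a minimal sketch of the simplest textbook beamformer, delay-and-sum, for a linear array. This is not ClearVox code, and I am not suggesting CEVA uses this method; the function names, the whole-sample delays and the plane-wave assumption are all mine, chosen purely for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly room temperature


def arrival_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Whole-sample delays at which a plane wave from angle_deg reaches each mic.

    Assumes a linear array along one axis, with the angle measured from
    broadside (0 degrees = straight ahead). Delays are shifted so that the
    earliest microphone has delay 0. (Hypothetical helper, not a ClearVox API.)
    """
    sin_a = math.sin(math.radians(angle_deg))
    raw = [p * sin_a / SPEED_OF_SOUND * sample_rate_hz for p in mic_positions_m]
    earliest = min(raw)
    return [int(round(r - earliest)) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average across microphones.

    Sound from the look direction lines up and adds coherently; sound from
    other directions is misaligned and partly cancels.
    """
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    count = len(channels)
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / count
        for i in range(usable)
    ]


# Example setup (assumed values): four mics spaced 5 cm apart, 16 kHz audio,
# steering the beam 30 degrees off broadside.
mics = [0.00, 0.05, 0.10, 0.15]
look_delays = arrival_delays(mics, 30.0, 16000)
```

Tracking a moving speaker then amounts to recomputing the delays for a new look direction. Real products typically go further, with fractional delays, per-frequency weights and adaptive filtering, none of which this sketch attempts.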
Acoustic echo cancellation (AEC) is another essential feature in all voice-activated systems (even in speakerphones). Sound waves from a speaker and other sources in any enclosed area (living room, conference room, car) will reflect back from hard surfaces (walls, tables, etc.), resulting in multiple delayed inputs to the microphones on your voice-based system and adding further noise to the signal. But echoes can be recognized and removed given sufficient sophistication in the DSP software. Again, ClearVox provides this capability.
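For a feel of how echoes can be “recognized and removed”, here is a minimal sketch of one classic approach: a normalized least-mean-squares (NLMS) adaptive filter. It learns how a known reference signal (what the loudspeaker is playing) shows up in the microphone and subtracts that estimate. Again, this is a textbook technique shown under my own assumptions, not a description of the ClearVox algorithms; the function name and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`.

    mic       : samples captured by the microphone (echo plus anything else)
    reference : samples sent to the loudspeaker, time-aligned with mic
    taps      : length of the modeled echo path, in samples (assumed value)
    mu        : adaptation step size, between 0 and 2 for NLMS
    eps       : small constant that avoids division by zero
    Returns (residual, weights): the echo-reduced signal and the learned
    estimate of the echo path.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what remains after removing the echo
        norm = eps + sum(h * h for h in history)
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

Room echoes last far longer than eight samples, so a practical canceller needs a much longer filter (often implemented in the frequency domain), and it also has to cope with someone talking while the loudspeaker is playing, which this sketch does not attempt.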
As a part of the AEC algorithms, ClearVox also provides support for barge-in. This matters in voice-driven systems, where the speaker (you) must take priority over any music that may be playing or responses from the personal assistant. Asking your assistant to turn the music volume down isn’t going to work very well if said assistant can’t hear you over the music it is playing, or if it is preoccupied with answering your last question.
Naturally, since this is CEVA software running on CEVA hardware (and it only runs on CEVA hardware, if you were wondering), it is optimized for performance and power. You don’t have to sweat those details when you’re building your solution. That said, it is configurable and modular, so you can tune the platform to best suit the needs of your system.
ClearVox is available in two configurations: for near-field applications (headsets, earbuds, hearables and wearables) and far-field applications (smart speakers, smart home, voice enabled IoT, mobile phones). The product is available to lead customers today and will be generally available in Q2 (2018).
CEVA provides a reference design showing use of ClearVox, which you can see this week at CES. Again, this is hot off the presses, so check out the website for more details.