There is a famous scene in the 1976 movie Taxi Driver in which Robert De Niro’s character, Travis, pretends to have a conversation while looking in the mirror, repeatedly asking “You talkin’ to me?”. I think about this scene every time I use a voice-activated device: Hey, are you talking to me? Yes, I am, but are you listening?
Voice command, which was the stuff of fantasy not that many years ago, has become a staple of smart products and systems. Even though many of these systems use computational processes similar to those our brains use for voice recognition, electronic systems must operate under a set of tight constraints to make their use feasible. Chief among these are power limitations and the need to maintain privacy, particularly when a conversation is not intended for the voice-operated smart device. As a result, designers must take extra care to ensure these requirements are met.
Consumers will not tolerate voice systems that send all of their conversations over the internet to the cloud for analysis and potential recording. Furthermore, transmitting that much audio is simply too costly in both bandwidth and power. Ideally, a voice-activated system would spend most of its time in sleep mode, with only the absolute minimum circuitry active to listen for potential voice commands.
With that in mind, Dolphin Design has developed several IPs that help systems locally detect valid voice input to start the process of interpreting voice commands. Voice activity detection (VAD) is the first stage of this process: it detects the presence of a voice so that the system can then check for the keyword that triggers overall system activation. Only once both a voice and a correct keyword are detected will the entire voice recognition chain be switched on. Dolphin has a new white paper titled “Why VAD and what solution to choose?” that discusses different architectures for VAD-based systems and their relative merits.
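To make the staged wake-up idea concrete, here is a minimal Python sketch of the chain described above. It is only an illustration under stated assumptions: the simple energy threshold stands in for a real voice detector, and the names `Stage`, `frame_energy`, `next_stage` and `keyword_spotter` are hypothetical. None of this is taken from Dolphin's IP or white paper.

```python
from enum import Enum


class Stage(Enum):
    SLEEP = "sleep"              # only the voice detector is powered
    KEYWORD = "keyword"          # voice seen, keyword spotter is running
    RECOGNITION = "recognition"  # keyword confirmed, full chain is on


def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)


def next_stage(stage, frame, energy_threshold, keyword_spotter):
    """Advance the wake-up chain by one audio frame.

    energy_threshold and keyword_spotter are illustrative assumptions:
    a fixed energy threshold is a crude voice detector, and
    keyword_spotter is any callable that returns True on a keyword.
    """
    voiced = frame_energy(frame) > energy_threshold
    if stage is Stage.SLEEP:
        return Stage.KEYWORD if voiced else Stage.SLEEP
    if stage is Stage.KEYWORD:
        if not voiced:
            return Stage.SLEEP  # voice stopped before a keyword was found
        return Stage.RECOGNITION if keyword_spotter(frame) else Stage.KEYWORD
    return stage  # RECOGNITION stays on until the host resets it


# Usage: one quiet frame followed by two loud frames.
quiet = [0.01] * 160
loud = [0.5] * 160
stage = Stage.SLEEP
for frame in (quiet, loud, loud):
    stage = next_stage(stage, frame, 0.01, lambda f: True)
```

The point of the structure is that the expensive stages are only reached after the cheap one has fired, so quiet or voiceless audio never leaves the first stage.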
One of the most important metrics is the detection latency for various phonemes that can come at the beginning of a command phrase. VAD systems need to reject ambient noise yet respond quickly to valid voice input. Dolphin has developed the MiWok benchmarking platform to allow designers to compare key metrics.
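As a rough illustration of what such a comparison measures, the Python sketch below computes detection latency and noise-triggered false positives from a list of per-frame detector decisions. The function names, the frame-based representation and the sample data are all assumptions made for the example. It does not describe how the MiWok platform works.

```python
def detection_latency_ms(decisions, onset_frame, frame_ms):
    """Time from the true speech onset to the first positive decision.

    decisions   -- list of booleans, one per audio frame (hypothetical format)
    onset_frame -- index of the frame where speech really starts
    frame_ms    -- frame duration in milliseconds
    Returns None if the detector never fires after the onset.
    """
    for i in range(onset_frame, len(decisions)):
        if decisions[i]:
            return (i - onset_frame) * frame_ms
    return None


def false_trigger_count(decisions, onset_frame):
    """Positive decisions before speech starts, i.e. triggers on ambient noise."""
    return sum(1 for d in decisions[:onset_frame] if d)


# Usage with made-up data: speech starts at frame 3, frames are 10 ms long.
decisions = [False, True, False, False, False, True, True]
latency = detection_latency_ms(decisions, onset_frame=3, frame_ms=10)
noise_triggers = false_trigger_count(decisions, onset_frame=3)
```

A detector tuned to be more sensitive will usually lower the latency but raise the false trigger count, which is why the two numbers are worth looking at together.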
Some systems use analog microphones, which means that most of the system can be in sleep mode, with only a small IP, such as the Dolphin WhisperTrigger, active to detect valid voice input. Other systems use digital microphones, which require more supporting circuitry, in addition to the WhisperTrigger IP, to stay awake so the microphone output can be converted to a usable signal. The Dolphin white paper describes each type of system and its tradeoffs.
In either case, Dolphin’s analysis shows that adding the WhisperTrigger IP to a voice-activated system allows for significant power reductions compared with keeping DSPs switched on to analyze incoming audio data. The WhisperTrigger IP also offers extensive configurability, letting designers fine-tune sensitivity and performance for the specific application.
The white paper offers benchmark comparisons to help illustrate the alternatives available and their overall power consumption figures. If you don’t want the users of your system to feel like they are talking to themselves in the mirror, it might be worth reading the white paper to understand the options available for power-efficient and reliable VAD system design. The white paper is available for download on the Dolphin Design website.