Moshe Sheier, VP Marketing at CEVA, recently got back from MWC Shanghai and commented that robots are clearly trending. He saw hordes of robots from dozens of companies, begging for someone to brand and offer them in any one of many possible applications: guiding you to a connecting flight in an airport, elder care, food and drug delivery in hospitals, education (both for learning about robotics and programming and as assistants in working with special-needs kids), food delivery in restaurants; the list is endless. Think of this as the next big thing after smart speakers. (Amazon already has 100k+ robots working in its warehouses, so it's a safe bet they're working on home robots as a sequel to the Echo.)
Moshe said this made him think about what it will take to offer competitive robot solutions. He pointed me to Gartner's list of the top 10 AI and sensing capabilities it believes will be needed in personal assistant robots by 2020. These include computer vision, a conversational user interface, biometric recognition/authentication, acoustic scene analysis, location sensing, autonomous movement and, of course, local (edge) AI.
Why is it so important for all of this to be available in the robot itself? Why not let the cloud do the heavy lifting? That approach may have scalability problems, but we're also starting to understand why the cloud isn't the answer to every need. Latency is an issue: if you want a quick response, you can't wait for a round-trip, plus a possible delay while a cloud resource is found to do the work. Privacy and security are big concerns. Do you want your medical symptoms or payment details exposed to eavesdropping hacks? Power is always a concern: robots aren't much use when they're parked at a power outlet, and every round-trip to the cloud burns significant power in communication. As counter-intuitive as it may seem, it often makes sense to do as much compute as possible locally.
Take computer vision: move it to the edge. But you have to be careful; dropping the cloud-based solution into a robot probably won't work. You could handle vision tasks such as positioning, tracking and gesture recognition on a leading GPU. Add more intelligence and the robot can find objects and people. But a big GPU used for graphics, intelligent vision and deep learning will be a real power hog. That's not a problem in the cloud, but it's a real issue at the edge. Offloading some of these tasks, particularly vision and much of the recognition work, onto DSPs is a natural step, since DSPs have a well-recognized performance-per-watt advantage over GPUs.
Autonomous movement requires the ability to recognize and avoid objects. Unless the robot is going to relearn object positions over and over again (slow and power-hungry), it needs to build a 3D map of a room or a floor of a building. Naturally this map should be updated as objects move, but that should only require incremental refinement. This again highlights the accelerating trend to move AI to the edge. Learning is typically thought of as a cloud-based activity, where trained networks are downloaded to the edge. But 3D mapping and ongoing refinement can't depend on cloud support ("Sorry I knocked the lamp over; I was waiting for a training update from the cloud.").
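The article doesn't say which mapping technique a robot should use. As a minimal sketch, assuming a standard occupancy grid with log-odds updates (a common textbook approach in robot mapping, shown here in 2D for brevity and not necessarily what any particular vendor uses), "incremental refinement" means each new observation nudges only the cells it touches instead of rebuilding the whole map. The sensor-model constants and the `OccupancyGrid` class are illustrative assumptions.

```python
import math

# Hypothetical sensor-model constants (assumptions; tuned per sensor in practice).
L_OCC = math.log(0.7 / 0.3)    # log-odds increment when a cell is observed occupied
L_FREE = math.log(0.3 / 0.7)   # log-odds decrement when a cell is observed free
L_MIN, L_MAX = -5.0, 5.0       # clamp so certainty stays bounded and moved objects can be relearned


class OccupancyGrid:
    """2D occupancy grid storing log-odds per cell (0.0 means unknown, p = 0.5)."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.log_odds = [[0.0] * width for _ in range(height)]

    def update(self, observations: list[tuple[int, int, bool]]) -> None:
        """Fold in (x, y, occupied) observations; cells not observed keep their state."""
        for x, y, occupied in observations:
            if not (0 <= x < self.width and 0 <= y < self.height):
                continue  # ignore readings outside the mapped area
            delta = L_OCC if occupied else L_FREE
            value = self.log_odds[y][x] + delta
            self.log_odds[y][x] = max(L_MIN, min(L_MAX, value))

    def probability(self, x: int, y: int) -> float:
        """Convert a cell's log-odds back to an occupancy probability (logistic function)."""
        return 1.0 / (1.0 + math.exp(-self.log_odds[y][x]))


grid = OccupancyGrid(4, 4)
grid.update([(1, 1, True), (2, 1, False)])  # first pass: lamp seen at (1, 1), (2, 1) seen empty
grid.update([(1, 1, True)])                 # a repeated sighting raises confidence in that cell only
print(round(grid.probability(1, 1), 3), round(grid.probability(2, 1), 3), grid.probability(0, 0))
```

The clamp is the design choice that matters for moving objects: because no cell can become arbitrarily certain, a handful of contradictory observations is enough to flip it when the lamp is moved, with no retraining of anything else in the map.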
Acoustic scene analysis is a hot topic these days: extracting significant sounds or speakers from an acoustically busy background. The family is in the living room chatting away, the TV's on, and you want to ask your robot a question. How does the robot figure out that it's being asked to do something, and who asked? Or you're away from the house and a burglar breaks a window, or your dog starts barking. Can the robot understand there's cause for concern?
This has to start with acoustic scene analysis; it doesn't make sense to ship an unedited audio stream to the cloud and let the cloud figure out what to do. A lot of intelligent processing can happen before you get to command recognition, let alone natural language processing (NLP). Source separation, recognition of sounds like breaking glass or your dog barking, keyword spotting and some targeted command recognition can all be processed locally today. General-purpose NLP will likely remain a cloud function (and a continuing research topic) for quite a while, but domain-specific NLP shows promise of being supported locally in the not-too-distant future.
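The article doesn't describe a specific pipeline, but the local-first split it argues for can be sketched as a routing decision. This minimal sketch assumes a hypothetical on-device front end that already emits labeled events; the event labels, the local command list and the confidence threshold are illustrative assumptions, not any CEVA API.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL_ALERT = "local_alert"      # handled on-device, e.g. notify the owner
    LOCAL_COMMAND = "local_command"  # targeted command recognized on-device
    CLOUD_NLP = "cloud_nlp"          # open-ended request forwarded for general NLP
    DISCARD = "discard"              # background noise, or speech not addressed to the robot


@dataclass
class AudioEvent:
    """Output of a hypothetical on-device acoustic scene-analysis front end."""
    label: str               # e.g. "glass_break", "dog_bark", "speech" (assumed labels)
    confidence: float        # classifier score in [0, 1]
    wake_word: bool = False  # keyword spotter fired in this segment
    transcript: str = ""     # on-device command transcript, if any


ALERT_SOUNDS = {"glass_break", "dog_bark", "smoke_alarm"}  # assumption
LOCAL_COMMANDS = {"stop", "come here", "go charge"}        # assumption
MIN_CONFIDENCE = 0.6                                       # assumption


def route(event: AudioEvent) -> Route:
    """Decide where a segment is processed; only unresolved, addressed speech leaves the device."""
    if event.confidence < MIN_CONFIDENCE:
        return Route.DISCARD
    if event.label in ALERT_SOUNDS:
        return Route.LOCAL_ALERT
    if event.label == "speech" and event.wake_word:
        if event.transcript.strip().lower() in LOCAL_COMMANDS:
            return Route.LOCAL_COMMAND
        return Route.CLOUD_NLP
    return Route.DISCARD  # TV chatter or family conversation not directed at the robot


for ev in [
    AudioEvent("glass_break", 0.92),
    AudioEvent("speech", 0.95, wake_word=True, transcript="Stop"),
    AudioEvent("speech", 0.95, wake_word=True, transcript="what's the weather in Paris"),
    AudioEvent("speech", 0.90),
]:
    print(ev.label, ev.transcript, "->", route(ev).value)
```

In this sketch the cloud only ever sees segments that passed the local wake-word gate and weren't resolvable as a known command, which is the latency, privacy and power argument made above expressed as a single branch.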
So when you're thinking about building that robot and you want to differentiate not just on features but also on time to market and usability (with a lot of the hard work already done for you, and much longer uptime between charges), you might want to check out CEVA's offerings: its platform for local AI, front-end voice processing, and deep learning in the IoT.