I was at the Embedded Vision conference last week. Jeff Bier, the founder of the Embedded Vision Alliance, gave an introduction to the field. The conference was much bigger than in previous years, and almost everyone is designing some sort of vision product. Half of your brain is used for vision, so it goes without saying that vision requires a lot of computation. It is the highest-bandwidth input channel for devices that need to interact with the physical world.
Vision has been an active research field for decades, but processor performance is now high enough that vision is going mainstream. Processors are up at well over 10 GMAC/s (TI's are at 25 GMAC/s), and that threshold was only passed relatively recently. In the real world, vision is hard, with varying lighting, glare, fog, and other challenges.
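To get a feel for why that 10 GMAC/s threshold matters, here is a back-of-envelope calculation for a single convolutional layer running on live video. The frame size, frame rate, kernel size, and filter count are all illustrative assumptions of mine, not figures from the conference:

```python
# Rough MAC throughput needed for one 3x3 convolution layer on 1080p30 video.
# All parameters are illustrative assumptions.
width, height = 1920, 1080   # 1080p frame
fps = 30                     # frames per second
kernel_macs = 3 * 3          # MACs per pixel for one 3x3 filter
num_filters = 32             # a modest number of filter channels

macs_per_second = width * height * fps * kernel_macs * num_filters
print(f"{macs_per_second / 1e9:.1f} GMAC/s")  # prints "17.9 GMAC/s"
```

Even this single, modest layer already exceeds 10 GMAC/s, which is why real-time recognition was out of reach until processors crossed that line.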
Yann LeCun of Facebook showed some of his research on recognition. There has been a revolution in the algorithms in the last couple of years. Yann had a demo with a camera attached to a laptop containing a big nVidia graphics engine that ran the algorithms. He could point the camera at things and it would tell you what they were (keyboard, space-bar, pastry, shoe, and so on). In real time.
Chris Rowen of Cadence/Tensilica presented "Taming the Beast: Performance and Energy Optimization Across Embedded Feature Detection and Tracking." This was largely about how to use the Tensilica Image/Video Processor (IVP) for tasks like recognition and gesture tracking. The big challenge is to do this at really low power.
The three most important things are:
I talked to Chris afterwards, and he told me a bit more about the IVP. It has about 400 imaging instructions on top of the base processor's, enabling new applications like that recognition demo. But power is the big challenge, since we can't put a full-blown nVidia graphics chip in our phones.
We are going to end up with a hierarchy of power levels: micropower for always-on functions that know when to wake the rest of the system (listening or looking, for example); then the system itself, think of your phone; and finally uploading data to the cloud. But some processing has to be done locally; it is too expensive, in both power and delay, to transfer a whole uncompressed video (or speech waveform) to the cloud. So even with the cloud, the power problems do not go away.
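The "too expensive to upload uncompressed" point is easy to quantify. A rough sketch, where the 5 Mbit/s compressed figure is my own illustrative assumption for a typical H.264 stream, not a number from the talk:

```python
# Uncompressed bitrate of 1080p30 RGB video vs. a typical compressed stream.
# The compressed figure is an illustrative assumption (~5 Mbit/s for H.264).
width, height, fps = 1920, 1080, 30
bits_per_pixel = 24                  # 8 bits each for R, G, B

raw_bps = width * height * bits_per_pixel * fps
compressed_bps = 5e6

print(f"raw: {raw_bps / 1e9:.2f} Gbit/s")          # prints "raw: 1.49 Gbit/s"
print(f"ratio: {raw_bps / compressed_bps:.0f}x")   # prints "ratio: 299x"
```

Radioing roughly 1.5 Gbit/s off a battery-powered device is a non-starter, so the device has to spend local compute (and hence power) on compression or recognition before anything goes to the cloud.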