Limits to Deep Reasoning in Vision

by Bernard Murphy on 08-01-2016 at 7:00 am

If you are a regular reader, you’ll know I like to explore the boundaries of technology. Readers I respect sometimes interpret this as a laughable attempt to oppose the inevitable march of progress, but that is not my purpose. By understanding the limits of a particular technology, it is possible to envision what properties a successor technology should have. And that, to me, seems more interesting than assuming all further progress in that direction will be no more than fine-tuning.

Take deep learning and vision. Recent progress in this direction has been quite astounding; in one example, systems have bested humans in identifying dog breeds. These systems are now used in cars for driver assistance and safety applications – detecting lane markings, collision hazards, even traffic signs. Increasingly Google and Facebook use image recognition to search and tag people, animals and objects in images. It seems we’ve almost conquered automated image recognition at a level better than humans. But have we really, and if so, is that good enough?

While progress in deep reasoning has been impressive, there have also been some fairly spectacular fails. Microsoft was forced to retire a chatbot after it developed racist and other unpleasant tendencies. Google had to remove the “gorilla” tag from its Photos app after complaints that it was identifying dark-skinned people as gorillas. And Google released open-source software which identifies surrealist collages of faces in what we would consider perfectly ordinary images (in fairness, Google was pushing the software to see what happened).

You could argue that this is just normal progression for technology. Perhaps once the bugs are worked out, these problems will be rare. But I am skeptical that solutions as they stand just need better training. Our own fallibility in image recognition should be a hint. It’s common to see faces and other images in complex irregular patterns if we stare at them for a while. This phenomenon is called pareidolia, a bias of the brain to see patterns, particularly faces, in random images. I can’t imagine why deep reasoning should be immune from this problem; after all, we modeled the method on human reasoning, so it would be surprising if it did not also inherit the weaknesses of that approach. In fact, the Google software that produced surrealist images is known to have this bias.

How good the recognition has to be may depend on the application, but clearly there is room for improvement and for some applications, the bar is going to be very high. More training might help, up to a point. So might more hidden layers, though apparently the value of adding layers drops off sharply after a relatively small number. Ultimately we have to acknowledge that the only straightforward way to fix deep reasoning problems is to try harder, which is not an encouraging place to start when you want to find breakthrough solutions.

Or perhaps we could go back to how we think. Most of us don’t instantly convert what we think we see into action. We consider multiple factors and we pass our conclusions through multiple filters. This is so apparent that we all know people who seem to lack these safeguards; we consider them socially challenged (or worse). Now think of a cascade of neural nets where each net is trained in a different way. Deep learning methods for particle detection at the Large Hadron Collider (LHC) use similar methods, also combining different approaches – neural nets and binary decision trees – to weed out false positives. This alone might be a good start, with a first-order goal of defaulting to “I don’t know” when there is ambiguity in recognition.
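To make the “I don’t know” idea a little more concrete, here is a minimal sketch in Python of several independently trained recognizers whose answers are accepted only when each is confident and enough of them agree. Everything in it is an assumption made for illustration: the function name `classify_or_abstain`, the two thresholds, and the three stub recognizers are hypothetical stand-ins, not a description of how the LHC pipelines or any production vision system works.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

# Assumption: a recognizer is any function that maps an image to
# (label, confidence), with confidence in the range [0, 1].
Classifier = Callable[[object], Tuple[str, float]]


def classify_or_abstain(
    image: object,
    classifiers: Sequence[Classifier],
    min_confidence: float = 0.8,  # hypothetical threshold
    min_agreement: float = 1.0,   # fraction of recognizers that must agree
) -> Optional[str]:
    """Return a label only when the recognizers are confident and agree.

    Returns None, meaning "I don't know", in every other case.
    """
    votes = []
    for classify in classifiers:
        label, confidence = classify(image)
        if confidence < min_confidence:
            return None  # one unsure recognizer is enough to abstain
        votes.append(label)

    if not votes:
        return None  # nothing to base a decision on

    top_label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None  # the recognizers disagree too much
    return top_label


# Hypothetical stand-ins for separately trained models.
def net_a(image: object) -> Tuple[str, float]:
    return ("dog", 0.95)


def net_b(image: object) -> Tuple[str, float]:
    return ("dog", 0.90)


def tree_c(image: object) -> Tuple[str, float]:
    return ("cat", 0.85)


if __name__ == "__main__":
    print(classify_or_abstain(None, [net_a, net_b]))          # both agree
    print(classify_or_abstain(None, [net_a, net_b, tree_c]))  # disagreement
```

With the default settings, a single dissenting or unsure recognizer forces an abstention. Lowering `min_agreement` trades some of that caution for fewer “I don’t know” answers, which is the kind of choice that would depend on the application.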

Training more nets and other methods would be more expensive and the outcome may initially be more ambiguous than we might like. But maybe that’s an inescapable reality of improved recognition. Perhaps we should think of what we have today as hind-brain recognition – good for quick reaction (fight-or-flight) response but, like the hind-brain, not good at ultra-high-fidelity recognition where we might need improved tools.

I’m sure that, however this evolves, the field will continue to be called deep learning, but that’s just a label. For one insight into limitations in existing architectures and newer methods, see HERE. You can see the Google surrealist art HERE.
