Neural nets, as described in many recent articles, are very good at recognizing objects and both written and spoken text. But like anything we can build, or even imagine, they have limitations. One problem is that after training, the neural nets we usually encounter are essentially stateless. They can recognize static patterns but not pattern sequences, and they can’t advance their learning without being retrained.
Time-sequence patterns are important because that’s where semantic understanding has to start. You cannot claim a system has understanding unless it can make inferences from previously supplied information. For example, given “Bilbo took the ring. Bilbo went back to the Shire. Bilbo left the ring there.”, it should be able to answer “Where is the ring?”.
One way to address this limitation is to use recurrent neural nets (RNNs), in which units feed their outputs back in as inputs. In an RNN, what is memorized is embedded in the net itself, carried forward as internal state from one step to the next, but RNNs tend to have limited memory and, in simpler implementations, only very short-term memory. Another way is to use Memory Neural Networks (MemoryNNs), which combine a neural net with an associative memory. Facebook is very active in research and publications in this area (which may come as a surprise to those of you who think Facebook mostly worries about optimizing cat videos).
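To make the feedback idea concrete, here is a minimal sketch of a recurrent cell with a single hidden unit, in plain Python. The weights (`w_xh`, `w_hh`, `b`) and the function names are hypothetical values chosen for illustration, not taken from any real model. The point is only that the state carries earlier inputs forward, and that their influence fades quickly, which is the short-term memory problem described above.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One step of a minimal recurrent cell with a single hidden unit.

    The previous state h_prev is fed back in (the feedback loop), so the new
    state depends on the current input and on everything seen before it.
    """
    return math.tanh(w_xh * x + w_hh * h_prev + b)

def run_sequence(xs, w_xh=0.8, w_hh=0.5, b=0.0):
    """Run the cell over a sequence and return the state after each step.

    The default weights are hypothetical, picked only for illustration.
    """
    h = 0.0  # empty memory before the first input
    states = []
    for x in xs:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states

# One "event" followed by silence: the state remembers it, but the trace
# shrinks at every step (roughly halving here, since w_hh = 0.5).
print(run_sequence([1, 0, 0, 0, 0]))

# Same inputs in a different order give a different final state.
print(run_sequence([1, 0, 0])[-1], run_sequence([0, 0, 1])[-1])
```

After five steps the trace of the initial input is already below 0.05, so anything that happened a few steps back is effectively forgotten unless the weights are trained to preserve it.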
The MemoryNN approach at Facebook isn’t quite as simple as storing previous sentences. What is stored is a reduced vector of characteristics, which makes comparisons on essential features possible and simpler. A lookup is then a closest-match comparison against a requested feature set. FB calls these feature sets “feature vectors”. (I would imagine that deciding which essential features are best, and how to grade objects on those scales, then becomes a major topic in its own right.)
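The lookup step can be sketched as a nearest-match search over stored vectors. In this plain-Python sketch the “feature vector” is just a bag-of-words count produced by a hypothetical `featurize` helper, standing in for the reduced, learned representation described above, and similarity is measured with cosine similarity. The `Memory` class and its methods are illustrative names, not Facebook’s API.

```python
import math
from collections import Counter

def featurize(sentence):
    """Hypothetical feature extractor: a bag-of-words count vector.

    A real system would use a reduced, learned representation instead.
    """
    words = sentence.lower().replace(".", "").replace("?", "").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    def __init__(self):
        self.slots = []  # list of (original sentence, feature vector)

    def write(self, sentence):
        """Store the sentence alongside its feature vector."""
        self.slots.append((sentence, featurize(sentence)))

    def closest(self, query):
        """Return the stored sentence whose features best match the query."""
        if not self.slots:
            return None
        q = featurize(query)
        return max(self.slots, key=lambda slot: cosine(q, slot[1]))[0]

m = Memory()
for s in ["Bilbo took the ring.", "Bilbo went back to the Shire.",
          "Bilbo left the ring there."]:
    m.write(s)
print(m.closest("Who left the ring there?"))
```

A lookup like this returns the best-matching stored fact, not an answer, and with exact word matching a query word such as “leave” contributes nothing toward a fact stored with “left”.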
There are several interesting challenges in modelling and matching feature vectors. One is that even with associative memories we tend to think of exact matches per feature, but it is often more useful to allow close matches as well (e.g. synonyms in text). A second is when in the timeline the information was stored. If, continuing the Lord of the Rings example above, Frodo later picked up the ring, went to Mount Doom and dropped it there, the current answer to “Where is the ring?” should be Mount Doom. But the answer to “Where did Bilbo leave the ring?” would still be the Shire.
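Both ideas in this paragraph, close matches and time of storage, can be illustrated without any neural machinery. In this deliberately non-neural Python sketch, facts arrive already parsed into (actor, verb, object, place); a hand-written synonym table (`CANONICAL`, hypothetical) lets “left” and “dropped” count as the same action; and each fact records its write order. “Where is the ring?” then uses the most recent fact about the ring, while “Where did Bilbo leave the ring?” filters by actor first. A trained MemoryNN would be expected to learn this kind of similarity and time sensitivity from data rather than hard-code it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical synonym table: maps different verbs to one canonical action so
# that "left" and "dropped" count as a close match.
CANONICAL = {"left": "put", "leave": "put", "dropped": "put",
             "took": "take", "picked up": "take"}

@dataclass
class Fact:
    time: int               # write order: larger means more recent
    actor: str
    action: str             # canonical action, e.g. "put" or "take"
    obj: str
    place: Optional[str]

class TimedMemory:
    def __init__(self):
        self.facts = []

    def write(self, actor, verb, obj, place=None):
        action = CANONICAL.get(verb, verb)
        self.facts.append(Fact(len(self.facts), actor, action, obj, place))

    def locate(self, obj):
        """Answer 'Where is <obj>?' from the most recent fact about it."""
        about = [f for f in self.facts if f.obj == obj]
        if not about:
            return None
        latest = max(about, key=lambda f: f.time)
        # Simplifying assumption: after any non-"put" action, the actor holds it.
        return latest.place if latest.action == "put" else latest.actor

    def where_put(self, obj, actor, verb="leave"):
        """Answer 'Where did <actor> <verb> <obj>?', ignoring other actors."""
        action = CANONICAL.get(verb, verb)
        hits = [f for f in self.facts
                if f.obj == obj and f.actor == actor and f.action == action]
        return max(hits, key=lambda f: f.time).place if hits else None

m = TimedMemory()
m.write("Bilbo", "took", "ring")
m.write("Bilbo", "left", "ring", "the Shire")
m.write("Frodo", "picked up", "ring")
m.write("Frodo", "dropped", "ring", "Mount Doom")
print(m.locate("ring"))              # Mount Doom
print(m.where_put("ring", "Bilbo"))  # the Shire
```

Note that the query verb “leave” never appears in the stored facts; it matches Bilbo’s “left” only through the synonym table.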
A third challenge is modelling unknown words, often (but not necessarily) proper names. One way Facebook deals with this is to model the context in which the word appears, determine which known words appear in that same context, and assume the new word is similar to them (e.g. infer that it is a noun). The Facebook paper linked below describes methods to address each of these needs.
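A rough sketch of the context idea, in Python: each word is described by the neighbours seen immediately to its left and right, and an unknown word inherits the part of speech of the known words whose contexts overlap with its own. The tiny lexicon (`KNOWN_POS`), the window size and the voting rule are all assumptions made for illustration; this is not the method from the Facebook paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: known words tagged with a part of speech.
KNOWN_POS = {"bilbo": "noun", "frodo": "noun", "ring": "noun", "shire": "noun",
             "took": "verb", "left": "verb", "dropped": "verb"}

def contexts(sentences, window=1):
    """Map each word to a Counter of (offset, neighbour) pairs seen around it."""
    ctx = defaultdict(Counter)
    for s in sentences:
        words = s.lower().split()
        for i, w in enumerate(words):
            for off in range(-window, window + 1):
                j = i + off
                if off != 0 and 0 <= j < len(words):
                    ctx[w][(off, words[j])] += 1
    return ctx

def guess_pos(unknown, sentences):
    """Guess a part of speech for an unknown word by letting known words
    vote, weighted by how many contexts they share with it."""
    ctx = contexts(sentences)
    target = ctx[unknown]
    votes = Counter()
    for word, pos in KNOWN_POS.items():
        shared = sum((target & ctx[word]).values())  # overlapping contexts
        if shared:
            votes[pos] += shared
    return votes.most_common(1)[0][0] if votes else None

sentences = ["Bilbo took the ring", "Gandalf took the ring",
             "Frodo left the Shire", "Gandalf left the Shire",
             "Bilbo grabbed the ring"]
print(guess_pos("gandalf", sentences))  # noun
print(guess_pos("grabbed", sentences))  # verb
```

“Gandalf” appears right before “took” and “left”, just like Bilbo and Frodo, so it is guessed to be a noun; “grabbed” sits between “bilbo” and “the”, like “took”, so it is guessed to be a verb. A word never seen in any sentence gets no votes and no guess.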
MemoryNNs are not restricted to text-based tasks. They can be useful in any task where learning on the fly can improve accuracy. For pictures, Visual Q&A (another Facebook capability; they have a demo worth checking out) can answer questions about what is in a picture. You could imagine this being very useful for text- or voice-based feature searches over image libraries (e.g. find all pictures containing a dog). MemoryNNs may also be particularly helpful in self-training, a very active area of research given the often high investment required for human-directed training of neural nets. AlphaGo (Google DeepMind’s Go player) shows how powerful self-training can be, although it improved its move selection by playing against itself using convolutional policy and value networks combined with tree search, not MemoryNNs.
MemoryNNs look like a major evolution in deep reasoning; Yann LeCun, who runs Facebook AI Research, certainly thinks so. It’s also interesting to think about what is driving AI at Facebook and Google. They’re working in very similar areas toward very similar goals, and this competition should drive rapid advances in what we will be able to do. You can go through Yann’s slides on this topic and more HERE, and you can read a Facebook paper on Memory Neural Networks HERE.