Understanding natural language is considered a hard problem in artificial intelligence. You could be forgiven for thinking this can’t be right – surely language recognition systems already have this problem mostly solved? If so, you might be confusing recognition with understanding – loosely, recognition is the phonology (for voice) and syntax part of the problem and understanding is the semantic part.
A lot of progress has been made in recognition, largely thanks to deep learning. Voice recognition is a natural fit for these methods – systems can be trained to recognize a voice or a range of voices and can then, thanks to probabilistic weighting, recognize a pre-determined vocabulary with high accuracy. The same applies to text recognition trained for reading selected content (stories, web content, etc.).
The quality of recognition depends on a few things – a relevant vocabulary, a sufficient grammar and a method to resolve the ambiguities which are typical in natural language. A typical English speaker has a vocabulary of ~20k words – very manageable with a large-enough neural net, though most applications today work with a much smaller task-specific vocabulary (for example in voice commands for your car). Grammars, on the other hand, tend to be quite simple in most applications. They throw away most of what they see and look for a likely verb and object (assuming you are the subject) to decide what you want. There are much more capable systems, like IBM’s Watson, but these have required massive investment to reach that level of recognition.
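The verb-and-object approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from any real voice assistant: the vocabulary, the action names and the function parse_command are all made up for the example.

```python
# Toy task-specific vocabulary for an in-car assistant (every entry is illustrative).
VERBS = {
    "play": "PLAY",
    "call": "CALL",
    "dial": "CALL",
    "navigate": "NAVIGATE",
    "drive": "NAVIGATE",
}
OBJECTS = {"music", "radio", "home", "work", "mom"}


def parse_command(utterance):
    """Return (action, object) from the first known verb and the first known object after it.

    Every other word is thrown away, and the speaker is assumed to be the subject.
    """
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    action = None
    for word in words:
        if action is None:
            action = VERBS.get(word)  # still looking for a verb we know
        elif word in OBJECTS:
            return action, word       # first known object after the verb
    return action, None               # no object found (action may also be None)


print(parse_command("Could you please call mom for me?"))  # ('CALL', 'mom')
print(parse_command("Navigate to work"))                    # ('NAVIGATE', 'work')
```

A grammar this small is easy to build and works well for a fixed set of commands, but it has no way to notice when the words it discards change the meaning ("don't call mom" would still come out as a call).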
But now there’s a big assist for building equally capable systems, one that helps with the ambiguity problem. Google recently released SyntaxNet (which runs on top of TensorFlow) as an open-source engine for recognizing syntactic structure in a text sentence. The release also includes an English-language parser called Parsey McParseface, which identifies the syntax tree for a sentence, including relative clauses, and tags parts of speech such as nouns, verbs (including tense and mood), pronouns and more.
While the system works with text, it is also built on deep learning to handle ambiguity in sentence structure. An example given in the link below considers “Alice drove down the street in her car”. Sounds pretty simple to us, but a possible machine interpretation is that she drove down a street which is inside her car. A trained neural net helps resolve these ambiguities.
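To make the "street in her car" ambiguity concrete, here is a minimal sketch of choosing between the two attachments by score. It is not how SyntaxNet is implemented, and it does not use any SyntaxNet API; the scores in TOY_SCORES are invented numbers standing in for what a trained model would learn from labelled sentences, and best_attachment is a hypothetical helper.

```python
SENTENCE = ["Alice", "drove", "down", "the", "street", "in", "her", "car"]

# The phrase "in her car" can attach to either of these words:
#   "drove"  -> Alice was in her car while she drove
#   "street" -> the street is inside her car
CANDIDATE_HEADS = ["drove", "street"]

# Invented plausibility scores for (head, preposition, noun) triples.
# A real parser would get these from a trained network, not a lookup table.
TOY_SCORES = {
    ("drove", "in", "car"): 0.92,
    ("street", "in", "car"): 0.03,
}


def best_attachment(preposition, noun, candidates, scores):
    """Pick the candidate head whose attachment has the highest score."""
    return max(
        candidates,
        key=lambda head: scores.get((head, preposition, noun), 0.0),
    )


print(best_attachment("in", "car", CANDIDATE_HEADS, TOY_SCORES))  # drove
```

Both readings are grammatical, so syntax rules alone cannot choose between them; the preference has to come from somewhere else, here from the scores.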
Based on training with carefully labelled Wall Street Journal newswire texts, the parser is able to come very close to human accuracy in structuring sentences. It doesn’t do quite as well with unlabeled text, especially web examples, showing there is still more research required in self-guided training.
Google’s goal in this release is to encourage wider research on the deeper problems in natural language understanding, for example completing part-of-speech identification (identifying that this is the subject, not just a noun or pronoun) and the semantics. SyntaxNet helps other researchers and commercial developers avoid needing to reinvent a solution to a solved problem (and presumably they can now be confident that Google will be sympathetic to fair-use claims for products based on this software).
A lot of the interesting semantic challenges revolve around ambiguity and context-awareness: “Everyone loves someone” (is one fortunate person loved by everyone, or are possibly many people loved?) and “John kissed his wife and so did Tom” (did Tom kiss John’s wife or his own?). These problems might also be amenable to deep learning (what is the most probable interpretation?) but it’s not yet as clear how you would constrain training examples for specific applications.
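The two readings of “Everyone loves someone” can be written down precisely and checked against a small made-up world. The sketch below is only an illustration: the names in PEOPLE and the pairs in LOVES are invented, and the two functions are hypothetical helpers that encode one reading each.

```python
# A toy world: (x, y) in LOVES means "x loves y". All names are invented.
PEOPLE = {"ann", "bob", "cat"}
LOVES = {("ann", "bob"), ("bob", "cat"), ("cat", "ann")}


def each_loves_somebody(people, loves):
    """Reading 1: for every person x there is some y (possibly different each time) that x loves."""
    return all(any((x, y) in loves for y in people) for x in people)


def somebody_loved_by_all(people, loves):
    """Reading 2: there is one particular person y whom every person x loves."""
    return any(all((x, y) in loves for x in people) for y in people)


print(each_loves_somebody(PEOPLE, LOVES))    # True
print(somebody_loved_by_all(PEOPLE, LOVES))  # False
```

In this world the sentence is true under the first reading and false under the second, so the choice of interpretation changes the answer. Nothing in the syntax tree decides between them.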
Natural language processing is becoming a competitive frontier as personal assistant software and translation tools become more popular and as our expectations for accuracy in dictation continue to rise (who wouldn’t love to get rid of keyboards?). This is a domain worth watching. You can read more about the Google release HERE. And HERE is a Berkeley paper on training neural nets to recognize continuous speech with a 65k word lexicon.