Early last year we talked about state space models, a recent advance in large language modeling with some appealing advantages. In this blog we introduce neurosymbolic methods, another advance in foundation technologies, here applied to automated code generation. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.
The Innovation
This month’s pick is Neural Program Generation Modulo Static Analysis. The authors are from Rice University, UT Austin and the University of Wisconsin. The paper was published in Neural Information Processing Systems 2021 and has 27 citations.
Amazing as they are, LLMs and neural networks have limitations that are becoming apparent in the “almost correct” code generated by Copilot, the underwhelming release of GPT-5, and other examples. Unsurprising. Neural nets excel at perception-centric learning from low-level detail, but they do not excel at reasoning based on prior knowledge, an area where symbolic methods fare better. Neurosymbolic methods fuse the two approaches to leverage their complementary strengths. Neurosymbolic research and applications are not new, but until recently they have been overshadowed by LLM advances.
This month’s blog uses a neurosymbolic approach to improve accuracy in automated software generation for Java methods.
Paul’s view
LLM-based coding assistants are rapidly becoming mainstream. These assistants rely mostly on their underlying LLM’s ability to generate code from a user text prompt and other surrounding code already written. Under the hood, these LLMs are “next word” predictors that write code one word at a time: beginning with the prompt as their input, they append each newly generated word to form a successor prompt, which is then used to generate the next word, and so on.
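To make that loop concrete, here is a minimal sketch of autoregressive generation; predictNextToken() is a hypothetical stand-in for the underlying LLM, not an API from the paper or any particular library.

```java
// Minimal sketch of the "next word" prediction loop described above.
// predictNextToken() is a hypothetical stand-in for the underlying LLM.
import java.util.ArrayList;
import java.util.List;

public class NextTokenLoop {
    static String predictNextToken(List<String> context) {
        // A real system would run the neural network over the context and
        // sample the most likely next token; stubbed here for illustration.
        return "<eos>";
    }

    static List<String> generate(List<String> promptTokens, int maxTokens) {
        List<String> context = new ArrayList<>(promptTokens);
        for (int i = 0; i < maxTokens; i++) {
            String next = predictNextToken(context); // predict one token
            if (next.equals("<eos>")) break;         // stop at end of sequence
            context.add(next);                       // append to form the successor prompt
        }
        return context;
    }
}
```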
This paper observes that, unlike natural language, all programming languages must conform to a formal grammar (search “BNF grammar” in your browser). These grammars map source code into a “syntax tree” structure. It’s entirely possible to make a neural network that is a syntax tree generator rather than a word generator. Such a network recursively calls itself to build a syntax tree in a left-to-right, depth-first manner.
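A toy sketch of that recursive expansion, under my own assumptions: the three-rule grammar and chooseProduction() below are illustrative stand-ins, not the paper's model.

```java
// Toy sketch of left-to-right, depth-first syntax-tree generation by
// recursively expanding grammar nonterminals. The grammar and
// chooseProduction() are illustrative assumptions, not the paper's model.
import java.util.ArrayList;
import java.util.List;

public class SyntaxTreeSketch {
    static class Node {
        final String symbol;
        final List<Node> children = new ArrayList<>();
        Node(String symbol) { this.symbol = symbol; }
    }

    // BNF-style toy grammar:  Stmt ::= VarDecl | Assign ;  VarDecl ::= Type Id
    static List<String> chooseProduction(String nonterminal) {
        // A neural network would score the candidate right-hand sides here,
        // conditioned on the partial tree built so far; hard-coded for illustration.
        switch (nonterminal) {
            case "Stmt":    return List.of("VarDecl");
            case "VarDecl": return List.of("Type", "Id");
            default:        return List.of();            // terminals expand no further
        }
    }

    static Node expand(String symbol) {
        Node node = new Node(symbol);
        for (String child : chooseProduction(symbol)) {
            node.children.add(expand(child));            // depth-first, left-to-right
        }
        return node;
    }

    public static void main(String[] args) {
        Node tree = expand("Stmt");                      // Stmt -> VarDecl -> (Type, Id)
        System.out.println(tree.children.get(0).children.size()); // prints 2
    }
}
```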
The authors further propose to annotate nodes in a syntax tree with a “symbol table” containing all declared variables and their types, drawn from the surrounding code already written and the portion of the syntax tree generated so far. The symbol table is created by a traditional non-AI algorithm, as would be done by a software compiler, and is used during training of the network as a weak supervisor – generated code that assigns variables or function arguments in violation of the symbol table is labeled as bad code.
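A rough sketch of that weak-supervision check follows, heavily simplified and with names of my own choosing; the paper's static analysis covers far more than this.

```java
// Rough sketch of using a compiler-style symbol table as a weak supervisor:
// generated assignments that reference undeclared variables or mismatch the
// declared type get labeled as bad code. A deliberate simplification.
import java.util.Map;

public class SymbolTableCheck {
    // symbolTable maps declared variable names to their types, gathered from
    // the surrounding code and the partial syntax tree generated so far.
    static boolean assignmentOk(Map<String, String> symbolTable,
                                String lhsVar, String rhsType) {
        String declaredType = symbolTable.get(lhsVar);
        if (declaredType == null) return false;    // assignment to an undeclared variable
        return declaredType.equals(rhsType);       // otherwise require matching types
    }

    public static void main(String[] args) {
        Map<String, String> symbols = Map.of("count", "int", "name", "String");
        System.out.println(assignmentOk(symbols, "count", "int"));  // true  -> acceptable
        System.out.println(assignmentOk(symbols, "total", "int"));  // false -> labeled bad
        System.out.println(assignmentOk(symbols, "name", "int"));   // false -> labeled bad
    }
}
```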
The authors train and benchmark a 60M parameter “neurosymbolic grammar” (NSG) syntax tree-based code generator for Java code generation. They use a large database of Java classes with 1.6M methods total, randomly removing the body of one method from a class and asking their NSG to re-generate code for that method based only on the surrounding code for the rest of the class, including its comments. They compare NSG to a variety of baseline LLMs from 125M to 1.3B parameters using a combination of syntax correctness checks and checks for similarity to the golden method used for training. NSG is a lot better: 86% of NSG-generated code passes all syntax correctness checks vs. 67% for the best alternative (CodeGPT, 125M parameters), and NSG-generated code has a 40% average similarity score to golden vs. 22% for the best alternative (GPT-Neo, 1.3B parameters).
Of course, with today’s 100B+ parameter LLMs using multi-shot reasoning, which can include feeding software compiler errors back to the LLM and asking it to fix them, the benchmark results could prove less compelling. As the authors themselves point out in this paper, more research here would be welcome!
Raúl’s view
Neuro-symbolic approaches in artificial intelligence combine neural methods such as deep learning with symbolic reasoning based on formal languages and logic. The goal is to overcome the weaknesses of both approaches: neural networks excel at pattern recognition from large datasets but cannot easily take advantage of coded expert knowledge and are “black boxes” that make understanding their decision-making processes hard; symbolic systems can encode precise rules and constraints, but they are brittle and hard to scale. The first paper in the trio we blog about this week gives a brief introduction to this topic.
The monograph “Neurosymbolic Programming” more specifically addresses integrating deep learning and program synthesis. Strategies include neural program synthesis, where neural networks are trained to generate programs directly; learning to specify, in which models learn to complete or disambiguate incomplete specifications; neural relaxations, which use the parameters of a neural network to approximately represent a set of programs; and distillation, where trained neural networks are converted back into symbolic programs that approximate their behavior.
Against this backdrop, the NeurIPS 2021 paper Neural Program Generation Modulo Static Analysis presents one specific neuro-symbolic approach to program generation (here, Java methods). The authors argue that large language models of code (e.g., GPT-Neo, Codex) often fail to produce semantically valid long-form code such as full method bodies, generating code with basic errors such as uninitialized variables, type mismatches, and invalid method calls. The key thesis of the paper is that static program analysis provides semantic relationships “for free” that are otherwise very hard for neural networks to infer. The paper is self-contained but assumes familiarity with the compilation of formal languages and with neural models to fully understand the approach.
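To make those error classes concrete, here is a small hand-written Java fragment of my own (not from the paper); the faulty lines are shown as comments so the snippet still compiles.

```java
// Illustration (my own, not from the paper) of the basic error classes named
// above; the offending lines are commented out so the class itself compiles.
public class TypicalCodeGenErrors {
    int method(String input) {
        int total = 0;                    // correctly declared and initialized
        // int bad; total = bad;          // would be: use of an uninitialized variable
        // String len = input.length();   // would be: type mismatch (int assigned to String)
        // total += input.size();         // would be: invalid method call (String has no size())
        return total + input.length();
    }
}
```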
The models built, Neurosymbolic Attribute Grammars (NSGs), extend context-free grammars with attributes derived from static analysis such as symbol tables, type information, and scoping. During generation, the neural model conditions its choices not only on syntactic context but also on these semantic attributes (“weak supervision”). This hybrid system improves the model’s ability to respect language rules while still benefiting from statistical learning.
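One way to picture this conditioning, as a minimal sketch under my own simplifying assumptions (the symbol-table filter and scoreWithNeuralModel() are illustrative, not the paper's architecture):

```java
// Minimal sketch of an attribute-guided choice: the decision is conditioned on
// semantic attributes (here, a symbol table of in-scope variables and types)
// as well as the syntactic context. scoreWithNeuralModel() is a stand-in.
import java.util.Map;

public class AttributeGuidedChoice {
    static double scoreWithNeuralModel(String nonterminal, String candidate) {
        // A trained network would produce this score; stubbed for illustration.
        return candidate.length();
    }

    // Pick a variable reference, restricted to names the symbol table says are
    // in scope with the expected type, then let the neural score choose among them.
    static String chooseVariable(Map<String, String> symbolTable, String expectedType) {
        return symbolTable.entrySet().stream()
                .filter(e -> e.getValue().equals(expectedType))          // semantic constraint
                .map(Map.Entry::getKey)
                .max((a, b) -> Double.compare(scoreWithNeuralModel("Var", a),
                                              scoreWithNeuralModel("Var", b)))
                .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, String> symbols = Map.of("i", "int", "msg", "String");
        System.out.println(chooseVariable(symbols, "int")); // only "i" is type-correct
    }
}
```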
The system was evaluated on the task of generating Java method bodies. Training used 1.57 million Java methods, with a grammar supporting a subset of Java. The NSG model itself had 63 million parameters, which is modest compared to billion-parameter transformers like GPT-Neo and Codex. NSGs substantially outperform these larger baselines on static checks (ensuring no undeclared variables, type safety, initialization, etc.) and fidelity measures (similarity of generated code to ground truth, Abstract Syntax Tree (AST) structure, execution paths). For example, 86% of NSG-generated methods passed all static checks, compared to ~65% for GPT-Neo and ~68% for CodeGPT. On fidelity metrics, NSGs nearly doubled the performance of the transformers, showing they not only generate valid code but also code that more closely matches intended behavior.
This work illustrates the power of neuro-symbolic methods in generating code for programming languages where semantics matter deeply; unlike natural language, code is governed by strict syntactic and semantic rules. Verification and generation of digital systems, e.g., in (System)Verilog, can obviously benefit from such techniques.