Assertion based verification is a very productive way to catch bugs, however assertions are hard enough to write that assertion-based coverage is not as extensive as it could be. Is there a way to simplify developing assertions to aid in increasing that coverage? Paul Cunningham (Senior VP/GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.
This month’s pick is Towards Improving Verification Productivity with Circuit-Aware Translation of Natural Language to SystemVerilog Assertions. The paper was presented in the First International Workshop on Deep-Learning Aided Verification in 2023 (DAV 2023). The authors are from Stanford.
While a lot of attention is paid to LLMs for generating software or design code from scratch, this month’s focus is on generating assertions, in this case as an early view into what might be involved in such a task. The authors propose a framework to convert a natural language check into a well-formed assertion in the context of the target design which a designer can review and edit if needed. The framework also provides for formally checking the generated assertion, feeding back results to the designer for further refinement. The intent looks similar to prompt refinement in prompt-based chat models, augmented by verification.
As a very preliminary paper our goal this month is not to review and critique method and results but rather to stimulate discussion on the general merits of such an approach.
A short paper this month – more of an appetizer than a main course, but on a Michelin star topic: using LLMs to translate specs written in plain English into SystemVerilog assertions (SVA). The paper builds on earlier work by the authors using LLMs to translate specs in plain English into linear temporal logic (LTL), a very similar problem, see here.
The authors leverage a technique called “few shot learning” where an existing commercial LLM such as GPT or Codex is asked to do the LTL/SVA translation, but with some additional coaching in its input prompt: rather than asking the LLM to “translate the following sentence into temporal logic” the authors ask the LLM to “translate the following sentence into temporal logic, and remember that…” followed by a bunch of text that explains temporal logic syntax and gives some example translations of sentences into temporal logic.
The authors’ key contribution is to come up with the magic text to go after “remember that…”. A secondary contribution is a nice user interface to allow a human to supervise the translation process. This interface presents the user with a dialog box showing suggested translations of sub-clauses in the sentence and asks the user to confirm these sub-translations before building up the final overall temporal logic expression for the entire sentence. Multiple candidate sub-translations can be presented in a drop-down menu, with a confidence score for each candidate.
There are no results presented in the SVA paper, but the LTL paper shows results on 36 “challenging” translations provided by industry experts. Prior art correctly translates only 2 of the 36, where the authors’ approach succeeds on 21 of 36 without any user supervision and on 31 of 36 with user supervision. Nice!
The proposed framework, nl2sva, “ultimately aims to utilize current advances in deep learning to improve verification productivity by automatically providing circuit-aware translations to SystemVerilog Assertions (SVA)”. It is done by extending a recently released tool, nl2spec, to interactively translate natural language to temporal logic (SVA are based on temporal logic). The framework requires an LLM (they use GTP-4) and a Model checker (they use JasperGold). The LLM reads the circuits in System Verilog and the assertions in natural language, and generates SVAs plus circuit meta information (e.g., module names, input and output wire names) and sub-translations in natural language. These are presented to the developer to evaluate and the SVA are run through a model checker to evaluate. The authors describe how the framework is trained (few-shot prompting) and include two complete toy examples (Verilog listings) and show a correctly generated SVA for each of them (“Unless reset, the output signal is assigned to the last input signal”).
As pointed out, this is preliminary work. Using AI to generate assertions seems a worthy enterprise. It is a hard problem in the sense that it involves translation; we briefly hit translation back in July when reviewing Automatic Code Review using AI; translation is a hard problem to score, often done with the BLEU score (bilingual evaluation understudy which evaluates quality of machine translation on a scale of 0-100) involving human evaluation. The authors use GPT-4 stating that they have “up to 176 billion parameters” and “supports up to 8192 tokens of context memory”, which is limiting. Using GPT-5 (1.76 trillion parameters, not clear why they quote only 8192 tokens) will remove these limits.
In any case, this is a really easy paper, with a paragraph long introduction to both SVA and to LLMs, with two complete toy examples – fun to read!