
An Agentic Formal Verifier. Innovation in Verification
by Bernard Murphy on 02-25-2026 at 6:00 am

In a break from our academic-centric picks, here we look at an agentic verification flow developed within a semiconductor company. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Saarthi: The First AI Formal Verification Engineer. The authors are from Infineon. The paper was posted to arXiv in March 2025 and was ranked second-best paper at DVCon 2025.

A nice paper on using agentic methods for hands-free formal verification. The goal is to explore how much can be achieved with a semi- or fully-autonomous agentic formal flow: understand the design specification, create a verification plan, allocate tasks to several AI verification engineers, and drive a formal verification tool to prove properties. The flow then analyzes counterexamples, assesses formal coverage, and improves the proof set by adding missing properties.

Credible effort.

Paul’s view

This paper from Infineon outlines a simple agentic workflow for formally verifying RTL. Commercial verification agents have evolved quite a bit since it was published a year ago, but it still makes for a nice read. The paper begins by walking the reader through prompt engineering concepts like feedback, refinement, and chain-of-thought, as well as sampling-and-voting as a method to mitigate hallucinations. The authors make their workflow available in three different open-source agentic frameworks – CrewAI, AutoGen, and LangGraph. The workflow creates a list of properties to prove in plain English (the verification plan), then creates and refines SystemVerilog Assertions (SVAs) for each property, and then runs a commercial model checking tool (Jasper from Cadence) on those SVAs. Each step in the workflow is iterative, with multi-shot LLM calls.
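The plan → SVA → prove → refine loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`generate_sva`, `run_model_checker`, `verify_plan`) and the retry budget are hypothetical stand-ins for the multi-shot LLM calls and the formal tool invocation in the real flow.

```python
# Hedged sketch of an iterative plan -> SVA -> prove -> refine loop.
# All names here are hypothetical; the real workflow calls an LLM for
# generation/refinement and a commercial formal tool for proving.

MAX_REFINEMENTS = 3  # bounded retries, as in multi-shot agentic flows

def generate_sva(prop, feedback=None):
    """Stand-in for an LLM call that writes (or repairs) an assertion."""
    attempt = 0 if feedback is None else feedback["attempt"] + 1
    return {"property": prop, "attempt": attempt}

def run_model_checker(sva):
    """Stand-in for the formal tool; here, proofs succeed after one repair."""
    return {"proven": sva["attempt"] >= 1, "attempt": sva["attempt"]}

def verify_plan(plan):
    results = {}
    for prop in plan:                      # one property per "AI engineer"
        sva = generate_sva(prop)
        outcome = {"proven": False}
        for _ in range(MAX_REFINEMENTS):   # iterative refinement loop
            outcome = run_model_checker(sva)
            if outcome["proven"]:
                break
            # counterexample / failure feedback drives the next attempt
            sva = generate_sva(prop, feedback=outcome)
        results[prop] = outcome["proven"]
    return results

plan = ["req implies eventual grant", "fifo never overflows"]
print(verify_plan(plan))
```

The key structural point is the bounded feedback loop: each failed proof attempt is fed back into the next generation call rather than discarded.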

The authors benchmark their agent on a range of simple testcases, typically with around 10 properties generated per testcase. All the testcases are considered golden RTL (bug-free). The authors measure code coverage for the generated properties and how many of them pass in formal. Even though the circuits are fairly simple, both proof rates and coverage average below 50%. Also, in real-world situations the RTL is not known to be golden, and one of the biggest challenges for agentic workflows is to determine whether a proof failure is due to a bad property or a real bug in the design. That said, one year is a long time in AI, and where last year was a breakthrough year for agentic software coding, our view at Cadence is that this year looks set to be a breakthrough year for agentic RTL design and verification. I’m looking forward to several exciting new papers on the topic at upcoming conferences!

Raúl’s view

Launched in early 2024 by Cognition AI, Devin is an autonomous AI agent designed to handle end-to-end software development tasks. Devin created a strong initial buzz; after the dust settled, it reportedly resolved about 14% of real-world GitHub issues on SWE-bench unassisted, vastly outperforming previous AI models. This month’s blog is about Saarthi, “a similar fully autonomous AI formal verification engineer”. Saarthi uses multi-agent collaboration and tight integration with formal tools, enabling systematic property generation, proof, and coverage analysis aligned with industrial verification practice.

Saarthi leverages recent work in multi-agent LLM frameworks, applying workflows such as planning, reflection, tool use, and human-in-the-loop to formal hardware verification. It integrates LLMs, a framework (in today’s LLM literature, “the control, coordination, and tooling layer around one or more LLMs”), and a tool (an external capability, here formal verification) to build “the first AI formal verification engineer”. Human-in-the-loop intervention remains essential to avoid infinite loops and to resolve ambiguous or incorrect properties.
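The human-in-the-loop guard mentioned above, bound the agent's retries and escalate stubborn properties to a person, can be sketched in a few lines. This is an assumption-laden illustration, not Saarthi's actual code; `attempt_proof` and the escalation list are hypothetical placeholders.

```python
# Sketch of a human-in-the-loop guard: retry a bounded number of times,
# then escalate to a human engineer instead of looping forever.
# attempt_proof() is a hypothetical stand-in for one agent proof attempt.

MAX_ATTEMPTS = 3

def attempt_proof(prop, attempt):
    # Placeholder behavior: one property never converges on its own.
    return prop != "ambiguous handshake"

def verify_with_escalation(properties):
    proven, escalated = [], []
    for prop in properties:
        for attempt in range(MAX_ATTEMPTS):
            if attempt_proof(prop, attempt):
                proven.append(prop)
                break
        else:  # retry budget exhausted -> hand off to a human
            escalated.append(prop)
    return proven, escalated

print(verify_with_escalation(["fifo depth bound", "ambiguous handshake"]))
```

The for/else idiom makes the escalation path explicit: the else branch runs only when no attempt succeeded, which is exactly where the human steps in.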

Saarthi is evaluated across a broad spectrum of RTL designs, ranging from simple counters and FSMs to more complex structures such as FIFOs, pipelined adders, and an RV32I core, using three different LLMs. The good news is that end-to-end autonomous formal verification is achievable in a meaningful fraction of cases; success rates hover around 40-50%. Coverage metrics are consistently stronger than assertion proof rates, which is not surprising: the first few assertions tend to cover a disproportionately large fraction of the reachable state space, and the number of properties is small (10-20). Model choice matters significantly: GPT-4o emerges as the most robust and consistent, while Llama3-70B underperforms across nearly all metrics. Agentic workflows can mitigate, but not eliminate, underlying model weaknesses. The results support “AI-assisted verification under human supervision” more than “hands-off AI verification”. From a practical perspective, this is still valuable; Saarthi is a credible systems-level demonstration of agentic AI applied to formal verification.
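The gap between coverage and proof rate is easy to see with a toy example (invented numbers, not the paper's data): a handful of proven assertions can already exercise most of the reachable state space, so coverage outruns the proof rate.

```python
# Toy illustration (not the paper's data) of why coverage can exceed proof
# rate: the few proven assertions overlap heavily on the reachable states.
properties = [
    {"proven": True,  "states_covered": {0, 1, 2, 3, 4, 5}},
    {"proven": True,  "states_covered": {4, 5, 6, 7}},
    {"proven": False, "states_covered": {8}},
    {"proven": False, "states_covered": {9}},
]
reachable_states = set(range(10))

proof_rate = sum(p["proven"] for p in properties) / len(properties)
covered = set().union(*(p["states_covered"] for p in properties if p["proven"]))
coverage = len(covered & reachable_states) / len(reachable_states)

print(f"proof rate: {proof_rate:.0%}, coverage from proven assertions: {coverage:.0%}")
# -> proof rate: 50%, coverage from proven assertions: 80%
```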

While the authors don’t fully achieve a completely autonomous engineering system, they convincingly demonstrate that modern LLMs, in a carefully designed agentic workflow with human-in-the-loop control, can handle substantial parts of an industrial formal verification flow. At the same time, the results make clear that today’s LLMs remain probabilistic assistants rather than deterministic engines: success rates vary, retries are frequent, and human intervention remains necessary for successful outcomes. The paper is an important datapoint in the progress of AI-assisted EDA. As for the bold claim that “Artificial General Intelligence (AGI) by 2027 is strikingly plausible, and we believe that Saarthi could be pivotal in reaching that milestone within the hardware design verification domain”, only time will tell.
