We Need to Turn Specs into Oracles for Agentic Verification

by Bernard Murphy on 12-03-2025 at 6:00 am

The natural language understanding now possible in LLMs has raised interest in using specs as a direct reference for test generation, eliminating the need for an intermediate and fallible human translation step. Sadly, for multiple reasons, specs today are not an infallible source of truth. I am grateful to Shelly Henry (CEO of MooresLab) for his insights into the realities of spec evolution in production settings. Shelly and his team have many years of design experience across several enterprises, most recently as alumni of the Microsoft Silicon Group.

Today’s spec as an oracle – you wish

Architecture specs go through a development cycle, as do all aspects of design and verification, and like every other deliverable in the design flow they are not perfect on the first pass. An architect is responsible for building the specification, starting from customer requirements and considering what can be leveraged from other projects and what must be redesigned or upgraded to meet the new requirements. The architect may be able to rely on a modeling group to do some virtual prototyping, testing throughput, latencies and other metrics. Once they feel their rough model looks good, they will start writing their spec. In what follows I’ll focus on the spec as guidance for hardware design verification, though it should equally guide hardware and firmware design.

Producing the spec you need to start test planning will be the architect’s primary focus for a while, but not their only focus, as they continue to manage other tasks already in their pipeline. Their first release may be a 0.5 version, covering perhaps 70% of what they have considered up to this point. Again, it is a decent representation but not guaranteed to be perfect: good enough to start planning the design and verification schedule and resources.

Over time they will add to and refine the spec based on their own ideas and on feedback from the customer and from you. Eventually the spec is frozen (Shelly suggests around halfway into the design schedule, though your mileage may vary). Within that window, between the 0.5 release and freeze, the spec is changing. There may be contradictions or missing information. There may also be ambiguities, where the spec defines a feature but leaves too much open for you to be certain about expected behavior in all cases.

You email the architect for clarification. That turns into a thread, and you eventually agree on a resolution. But this outcome doesn’t always get back into the spec, or it does but doesn’t fully reflect the agreement you thought you had. Worse yet, you call the architect and agree on a resolution, for which you make a note – somewhere. It’s easy to see how mistakes can happen despite good intentions all round. Unfortunately, there is no verification methodology to definitively prove that a spec fully reflects the expectations of all stakeholders. Perhaps disconnects will surface pre-silicon, perhaps not. Is this really the best that we can do?

How we could turn a spec into a robust oracle

Start with what we can already do. Feed the 0.5 spec into an LLM-based agent and have that agent generate questions to the LLM to elaborate verification requirements, based on know-how already captured in the model. For example, what are the standard types of tests that should be performed around a DDR interface in this class of design?

There’s no need to digest a full spec in one gulp, which is likely impossible anyway given the bounded context windows that LLMs support. Specs are naturally organized into chapters and sections to respect the limited abilities of us fallible humans, and that structure is also much more amenable to LLM processing.
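
As a rough illustration of the section-at-a-time idea, the Python sketch below splits a plain-text spec at numbered headings and packs consecutive sections into chunks under a token budget. The heading pattern, the `Section` type, the helper names and the 1.3 tokens-per-word estimate are all assumptions made for illustration; a real flow would use the model's own tokenizer and whatever structure the spec document actually carries.

```python
import math
import re
from dataclasses import dataclass

# Assumed heading format: a line starting with a section number such as
# "2" or "2.1", then whitespace and a capitalized title ("2.1 Timing").
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+[A-Z]")

# Crude, assumed token estimate; a real flow would call the model's tokenizer.
TOKENS_PER_WORD = 1.3


@dataclass
class Section:
    number: str  # section number, e.g. "2.1"
    text: str    # heading line plus body text


def split_sections(spec_text: str) -> list[Section]:
    """Split a plain-text spec into sections at numbered heading lines."""
    sections: list[Section] = []
    number, lines = "0", []  # "0" collects any preamble before the first heading
    for line in spec_text.splitlines():
        match = HEADING.match(line)
        if match:
            if "".join(lines).strip():
                sections.append(Section(number, "\n".join(lines)))
            number, lines = match.group(1), [line]
        else:
            lines.append(line)
    if "".join(lines).strip():
        sections.append(Section(number, "\n".join(lines)))
    return sections


def approx_tokens(text: str) -> int:
    """Rough token count from whitespace-separated words."""
    return math.ceil(len(text.split()) * TOKENS_PER_WORD)


def pack_chunks(sections: list[Section], budget: int) -> list[list[Section]]:
    """Greedily group consecutive sections so each chunk stays within budget.

    A single section larger than the budget is placed in a chunk by itself.
    """
    chunks: list[list[Section]] = []
    current: list[Section] = []
    used = 0
    for section in sections:
        cost = approx_tokens(section.text)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(section)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Keeping section numbers attached to each chunk means every question or answer the agent produces can be traced back to the spec text it came from. A section larger than the budget still lands in a chunk of its own, so in practice it would need further splitting, for example at paragraph boundaries.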

Agent questions shouldn’t ask how such tests should be performed – that is the concern of later test synthesis flows. Here we want to refine the test specification by adding more descriptive detail about what behavior is expected: the detail the architect or you would have added to the spec if time and fallible memories allowed. This will very likely involve timing diagrams, perhaps FSM diagrams, and block diagrams to elaborate clock and reset control or how domain crossings are handled.
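
A minimal sketch of what such a question request might look like, assuming the agent sends one spec section at a time. The template wording and the `build_question_prompt` helper are hypothetical, not taken from MooresLab or any particular tool; the point is only that the instructions steer the model toward expected behavior and open issues rather than test mechanics.

```python
from string import Template

# Hypothetical prompt wording, for illustration only. It asks about expected
# behavior and open issues, and explicitly not about how to write the tests.
QUESTION_PROMPT = Template(
    "You are reviewing section $number of a hardware architecture spec "
    "(revision $revision).\n"
    "List questions whose answers would pin down the expected behavior "
    "described in this section, in every case the design may encounter.\n"
    "Do not ask how tests should be written or run.\n"
    "Flag any contradictions, missing information, or ambiguities.\n"
    "Where behavior depends on timing, state, clocks, resets, or domain "
    "crossings, ask for a timing diagram, FSM description, or block diagram.\n"
    "\n"
    "Section text:\n"
    "$text\n"
)


def build_question_prompt(number: str, revision: str, text: str) -> str:
    """Fill the template for one spec section; values are inserted verbatim."""
    return QUESTION_PROMPT.substitute(number=number, revision=revision, text=text)
```

Carrying the section number and spec revision in each request makes it easier to tie the model's questions, and the eventual answers, back to the exact text they refine.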

As the spec evolves, the agent should be able to digest mail threads, DM threads and notes, and use that information for further refinement. The result is a central source of truth that also records, by revision, where each change originated and what it affected. That makes it much easier for stakeholders to review and mutually agree that the refined version fully reflects what they wanted.
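
One way to picture that central record is a change ledger keyed by revision and section. In the sketch below, which is an assumption about how such a store might be organized rather than a description of any product, each agreed resolution records where it came from (email thread, DM thread, note), the revision it lands in and the sections it affects. A reviewer can then pull the full history of a section, including its subsections, or everything that changed in one revision. The `Origin`, `Change` and `SpecLedger` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    EMAIL = "email thread"
    DM = "DM thread"
    NOTE = "meeting or call note"


@dataclass(frozen=True)
class Change:
    revision: str              # spec revision the change lands in, e.g. "0.6"
    origin: Origin             # where the resolution was agreed
    reference: str             # hypothetical pointer, e.g. a thread subject
    summary: str               # the agreed resolution, in plain words
    sections: tuple[str, ...]  # spec sections affected, e.g. ("2.1",)


def covers(affected: str, section: str) -> bool:
    """True if `affected` is `section` itself or one of its subsections."""
    return affected == section or affected.startswith(section + ".")


@dataclass
class SpecLedger:
    changes: list[Change] = field(default_factory=list)

    def record(self, change: Change) -> None:
        self.changes.append(change)

    def history(self, section: str) -> list[Change]:
        """Changes touching a section or its subsections, in recorded order."""
        return [c for c in self.changes
                if any(covers(a, section) for a in c.sections)]

    def by_revision(self, revision: str) -> list[Change]:
        """Changes that landed in one revision, for stakeholder review."""
        return [c for c in self.changes if c.revision == revision]
```

Whether such a ledger lives in a database, a version-controlled file next to the spec or the spec tool itself is a separate choice; the essential part is that every resolution carries its origin and the sections it touched.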

Turning a spec into an oracle is an essential first step in an agentic verification flow: filling in holes, correcting inconsistencies, resolving ambiguities and testing that the spec itself provides enough detail to drive comprehensive test generation. This seems to me to be a no-brainer. If you’re curious, you might want to talk to the folks at MooresLab.ai.
