Trust in Verification with AI
by Bernard Murphy on 03-24-2026 at 6:00 am

These are stressful times in functional verification. We are being pushed to embrace AI-based automation more aggressively, knowing we will continue to be held accountable for the quality of results. Verification misses could upend careers, maybe enterprises. It is tempting to believe that sanity will prevail and we will ultimately settle back into cautious AI adoption, but I am no longer so sure. Offset against verification risk is the very real chance that a competitor with a bigger risk appetite and a little luck will jump ahead and leave the rest of us wondering why customers are disappearing. We need to accept that we must reach further, exploiting our creativity to manage verification risk in uncharted waters, trusting with confidence that agents on semi-autopilot will not steer us into a rock.

Views from DVCon

I heard several opinions at DVCon. These come down to decomposing agentic flows into steps with checkpoints at which a human reviewer can easily check an agent’s work and correct as needed.

An example would be using an agent to read a spec or test spec for a specific function and from that generate PSS tests. A reasonably experienced DV engineer should be able to compare the relevant section of the spec against the generated PSS to check for correctness and completeness, and iterate if needed. Finally, the agent synthesizes the PSS into UVM or other tests, runs a simulation, and scores the run for coverage and assertion failures, which may in turn trigger more iterations steered by additional feedback from the DV engineer. There is evidence from a panel I moderated that, through enough iterations, it is possible to converge to engineering accuracy. Over time fewer iterations are needed, growing trust.
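To make the shape of this loop concrete, here is a minimal Python sketch. Every agent and tool call in it (generate_pss_tests, human_review, synthesize_to_uvm, run_simulation) is a hypothetical placeholder standing in for a real flow, not any vendor's API; only the checkpointed control flow is the point.

```python
# Hypothetical sketch of the checkpointed agentic flow described above.
# All agent/tool calls are stubbed placeholders, not a real vendor API.
from dataclasses import dataclass

@dataclass
class SimResult:
    coverage: float            # functional coverage score, 0.0-1.0
    assertion_failures: int    # count of failing assertions

def generate_pss_tests(spec_section: str) -> str:
    return f"// PSS tests derived from: {spec_section}"    # stub agent call

def human_review(artifact, spec_section: str):
    return artifact    # checkpoint: DV engineer compares artifact to the spec

def synthesize_to_uvm(pss: str) -> str:
    return pss.replace("PSS", "UVM")                       # stub synthesis step

def run_simulation(uvm: str) -> SimResult:
    return SimResult(coverage=0.97, assertion_failures=0)  # stub sim + scoring

def verify_function(spec_section: str, coverage_goal=0.95, max_iters=5):
    # Checkpoint 1: review generated PSS against the spec before proceeding
    pss = human_review(generate_pss_tests(spec_section), spec_section)
    for _ in range(max_iters):
        uvm = synthesize_to_uvm(pss)
        result = run_simulation(uvm)
        if result.coverage >= coverage_goal and result.assertion_failures == 0:
            return uvm                                     # converged
        # Checkpoint 2: engineer reviews the scored run, steers the next pass
        pss = human_review(pss, spec_section)
    raise RuntimeError("did not converge; escalate to the DV team")

print(verify_function("example: AXI burst ordering, spec section 4.2"))
```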

Another consideration is that sometimes the right place to fix a problem is in the spec, not in agentic training. Here explainability becomes important – why did the agent do something wrong or unexpected? Human-generated specs can be incomplete, inconsistent, or incorrect in places (see this). Automating spec correction and refinement guided by human feedback is an important component in reinforcing trust.
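As one illustration (my own, not from any paper or tool), explainability can be made concrete by requiring the agent to return a rationale with every proposed spec fix, so a human can approve or reject each one and the rejections can feed retraining:

```python
# Illustrative only: a shape for agent findings that forces a rationale,
# plus human triage. None of this reflects a specific tool's API.
from dataclasses import dataclass

@dataclass
class SpecFinding:
    location: str       # spec section the agent flagged
    issue: str          # what looks incomplete, inconsistent, or incorrect
    proposed_fix: str   # suggested rewording
    rationale: str      # why the agent flagged it -- the explainability hook

def triage(findings, approve):
    """Apply only what a human approves; keep rejections for retraining."""
    accepted = [f for f in findings if approve(f)]
    rejected = [f for f in findings if not approve(f)]
    return accepted, rejected

findings = [SpecFinding("4.2", "timeout undefined", "add 100-cycle timeout",
                        "section 4.3 assumes a timeout that 4.2 never defines")]
accepted, rejected = triage(findings, approve=lambda f: bool(f.rationale))
```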

Views from Software Engineering

I found several recent papers studying the use of agentic methods in software engineering. One such paper, from Monash U. in Australia and Columbia U., emphasizes that software engineering is a collaborative effort, from requirements gathering all the way to long-term maintenance and evolution. While AI agents should be able to perform some tasks autonomously, they must also conform to this collaborative model, allowing for evolving problem statements, tests, constraints, and feedback.

The authors point out that trust is accumulated over time in sequences of interactions in which correctness and reliability are obviously fundamental. It’s OK for agents to be wrong in early stages, but correctness and reliability after sufficient training are essential.

Another paper, from the National U of Singapore, CMU, and Stuttgart U., offers some interesting insights detailing technical and human considerations for building trust. Some of their technical considerations are familiar in our context: tool-based methods to validate correctness, performance, and compliance with general best practices.
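For instance (my sketch, with placeholder commands), tool-based validation might mean gating every agent deliverable behind objective checks before any human review. The specific commands below are stand-ins for whatever lint, simulation, or coverage tools a given flow actually uses:

```python
# Sketch of tool-based validation gates; the commands are placeholders.
import subprocess

CHECKS = [
    ["verilator", "--lint-only", "generated_test.sv"],  # syntax/lint gate
    ["make", "sim"],                                    # run the regression
]

def validate(checks=CHECKS):
    """Run each tool and collect failures rather than trusting output blindly."""
    failures = []
    for cmd in checks:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            failures.append((" ".join(cmd), "tool not installed"))
            continue
        if proc.returncode != 0:
            failures.append((" ".join(cmd), proc.stderr.strip()))
    return failures  # empty list: all objective gates passed

for cmd, err in validate():
    print(f"FAILED: {cmd}\n{err}")
```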

Human factors are more interesting. Explainability and transparency require that AI be able to justify why it made certain choices. Team practice compliance expects that agents adhere not just to general best practices but also, more tightly, to local team practices. The authors also suggest checks that explanations are matched appropriately to developer experience and that developers do not depend too blindly on agents (I assume through insufficient review and/or too little correction, though I didn’t notice suggestions on this point).
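The paper does not prescribe a mechanism, but one simple guard against blind dependence (my own sketch, with an arbitrary threshold) would be tracking how often agent output is accepted with no edits at all:

```python
# My illustration, not from the paper: a crude over-reliance signal based on
# how often reviewers accept agent deliverables without touching them.
def unedited_acceptance_rate(reviews):
    """reviews: list of (accepted: bool, edited: bool) per deliverable."""
    if not reviews:
        return 0.0
    unedited = sum(1 for accepted, edited in reviews if accepted and not edited)
    return unedited / len(reviews)

review_log = [(True, False), (True, True), (True, False), (False, True)]
rate = unedited_acceptance_rate(review_log)
print(f"unedited acceptance rate: {rate:.0%}")
if rate > 0.9:   # threshold is arbitrary; tune to your team
    print("suspiciously high -- review discipline may be slipping")
```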

Views from Davos

Don’t laugh. I agree in general that big consensus-driven organizations are poorly equipped to formulate policies for technologies moving much faster than they can. However, the World Economic Forum (WEF, which hosts the annual Davos meeting) brings together world leaders, business leaders, and academics to discuss challenges and to formulate guidance rather than to regulate. An AI forum is now a regular event at Davos, and trust is viewed as critical for encouraging worldwide AI growth.

Trust is a human condition which can’t be “fixed” with automation, but it can be fostered. The WEF has published an article suggesting a “trust stack”. The first stack layer is “non-deceptive affect”, meaning the agent should not try to gain trust through empathetic or praising cues or emotional appeals. A second is “epistemic humility”, a mouthful meaning that agents must communicate appropriate levels of uncertainty alongside their claims, up to “I don’t know” where appropriate. Agents should also emphasize consistency over persuasion; answers should read as principled beliefs, not opinions of the moment. A fast way to destroy trust is to give answers that change each time a question is asked.
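To make the epistemic humility layer concrete, here is one possible shape (mine, not the WEF’s): every answer carries an explicit confidence, and low-confidence answers degrade to “I don’t know”:

```python
# Illustrative sketch of an epistemic-humility layer; the structure and
# threshold are my assumptions, not anything the WEF specifies.
from dataclasses import dataclass

@dataclass
class AgentAnswer:
    text: str
    confidence: float   # ideally a calibrated model-reported value, 0.0-1.0

def present(answer: AgentAnswer, min_confidence: float = 0.6) -> str:
    if answer.confidence < min_confidence:
        return "I don't know -- not confident enough to state this as fact."
    return f"{answer.text} (confidence {answer.confidence:.0%})"

print(present(AgentAnswer("Reset must assert for 16 cycles.", 0.92)))
print(present(AgentAnswer("The FIFO depth is 32.", 0.35)))
```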

There are more layers in the stack, but you get the idea. We want agents to act professionally and treat us as professionals, just as we would expect junior engineers to behave.

Takeaways

Trust with confidence in agentic flows is achievable but it doesn’t come free. We must adapt our behavior, to be appropriately wary of first-pass responses, to carefully review deliverables for issues, and to invest time in training agents through multiple cycles before we can consider them effective team members. Even then their output should remain subject to design review, just as for any human team member.

Some trust-centric improvements may require more detailed setup prompts to scope agent behavior (expert, professional, etc.) and to guard against unconsidered signoffs. Explainability with support for correction may be one of the most important factors in building trust, allowing us to detect and retrain where reasoning goes wrong. Today this is supported in some models through after-the-fact mechanisms, though I sense that much here is still in R&D.
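As an example of what such a setup prompt might look like (my illustration, not a vendor template), folding in the trust-stack ideas above:

```python
# Illustrative system prompt scoping agent behavior; wording is my own.
SYSTEM_PROMPT = """\
You are an assistant to a senior DV engineer on functional verification.
- Act as a careful professional; no flattery or emotional appeals.
- Attach a confidence level to every claim; say "I don't know" when unsure.
- Never sign off a test plan or coverage goal yourself; only a named human
  reviewer may sign off. Instead, list open questions for that reviewer.
- When asked why you made a choice, cite the spec section you relied on.
"""
```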

Full autonomy may not be a reachable or even a desirable goal but there is certainly a path to significantly improve trust in stages, delivering improved productivity and shorter schedules. Ultimately semiconductor executives don’t expect miracles, but they do want to see significant improvement.

Also Read:

Podcast EP336: How Quadric is Enabling Dramatic Improvements in Edge AI with Veer Kheterpal

WEBINAR: HBM4E Advances Bandwidth Performance for AI Training

Siemens Fuse EDA AI Agent Releases to Orchestrate Agentic Semiconductor and PCB Design
