Whatever software engineering teams are considering around leveraging AI in their development cycles should be of interest to us in hardware engineering. Not in every respect perhaps, but there should be significant commonalities. I found a recent paper on the Future of AI-Driven Software Engineering from the University of Auckland, NZ with some intriguing ideas I thought worth sharing. The authors' intent is to summarize high-level ideas rather than algorithms, though there are abundant references to papers which, on a sample review, do get into more detail. As this is a fairly long paper, here I just cherry-pick a few concepts that stood out for me.
Upsides
In using LLMs for code generation, the authors see increased emphasis on RAG (retrieval-augmented generation) for finding code snippets versus direct code synthesis from scratch. They also share an important finding from a StackOverflow blog post reporting that hits on their website are declining. This is significant since StackOverflow has been a very popular source for exchanging ideas and code snippets. StackOverflow attributes the decline to LLMs like GPT-4 both summarizing a response to a user prompt and directly providing code snippets. Such RAG-based systems commonly offer links to retrieved sources, but clearly these are not compelling enough to sustain website hits. I see the same with Google search, where results now often start with an AI-generated overview. I often (though not always) find this useful, and I often don't follow the links.
Meanwhile Microsoft reports that the paid customer base for GitHub Copilot (a Microsoft product) is growing 30% quarter over quarter, now at 1.3M developers across 50K organizations. Clearly for software development the ease of generating code through Copilot has enough appeal to extract money from subscribers.
Backing up a step, before you can write code you need a clear requirements specification. Building such a specification can be a source of many problems: mapping a client's mental image of needs to an implementer's image in natural language, with ambiguities, holes, and the common reality of an evolving definition. AI agents could play a big role here by interactively eliciting requirements, proposing examples and scenarios to help resolve ambiguities and plug holes. Agents can also provide some level of requirements validation by identifying vague or conflicting requirements.
Maintaining detailed product documentation as development progresses can be a huge burden on developers, and that documentation can easily drift out of sync with the implemented reality, especially through incremental changes and bug fixes. The authors suggest this tedious task could be better handled through agent-based generation and updates, able to stay in sync with every change, large or small. Along similar lines, not everyone in the product hierarchy will want detailed implementation documentation. Product managers, AEs, application developers, and clients all need abstracted views best suited to their individual interests. Here also there is an opportunity for LLMs to generate such abstractions.
Downsides
The obvious concern with AI generated code or tests is the hallucination problem. While accuracy will no doubt improve with further training, it is unrealistic to expect high certainty responses to every possible prompt. Hallucinations are more a feature than a bug, no matter how extensive the training.
Another problem is over-reliance on AI. As developers depend more on AI-assisted answers to their needs, there is a real concern that their problem-solving and critical-thinking skills will decline over time. Without expert human cross-checks, how do we ensure that AI-induced errors do not leak through to production? A common response is that the rise of calculators didn't lead to innumeracy; they simply made us more effective. By implication, AI will reach that same level of trust in time. Unfortunately, this is a false equivalence. Modern calculators produce correct answers every time; there is no indication that AI can rise to this level of certainty. If engineers lose the ability to spot errors in AI claims in such cases, quality will decline noticeably, even disastrously. (I should stress that I am very much a proponent of AI for many applications. I am drawing a line here for unsupervised AI in applications requiring engineering precision.)
A third problem will arise as more of the code used in training and RAG is itself generated by AI. The "genotype" of this codebase will fail to weed out weak or incorrect suggestions unless some kind of Darwinian stimulus is added to the mix. Reinforcement learning could be part of the answer to improve training, but this won't fix stagnation in RAG evolution. Worse yet, experts won't be motivated to add new ideas (and where would they add them?) if recognition for their contribution is hidden behind an LLM response. I didn't see an answer to this challenge in the paper.
Mitigating Downsides
The paper underlines the need for unit testing unconnected to AI. This is basic testing hygiene – don’t have the same person (or AI) both develop and test. I was surprised that there was no mention of connecting requirements capture to testing since those requirements should provide independent oracles for correct behavior. Perhaps that is because AI involvement in requirements capture is still mostly aspirational.
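The requirements-as-oracle idea can be made concrete. Below is a minimal sketch, assuming a hypothetical pricing requirement and an AI-generated function (`ai_generated_discount` is invented here purely for illustration): the oracle is derived from the requirement text alone, never from the generated code, so it provides an independent check.

```python
def ai_generated_discount(price, qty):
    # Stand-in for a hypothetical AI-generated implementation under test
    total = price * qty
    return total * 0.9 if qty >= 10 else total

def meets_requirement(price, qty, result):
    """Oracle derived directly from the (assumed) requirement, not the code:
    orders of 10 or more units get a 10% discount; totals are never negative."""
    expected = price * qty * (0.9 if qty >= 10 else 1.0)
    return result >= 0 and abs(result - expected) < 1e-9

# Cross-check the generated code against the requirement-derived oracle
for price, qty in [(5.0, 3), (5.0, 10), (2.5, 12)]:
    assert meets_requirement(price, qty, ai_generated_discount(price, qty))
```

The point of the separation is exactly the hygiene rule above: the party (or AI) that wrote `ai_generated_discount` contributes nothing to `meets_requirement`, so a shared misunderstanding cannot hide a bug.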
One encouraging idea is to lean more heavily on metamorphic testing, something I have discussed elsewhere. Metamorphic testing checks relationships in behavior which should be invariant through low-level changes in implementation or in use-case tests. If you detect a difference in such a relation during testing, you know you have an error in the design. However, finding metamorphic relations is not easy. The authors suggest that AI could uncover new relations, as long as each such suggestion is carefully reviewed by an expert. Here the expert must ask whether an apparent invariant is just an accident of the testing or something that really is invariant, at least within the scope of usage intended for the product.
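A metamorphic relation can be captured in a few lines of code. Here is a minimal sketch, using a sort routine as the design under test (the function names are my own, invented for illustration): the relation says that shuffling the input must never change the sorted output, so no reference "golden" result is needed.

```python
import random

def my_sort(xs):
    # Stand-in for the implementation under test
    return sorted(xs)

def holds_permutation_relation(sort_fn, xs, trials=50):
    """Metamorphic relation: permuting the input must not change the output."""
    baseline = sort_fn(xs)
    for _ in range(trials):
        shuffled = list(xs)
        random.shuffle(shuffled)
        if sort_fn(shuffled) != baseline:
            return False  # relation violated: a bug in the design under test
    return True

assert holds_permutation_relation(my_sort, [3, 1, 4, 1, 5, 9, 2, 6])
```

Note the caution flagged above applies here too: an expert must confirm that a relation like this is a true invariant of the specification, not an accident of the test cases chosen.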
Thought-provoking ideas, all with relevance to hardware design.