Whatever software engineering teams are considering around leveraging AI in their development cycles should be of interest to us in hardware engineering. Not in every respect perhaps, but there should be significant commonalities. I found a recent paper on the Future of AI-Driven Software Engineering from the University of Auckland, NZ with some intriguing ideas I thought worth sharing. The authors' intent is to summarize high-level ideas rather than algorithms, though there are abundant references to papers which, on a sample review, do get into more detail. As this is a fairly long paper, here I just cherry-pick a few concepts that stood out for me.
Upsides
In using LLMs for code generation, the authors see increased emphasis on RAG (retrieval-augmented generation) for finding code snippets versus direct code synthesis from scratch. They also share an important finding from a StackOverflow blog post reporting that hits on their website are declining. This is significant since StackOverflow has been a very popular source for exchanging ideas and code snippets. StackOverflow attributes the decline to LLMs like GPT-4 both summarizing a response to a user prompt and directly providing code snippets. Such RAG-based systems commonly offer links to retrieved sources, but clearly these are not compelling enough to sustain website hits. I see the same with Google search, where results now often start with an AI-generated overview. I often (though not always) find this useful, and I often don't follow the links.
Meanwhile Microsoft reports that the paid customer base for GitHub Copilot (a Microsoft product) is growing 30% quarter over quarter, now at 1.3M developers across 50K organizations. Clearly for software development the ease of generating code through Copilot has enough appeal to extract money from subscribers.
Backing up a step, before you can write code you need a clear requirements specification. Building such a specification can be a source of many problems: mapping a client's mental image of needs to an implementer's image in natural language, with ambiguities, holes, and the common reality of an evolving definition. AI agents could play a big role here by interactively eliciting requirements, proposing examples and scenarios to help resolve ambiguities and plug holes. Agents can also provide some level of requirements validation by identifying vague or conflicting requirements.
Maintaining detailed product documentation as development progresses can be a huge burden on developers, and that documentation can easily drift out of sync with the implemented reality, especially through incremental changes and bug fixes. The authors suggest this tedious task could be better handled through agent-based generation and updates, able to stay in sync with every change, large or small. Along similar lines, not everyone in the product hierarchy will want detailed implementation documentation. Product managers, AEs, application developers, and clients all need abstracted views best suited to their individual interests. Here also there is an opportunity for LLMs to generate such abstractions.
Downsides
The obvious concern with AI generated code or tests is the hallucination problem. While accuracy will no doubt improve with further training, it is unrealistic to expect high certainty responses to every possible prompt. Hallucinations are more a feature than a bug, no matter how extensive the training.
Another problem is over-reliance on AI. As developers depend more on AI-assisted answers to their needs, there is a real concern that their problem-solving and critical-thinking skills will decline over time. Without expert human cross-checks, how do we ensure that AI-induced errors do not leak through to production? A common response is that the rise of calculators didn't lead to innumeracy; they simply made us more effective. By implication, AI will reach that same level of trust in time. Unfortunately, this is a false equivalence. Modern calculators produce correct answers every time; there is no indication that AI can rise to this level of certainty. If engineers lose the ability to spot errors in AI claims in such cases, quality will decline noticeably, even disastrously. (I should stress that I am very much a proponent of AI for many applications. I am drawing a line here for unsupervised AI in applications requiring engineering precision.)
A third problem will arise as more of the code used in training and RAG is itself generated by AI. The "genotype" of this codebase will fail to weed out weak or incorrect suggestions unless some kind of Darwinian stimulus is added to the mix. Reinforcement learning could be part of the answer to improve training, but this won't fix stagnation in RAG evolution. Worse yet, experts won't be motivated to add new ideas (and where would they add them?) if recognition for their contribution is hidden behind an LLM response. I didn't see an answer to this challenge in the paper.
Mitigating Downsides
The paper underlines the need for unit testing unconnected to AI. This is basic testing hygiene – don’t have the same person (or AI) both develop and test. I was surprised that there was no mention of connecting requirements capture to testing since those requirements should provide independent oracles for correct behavior. Perhaps that is because AI involvement in requirements capture is still mostly aspirational.
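The requirements-as-oracle idea can be made concrete. Below is a minimal sketch, assuming a hypothetical pricing requirement and an AI-generated function (`ai_generated_discount` is invented here purely for illustration): the oracle is derived from the requirement text alone, never from the generated code, so it provides an independent check.

```python
def ai_generated_discount(price, qty):
    # Stand-in for a hypothetical AI-generated implementation under test
    total = price * qty
    return total * 0.9 if qty >= 10 else total

def meets_requirement(price, qty, result):
    """Oracle derived directly from the (assumed) requirement, not the code:
    orders of 10 or more units get a 10% discount; totals are never negative."""
    expected = price * qty * (0.9 if qty >= 10 else 1.0)
    return result >= 0 and abs(result - expected) < 1e-9

# Cross-check the generated code against the requirement-derived oracle
for price, qty in [(5.0, 3), (5.0, 10), (2.5, 12)]:
    assert meets_requirement(price, qty, ai_generated_discount(price, qty))
```

The point of the separation is exactly the hygiene rule above: the party (or AI) that wrote `ai_generated_discount` contributes nothing to `meets_requirement`, so a shared misunderstanding cannot hide a bug.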
One encouraging idea is to lean more heavily on metamorphic testing, something I have discussed elsewhere. Metamorphic testing checks relationships in behavior which should be invariant through low-level changes in implementation or in use-case tests. If you detect a difference in such a relation during testing, you know you have an error in the design. However, finding metamorphic relations is not easy. The authors suggest that AI could uncover new relations, as long as each such suggestion is carefully reviewed by an expert. Here the expert must ask whether an apparent invariant is just an accident of the testing or something that really is invariant, at least within the scope of usage intended for the product.
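A metamorphic relation can be captured in a few lines of code. Here is a minimal sketch, using a sort routine as the design under test (the function names are my own, invented for illustration): the relation says that shuffling the input must never change the sorted output, so no reference "golden" result is needed.

```python
import random

def my_sort(xs):
    # Stand-in for the implementation under test
    return sorted(xs)

def holds_permutation_relation(sort_fn, xs, trials=50):
    """Metamorphic relation: permuting the input must not change the output."""
    baseline = sort_fn(xs)
    for _ in range(trials):
        shuffled = list(xs)
        random.shuffle(shuffled)
        if sort_fn(shuffled) != baseline:
            return False  # relation violated: a bug in the design under test
    return True

assert holds_permutation_relation(my_sort, [3, 1, 4, 1, 5, 9, 2, 6])
```

Note the caution flagged above applies here too: an expert must confirm that a relation like this is a true invariant of the specification, not an accident of the test cases chosen.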
Thought-provoking ideas, all with relevance to hardware design.