Agentic methods are hot right now because single LLM models seem limited to point tool applications. Each such application is impressive, but each is still a single step in the more complex chains of reasoning we want to automate, where agentic methods should shine. I have been hearing that software engineering (SWE) teams are advancing faster in AI adoption than hardware teams, so I thought it would be useful to run a quick reality check on status. Getting into the spirit of the idea, I used Gemini Deep Research to find sources for this article, selectively sampling a few surveys it offered while adding a couple of my own finds. My quick summary: first, what counts as progress depends on the application; convenience-based use-models are more within reach today, while precision use-models are also possible but more bounded. Second, advances are more evident in automating subtasks within a natural framework of crosschecks and human monitoring than in a hands-free, end-to-end SWE objective.
Automation for convenience
One intriguing paper suggests that for convenience needs we should move away from apps toward prompt-based queries that serve the same objectives. This approach can in principle do better than apps: prompt-based systems eliminate the need for app development, can be controlled through the language we all speak rather than cryptic human-machine interfaces, and can adapt more easily to variations in needs.
Effective prompt engineering may still be more of an art than we would prefer, but the author suggests we can learn to become more effective, and (my interpretation) perhaps we only need to learn this skill once rather than for every unique app.
Even technology engineers need this kind of support, not in deep development or analysis but for routine yet important questions: “who else is using this feature, when was it most recently used, what problems have others seen?” Traditionally these might be answered by a help library or an in-house data management app, but what if you want to cross your question with other sources or constraints outside the scope of that app? In hardware development, imagine the discovery power available if you could run prompt-based searches across all design data: spec, use cases, source code, logs, waveforms, revisions, and so on.
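To make that concrete, here is a minimal sketch (mine, not from any of the papers referenced) of how one natural-language question could be crossed against several design-data sources. Every function below is a stand-in for illustration, not a real API.

```python
# Hypothetical sketch: one prompt spans sources that today sit behind separate apps.
# All data sources and ask_llm() are stand-ins, not real tools.

def fetch_spec_sections(q: str) -> str:
    return "Spec 4.2: retention feature required for low-power island"        # stand-in

def fetch_checkin_log(q: str) -> str:
    return "2024-11-03: retention controller updated by power team"           # stand-in

def fetch_sim_logs(q: str) -> str:
    return "Regression 1182: 2 retention-related failures, since waived"      # stand-in

def ask_llm(prompt: str) -> str:
    return f"(LLM response to: {prompt[:60]}...)"  # replace with a real model call

def answer_design_question(question: str) -> str:
    # Gather context across spec, revision history and simulation logs in one pass
    context = "\n".join([
        fetch_spec_sections(question),
        fetch_checkin_log(question),
        fetch_sim_logs(question),
    ])
    prompt = f"Question: {question}\nContext:\n{context}\nAnswer concisely."
    return ask_llm(prompt)

print(answer_design_question(
    "Who else uses the retention feature, when was it last exercised, "
    "and what problems have others seen?"))
```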
Automating precision development
This paper describes an LLM-based agentic system, with agents for management, code generation, optimization, QA, iterative refinement, and final verification, used to develop quite complex functions: a face recognition system, a chatbot, a face mask detection tool, a snake game, a calculator, and a Tic-Tac-Toe game. It claims 85% or better code accuracy against a standard benchmark, building and testing these systems in minutes. At 85% accuracy we must still follow that initial code with developer effort to verify and correct to production quality. But assuming this level of accuracy is repeatable, it is not hard to believe that even allowing a few weeks or months of developer testing and refinement, the net gain in productivity, without loss of quality, could be considerable.
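As a rough illustration (my own sketch, not the paper’s implementation), the manage / generate / QA / refine / verify loop could be structured something like this, with call_llm() standing in for any real chat-completion call:

```python
# A minimal sketch of the multi-agent pattern, not the paper's actual system.
# call_llm() is a stand-in; replace it with a real model call.

def call_llm(role: str, task: str) -> str:
    return f"[{role} output for: {task[:50]}...]"  # stand-in response

def build_feature(spec: str, max_iterations: int = 3):
    plan = call_llm("manager", f"Break this spec into coding tasks: {spec}")
    code = call_llm("code generator", f"Write code for: {plan}")
    for _ in range(max_iterations):
        review = call_llm("QA", f"Test and critique this code: {code}")
        if "no issues" in review.lower():      # crude convergence check
            break
        code = call_llm("refinement", f"Revise the code to address: {review}")
    verdict = call_llm("verification", f"Final check against the spec: {code}")
    # At ~85% claimed accuracy, human verification still follows this automated pass
    return code, verdict

code, verdict = build_feature("A command-line Tic-Tac-Toe game")
```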
Another paper points out that in SWE there is still a trust issue with automatically developed code. However, they add that most large-scale software development is more about assembling code from multiple sources than developing code from scratch, which changes the trust question to how much you can trust the components and the assembly. I’m guessing they consider assembly in DevOps to be relatively trivial; in hardware design, SoC-level assembly (or even multi-die system assembly) is more complex, though still primarily mechanical rather than creative. The scope for mistakes is certainly more limited than it would be in creating a complete new function from scratch. I know of an AI-based system from over a decade ago which could create most of the integration infrastructure for an SoC: clocking, reset, interrupt, bus fabric, and so on. This was long before we’d heard of LLMs and agents.
Meanwhile, agentic/generative AI isn’t only useful for code development. Tools are appearing to automate test design, generation, and execution, to assist debug, and more generally to support DevOps. Many of these systems in effect crosscheck each other and are further complemented by human oversight. Mistakes might happen, but perhaps no more often than in an AI-free flow.
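For example (again a hypothetical sketch, not a description of any specific tool), a test-design agent can crosscheck a code-generation agent, with a human sign-off as the final gate:

```python
# Hypothetical sketch of agents crosschecking each other under human oversight.
# generate_tests(), run_tests() and the code under test are stand-ins, not real tooling.

def generate_tests(spec: str) -> list[str]:
    # Produced by a test-design agent, independent of the code-writing agent
    return [f"{spec}: handles empty input", f"{spec}: rejects bad config"]

def run_tests(code: str, tests: list[str]) -> list[str]:
    return []  # stand-in: a real harness would return the list of failing tests

def gated_merge(spec: str, generated_code: str) -> bool:
    tests = generate_tests(spec)
    failures = run_tests(generated_code, tests)
    summary = f"{len(tests)} tests run, {len(failures)} failures for '{spec}'"
    approved = input(f"{summary}\nApprove merge? [y/N] ").lower().startswith("y")
    return not failures and approved   # human sign-off is the final gate

# gated_merge("config parser", "<generated code>")
```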
Convenience, precision or a bit of both?
Engineers obsess about precision, especially around AI. But much of what we do during our day doesn’t require precision; “good enough” answers are OK if we can get them quickly. Search, summarizing key points from an email or paper, and generating a first-draft document are all areas where we depend on (or would like) the convenience of a quick and “good enough” first pass. On the other hand, precision is vital in some contexts. For financial transactions, jet engine modeling, or logic simulation we want the most accurate answers possible; here “good enough” isn’t good enough.
Even so, AI can still offer an advantage in precision applications. If it can provide a good enough starting point very quickly (minutes), and if we can manage our expectations by accepting the need to refine and verify beyond that starting point, then the net benefit in shortened schedule and reduced effort may be worth the investment, as long as you can build trust in the quality the AI system can provide.
Incidentally, my own experience (I tried Deep Research (DR) options in Gemini, Perplexity, and ChatGPT) backs up these conclusions. Each DR analysis appeared in about 10 minutes and was mostly useful to me for the references it provided rather than for the DR summary itself. Some of these references were new to me, some I already knew. That might have been enough if my research were purely for my own interest, but since I’m aiming to provide reliable insight I also looked for other references through more conventional online libraries. Combining both methods proved productive!