Two Perspectives on Automated Code Generation
by Bernard Murphy on 09-03-2025 at 6:00 am

In engineering development, automated code generation as a pair-programming assistant is high on the list of targets for GenAI applications. For hardware design, obvious targets would be autogenerating custom RTL functions or variants on standard functions, or completing RTL snippets as an aid to human-driven code development. Research in autogeneration is much more active today for software than for hardware, so take that as a starting point, noting that whatever is happening in software development should be a leading indicator for what we will likely see in hardware design. I have chosen two well-structured studies, one on Copilot and one on an independent platform for collaborative assistance that provides code completion through proactive prediction. Both studies add insights on effectiveness, on human factors, and on who might best profit from this assistance.

The Copilot study

This paper is a couple of years old (2023) but presumably not greatly out of date. The study looks at how well Copilot performs in developing code for a set of fundamental CS programming objectives such as sorting and searching. The authors assess on multiple metrics: correctness, performance, and diversity versus reproducibility of solutions. They compare, on similar metrics, against code developed by a team of CS undergraduates working on the same objectives, looking particularly at the effort required to bring a buggy solution (Copilot's or a human's) to correctness.
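
To make the comparison concrete, here is a minimal Python sketch of the kind of harness such a study implies: run several candidate solutions for a task against a shared test suite and report the fraction that are fully correct. The candidate functions, tests, and scoring below are my own illustrative stand-ins, not the authors' benchmark or methodology.

# Illustrative stand-ins only: two hypothetical "generated" attempts at a
# sorting task, scored for correctness against a shared test suite.

def bubble_sort(xs):
    # A correct but O(n^2) attempt
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

def buggy_sort(xs):
    # An attempt with a localized bug: it silently drops duplicates
    return sorted(set(xs))

TESTS = [
    ([3, 1, 2], [1, 2, 3]),
    ([2, 2, 1], [1, 2, 2]),   # exposes the duplicate-dropping bug
    ([], []),
]

def pass_rate(candidates):
    # Fraction of candidate attempts that pass every test case
    passed = sum(
        all(fn(inp) == expected for inp, expected in TESTS)
        for fn in candidates
    )
    return passed / len(candidates)

print(pass_rate([bubble_sort, buggy_sort]))  # 0.5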

They find that on some tasks Copilot bested the students slightly or significantly, but in other cases it either failed completely on complex tasks requiring multiple steps or failed to reach the student average for correctness over 10 attempts. Overall, students averaged better than Copilot, though there are indications that explicit step-based prompting improved Copilot's performance.
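
As a rough illustration of what step-based prompting means in practice (my own example, not taken from the paper), compare a single monolithic request with the same task decomposed into explicit steps; the decomposed form gives the model much less room to skip or misorder intermediate work.

# Hypothetical prompts, for illustration only; the wording is not from the study.

MONOLITHIC_PROMPT = (
    "Write a Python function that reads a CSV file, drops rows with missing "
    "values, and returns the rows sorted by their 'score' column."
)

STEP_BASED_PROMPT = """\
Write a Python function process_csv(path) that:
1. Reads the CSV at 'path' with csv.DictReader into a list of dicts.
2. Drops any row containing an empty field.
3. Converts each remaining row's 'score' field to float.
4. Returns the rows sorted by 'score' in descending order.
"""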

The authors also observe that the repair rate for buggy solutions is better for Copilot than for student code, finding that defects in Copilot solutions were limited and localized. They conclude: “if Copilot as a pair programmer in a software project suggests a buggy solution, it is less expensive to fix its bugs compared to bugs that may be produced by junior developers when solving the same programming task.” They add that Copilot's solutions average lower complexity than the students' but that it struggles to understand certain natural language prompts with the insight students readily demonstrate.

They summarize that in generating code (possibly incorrect), human programmers can still beat Copilot on average. Nevertheless, when paired with an expert programmer who can detect and filter out buggy Copilot code, the tool can provide real value. However, a junior programmer working with Copilot but lacking that experience would need to be backed up by an experienced reviewer, obviating the value of AI-assisted pair programming.

The collaborative assistant study

This paper describes a study on the effectiveness of LLM agents proactively assisting a developer working on code. This can range from autocompleting a variable or function name to suggesting whole-line completions, as seen in Visual Studio IntelliSense. The authors built their own editor AI agent to explore a range of assistance options and developer reactions to different types of help: prompt-only, proactive assistance, and proactive assistance moderated through AI presence and context in the development environment/task. (Sidebar: for the IDE they used the open-source Monaco editor that underlies VS Code. This IDE is barnstorming through software and AI embedded development. Take note, EDA developers.)

Under the prompt-only condition the agent helps only when prompted to do something. Proactive assistance (which they call the CodeGhost condition) is agent-initiated. In the moderated model (which they call the CodeEllaborator condition), agent presence is indicated in the code through a caret and cursor where the agent thinks it can help, and actions/suggestions are timed carefully relative to the developer's state in a task. Assistance is not limited to code changes; it can also appear in side panels for chat or agent progress on executing a task, or in locally scoped breakout chat windows to discuss topics around other (presumably related) sections of code.
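
To make the timing idea behind the moderated condition a little more concrete, here is a hypothetical Python sketch of a gating check an agent might apply before surfacing a suggestion: wait for an idle pause and a plausible stopping point in the code. The names, threshold, and boundary test are my assumptions, not the authors' implementation.

import time
from dataclasses import dataclass

@dataclass
class EditorState:
    last_keystroke: float   # timestamp of the developer's most recent edit
    current_line: str       # text on the line the cursor occupies

IDLE_SECONDS = 4.0          # assumed pause threshold, purely illustrative

def should_offer_suggestion(state, now):
    # Offer help only after an idle pause, and only at a plausible stopping
    # point (a blank line or the end of a block header), not mid-expression.
    idle = (now - state.last_keystroke) >= IDLE_SECONDS
    stripped = state.current_line.rstrip()
    at_boundary = stripped == "" or stripped.endswith(":")
    return idle and at_boundary

# A developer who paused 6 seconds ago on a blank line would see a suggestion.
print(should_offer_suggestion(EditorState(time.time() - 6.0, ""), time.time()))  # True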

Experiments had a team of CS undergraduates work on Python-based tasks, paired in turn with each of these three assistance options. I will summarize the authors' conclusions, based on both their analysis and interviews with the developers.

Prompt-only support was viewed as the least natural and most disruptive method. Compared with the proactive options, developers felt that having to stop and build a prompt for each requirement was very disruptive and demanded the most effort from them. Conversely, proactive intervention required the least effort on their part and felt closer to a true pair partner, but it was also viewed as disruptive in several cases where the AI took unanticipated actions or broke the developer's flow of thinking, forcing them to switch context and later mentally rebuild it. This was particularly problematic for the second (CodeGhost) option, where lack of obvious AI presence and context could make AI feedback look chaotic.

These findings highlight the importance of human factors analysis in designing such an agent. We must take user psychology and the social aspects of pair programming into account. Is the AI partner behaving collaboratively: avoiding unhelpful interruptions, backing off when the human partner does not appreciate the help, yet ready to step up again when prompted, while remaining alert to real problems in the human-generated code?

There were multiple positive comments about the value of appropriately timed feedback, but also several concerning comments. One developer felt they were fighting against the AI in some cases. Another said they did not feel emotionally attached to the final code, while adding that perhaps this was a learning problem for them rather than a deficiency in the agent. One developer noted that “the AI generated code looks very convincing”, raising concern that an inexperienced designer may accept such code without deeper analysis and move on to the next task.

My takeaways

An earlier theory viewed AI assistance as more beneficial to junior programmers than to senior programmers. The research reviewed here suggests that view should be reversed, which should be concerning for entry-level programmers, at least in the short term. Either way, AI-based coding is still very much an assistant rather than a replacement for human coders, accelerating their development while still relying on expert review to bring code to final quality. However, with appropriate expectations, such assistants can be effective partners in pair programming.

By the way, you should check out Appendix A in the proactive assistance paper for a nice example of prompting both for setup and for actions.

Also Read:

A Big Step Forward to Limit AI Power Demand

Perforce Webinar: Can You Trust GenAI for Your Next Chip Design?

A Principled AI Path to Spec-Driven Verification
