A switch this month to the principles behind building effective agentic systems: going beyond yet another way to stitch together tools, agents, and orchestration, to a deeper consideration of user experience and how we most effectively blend agentic automation with human-in-the-loop oversight. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation
This month’s pick is Magentic-UI: Towards Human-in-the-loop Agentic Systems. The authors are from Microsoft Research. The paper was published on arXiv in 2025 and has 19 citations.
How can we ensure that assemblies of agents don’t compound uncertainty and complexity in operation? This paper from Microsoft Research describes a web-based platform (open-source on GitHub) for researching how to optimize user experience (UX) and confidence in control by systematizing collaboration between agents and a human in the loop, a very relevant topic these days.
This isn’t a deeply technical paper, but it does offer plenty of interesting ideas on co-planning, co-tasking, action guards, and learning from task execution.
Paul’s view
Intriguing paper this month out of Microsoft Research, exploring how to keep a human “in the loop” during long-running, complex agentic AI tasks. With 2026 shaping up to be a big year for applying agentic AI to RTL design and verification, how best to keep DV engineers and RTL designers in the loop is a hotly debated topic with our customers.
The paper summarizes various methods to keep a human in the loop, based on the workflow-based architecture with which modern agents approach complex tasks. For example, an agent usually begins by asking the LLM to break the task down into a multi-step plan consisting of smaller, more manageable sub-tasks, which are then farmed out to a swarm of sub-agents to complete. A quick summary of the methods proposed in the paper:
- Co-planning: check back in with the user after the initial multi-step plan is generated.
- Co-tasking, part 1: have agents continuously communicate what they are doing to a live stream that the user can watch and intercept if an agent is going off the rails.
- Co-tasking, part 2: provide instructions in prompts that guide agents to seek user clarification or confirmation in certain situations.
- Memory: whenever an agentic session is successful at a task, the user can save that session to a side file by running an agent to review and summarize its live stream traces into a prompt that can guide a future agentic session. What we now call a “skill”. The live stream traces also include all the human interventions, so the saved skill can include instructions on when to prompt the user.
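The control flow behind these methods can be sketched in a few lines. This is only an illustrative toy, not the paper’s implementation: `plan_task`, `approve`, and `save_skill` are hypothetical stand-ins for the LLM planner, the user’s co-planning check-in, and the skill-summarization agent.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Toy agentic session illustrating co-planning, co-tasking, and memory."""
    transcript: list = field(default_factory=list)

    def plan_task(self, task):
        # Stand-in for the LLM breaking the task into a multi-step plan.
        return [f"{task}: step {i}" for i in range(1, 4)]

    def run(self, task, approve):
        plan = self.plan_task(task)
        # Co-planning: check back with the user before executing the plan.
        plan = approve(plan)
        for step in plan:
            # Co-tasking: record each action to a live stream the user can watch.
            self.transcript.append(step)
        return self.transcript

    def save_skill(self):
        # Memory: summarize the session's traces into a prompt ("skill")
        # that can guide a future session, interventions included.
        return "Previously successful plan:\n" + "\n".join(self.transcript)

session = Session()
# The user reviews the generated plan and trims it to the first two steps.
session.run("book flight", approve=lambda plan: plan[:2])
skill = session.save_skill()
```

The key design point is that the user touches the workflow at two seams, plan approval before execution and the live transcript during it, rather than inside the agent’s own reasoning.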
The authors implement their methods in a system called Magentic-UI and evaluate it on the well-known agentic AI benchmark GAIA by creating a special agent that operates as a surrogate for a real human. This “simulated user” is given a cheat sheet of golden, human-created reference plans for completing each task in the benchmark. Magentic-UI achieves a 30% score when it is told never to prompt the user, and a 50% score when allowed to prompt the simulated user. A human performing the tasks entirely on their own with a web browser scores 90%. It’s hard to know whether the 50%-to-90% gap is due to limitations in the simulated-user agent or in the human-prompting methods themselves, but either way it’s a big gap, so there’s plenty of room for further innovation here!
Raúl’s view
This month’s paper is about Magentic-UI: Towards Human-in-the-loop Agentic Systems, an open-source prototype interface from Microsoft Research for studying human-in-the-loop agentic systems. The premise is straightforward: today’s agents are not reliable enough to operate autonomously, so productivity comes from combining agent execution with human oversight.
Magentic-UI is built as a multi-agent architecture that explicitly treats the human as part of the agent team. Its main contribution is a set of six interaction mechanisms:
- Co-planning (joint human–agent plan creation)
- Co-tasking (shared execution)
- Action guards (human approval of risky actions)
- Answer verification (post hoc validation of results)
- Memory (reuse of prior plans)
- Multi-tasking (parallel agent execution with human oversight)
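Of these six, action guards are the most directly practical safety idea: risky actions pause for human approval while routine ones proceed. A minimal sketch, with assumed names rather than the Magentic-UI API (`ask_user` and `execute` stand in for the approval dialog and the agent’s tool executor):

```python
# Hypothetical set of action types classified as risky enough to gate.
RISKY = {"delete_file", "send_payment", "submit_form"}

def guarded_execute(action, args, ask_user, execute):
    """Run `action`, pausing for human approval when it is classified risky."""
    if action in RISKY and not ask_user(action, args):
        # The user declined, so the agent must not perform the action.
        return ("blocked", action)
    return ("done", execute(action, args))

# Usage: the "user" rejects everything, but only the risky action is gated.
log = [
    guarded_execute("send_payment", {"amount": 100},
                    ask_user=lambda a, kw: False,
                    execute=lambda a, kw: "ok"),
    guarded_execute("lookup_price", {"item": "flight"},
                    ask_user=lambda a, kw: False,
                    execute=lambda a, kw: "ok"),
]
```

The interesting design question, which the paper’s classification of risky actions addresses, is where to draw the `RISKY` boundary: too wide and the oversight burden returns, too narrow and the guard is theater.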
Evaluations include benchmarks (GAIA, WebVoyager, etc.), simulated users, and a small qualitative study. The authors conclude that these mechanisms “have the potential to improve task success and reduce oversight burden.”
Two issues stand out.
First, the evaluation is weak. A ~10–12 person, one-hour user study is not statistically meaningful, and simulated users (LLMs) are a poor proxy for real human behavior. The authors themselves position the study as qualitative.
Second, the idea of a standard interface for agentic AI is questionable. History suggests interfaces fragment rather than converge: different ecosystems optimize for different interaction models. Google tends toward minimalist, search-centric interfaces (one box, increasingly agentic underneath), while Microsoft favors feature-rich, layered interfaces (Office, now Copilot everywhere). Agentic systems will likely fragment by context: for consumers, largely invisible automation; for enterprises, audit-heavy workflows; for developers, programmable pipelines.
The system also remains far from human-level performance (roughly 30–50% task success vs. ~90% for humans on some benchmarks), and only 41.7% of users in the study said they would use it frequently. This reinforces the paper’s premise but also highlights that the interface does not solve the core capability gap.
Despite these limitations, the paper is worth reading. It clearly defines human–agent interaction patterns (co-planning, co-tasking, etc.), tightly integrates agents with UI, introduces practical safety ideas like action guards, and argues for a plan-centric interface that improves transparency and control. Most importantly, it shows how humans can collaborate with imperfect agents, rather than assuming near-term full autonomy.
Magentic-UI sits within a broader movement toward agentic interfaces. Google appears to be pushing toward invisible, search-centric agents, while Microsoft is embedding agents into rich, Office-like workflows. My view is that agent interfaces will fragment across use cases rather than converge.