I am on a voyage of discovery through prompting and prompting technologies because these are the critical interfaces between what we want (or roughly imagine we want) from AI, and AI's ability to deliver. I have seen suggestions that today's deficiencies are details that will soon be overcome. I'm not so sure. Yes, prompting technology will continue to advance, but there are hurdles along the way which may require rethinking how we humans interact with AI. For this blog I have drawn on a widely cited paper which studied how people who are not AI experts interact with a recipe assistant chatbot, trying to optimize its behavior for maximum user friendliness.
First, what already works well
Before I get to problems, some GenAI applications are already easy to use (once built). Drafting a reply to an email is an example. You start with a prompt, "Draft a friendly reply and make the following points in response…" The chatbot generates a draft which you can then edit as needed before sending. Refining your prompt might be a nice-to-have, but it is not essential since you retain final control over the content, which is especially important for correcting any glaring errors in the draft.
An advance over a simple prompt would be an “appified” prompt. An end user is offered a menu of pre-determined options, possibly allowing for some level of parameterization. Scaling this kind of app to a wide user base with no expertise in AI should not be challenging. On the other hand, developing the app requires significant prompt engineering expertise and a well-characterized understanding of expected use models. To develop and support such apps you no longer need an expert software engineer. Now you need an expert prompt engineer!
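To make "appified" a little more concrete, here is a minimal sketch of the idea, assuming an OpenAI-style chat API. The function name, tone menu and model choice are my own illustrative assumptions, not a reference to any particular product.

```python
# Sketch of an "appified" prompt: the end user only picks a tone and supplies
# a few bullet points; the prompt engineering lives inside the template.
from openai import OpenAI

TONES = {"friendly": "warm and informal", "formal": "polite and businesslike"}

def draft_reply(original_email: str, tone: str, points: list[str]) -> str:
    prompt = (
        f"Draft a {TONES[tone]} reply to the email below. "
        "Make the following points in response:\n"
        + "\n".join(f"- {p}" for p in points)
        + f"\n\nEmail:\n{original_email}"
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The point is that the end user never sees the prompt; they only see the menu of options and parameters the prompt engineer decided to expose.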
Beyond these and similar use cases, expected use models start to look more like regular prompt engineering with chatbots designed to support non-AI experts within some domain. An example might be a help chatbot to get advice on how to use a simulator feature, to help generate a complex assertion, or to generate a script to accomplish some task. This is where the study I mentioned above becomes interesting.
The study experiment
GenAI systems have already advanced beyond simple one-time prompts to allow for multiple rounds of editing to refine an initial attempt. This is commonly called multi-turn prompting. Some systems even switch roles, prompting you to provide answers to a series of questions, in the expectation that answering simple questions may reduce opportunities for ambiguity. The study considered here somewhat combines these two approaches.
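A minimal sketch of what multi-turn refinement looks like in code, assuming the same OpenAI-style chat API as above; the loop and prompts are my own illustration, not the tooling used in the study.

```python
# Multi-turn prompting: carry the conversation history forward so each
# refinement builds on earlier turns. Illustrative only.
from openai import OpenAI

client = OpenAI()
history = [{"role": "user", "content": "Explain how to prepare veggie tempura."}]

for _ in range(3):  # a few rounds of refinement
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    print(answer)
    history.append({"role": "assistant", "content": answer})
    # The user reacts to the draft, e.g. "break it into shorter steps with timings"
    history.append({"role": "user", "content": input("How should this be refined? ")})
```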
The study is based on non-experts working with a recipe development chatbot, refining how it communicates to end users, in a user-friendly way, how to prepare and cook veggie tempura, supported by an example from a real TV show in which a chef guides would-be cooks in preparing the same dish. Participants ranged from academics to professional programmers, all with STEM backgrounds, and none had meaningful experience with AI of any kind.
While this example may not seem very relevant to most chatbot applications, what caught my attention is not the detail of the application but rather the prompt engineering experiences of these participants and the problems they ran into along the way, most of which are very relevant (I think) to almost any kind of human-driven prompt engineering.
Findings
The important insights here are much more about our human limitations than about limitations in chat technologies. First, the study finds that participants almost exclusively used an opportunistic, ad hoc approach to prompt exploration.
You might ask, "As opposed to what?" Professionals in almost any domain are comfortable with the need to systematically analyze options to choose the best next step, but this is not how most of us approach prompting. We expect to converge quickly on what we intuitively want, without the need for disciplined structure or semantics in our prompts or prompt refinements.
In a similar vein, participants were prone to over-generalize from success/failure on individual prompt changes and clearly modeled their interaction on human-to-human exchanges, not appreciating differences in how AI processes feedback.
On over-generalization: participants generally aimed to find a desirable behavior in one or two attempts. If that worked, it was good enough for them; if not, they assumed what they asked for was beyond the capabilities of the AI. I can relate. If I am using prompt refinement to get to a goal and I get close, why would I push further? If I don't get close, as a non-expert I have neither the skill, nor the time, nor the interest to systematically explore how different changes to the prompt will affect the outcome.
The assumption that human-to-AI dialog mirrors human-to-human dialog leads to some interesting disconnects. One example is that participants preferred direct instruction (do this…) over examples, even when examples were available (from the TV show). This echoes the common advice in storytelling to "show rather than tell": show by example rather than by direct instruction. That seems counterintuitive to us, but it holds for bots too; they respond more effectively to examples than to direct instructions. If we're honest, even we humans would agree. Direct instruction works well in textbooks, not always so well in conversation.
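To illustrate the difference, compare a direct instruction with a show-by-example version of the same request. These prompts are my own made-up illustrations, not taken from the study.

```python
# Direct instruction: tell the bot what style to use.
direct_instruction = (
    "When the user finishes a step, acknowledge it briefly and then give the "
    "next step in one or two sentences."
)

# Show by example: demonstrate the style with a short sample exchange.
# Bots (like people) tend to pick up the pattern more reliably this way.
show_by_example = """\
Follow the style of this exchange:

User: I've sliced the sweet potato.
Assistant: Nice, thin slices fry evenly. Next, whisk the batter: flour, cold water, one egg.
User: Batter's done.
Assistant: Great. Dip each slice and fry until lightly golden, about two minutes.
"""
```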
In a similar vein, participants would often use negatives to try to direct behavior. They were surprised to find these directions were often ignored (LLMs struggle with negatives). They were encouraged by study leaders to repeat themselves multiple times to reinforce a direction, but apparently felt this was unnatural, even when shown it could be effective. This is another area where human-to-human and human-to-AI dialog diverge.
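The usual workaround, sketched below with my own illustrative prompts, is to restate a negative constraint positively and, where necessary, repeat it.

```python
# Negative phrasing is often ignored (or even inverted) by LLMs.
negative_form = "Don't repeat the safety warning about hot oil in every step."

# Restate the constraint positively...
positive_form = (
    "Mention the hot-oil safety warning once, at the start, and keep later "
    "steps focused on the cooking actions."
)

# ...and reinforce it by repeating the key point at the end of the prompt.
reinforced = positive_form + " Remember: the safety warning appears only in the opening step."
```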
Conclusions
If these findings are self-evident to you, congratulations. You are one of a small and elite group of prompt engineering experts. But that's not very helpful to the large market of non-AI-expert users in business, engineering and other domains (in which I include myself). The behaviors we must learn to prompt chatbots effectively are not at all intuitive to us. Perhaps we should think of chatbots as very experienced children. They know a lot about the domain that interests us and can provide very useful answers to our questions, but we need to coax those answers from them through conversational gambits rather different from the approach we would use in talking with an experienced adult.
Or maybe there is a way chatbots can treat us as very experienced children (!), guiding us through a series of simple prompts to what we really want.
By the way, shout out to David Zhi LuoZhang (CEO of Bronco AI) for pointing me to Gemini Nano Banana, my new favorite for AI-based image generation!