Keeping up with the furious pace of AI innovation probably doesn't allow a lot of time for deep analysis across many use cases. However, I can't help feeling we're sacrificing quality, and ultimately end-user acceptance of AI, by prioritizing new capabilities over rigorous productization. I am certain that product companies do rigorous testing and quality control, but modeling the human element in AI interaction, through prompts or voice commands for example, still seems to be more art than science. We don't yet understand the safe boundaries between how we communicate with AI and how AI responds. To illustrate, I'm sharing a couple of my own experiences: one in image generation and one in voice control for music in my car.
Image generation
I regularly use AI to generate images for my blogs and more recently for my website, because I can generate exactly what I want and fine-tune as needed through refined prompts. At least that's my expectation as an average non-expert user. Building a trial website image revealed a gap between expectation and practice, as should be clear in the image above.
I didn't set out to create an image of a guy with three hands. My prompt was something like "show an excited storyteller standing on a pile of ideas". A little abstract, but nothing odd about the concept. The first image the generator built was OK but not quite what I wanted, so I added a couple of modest prompt tweaks. The image I'm showing came from the second tweak. The generator hallucinated, I had no idea why, and I didn't know how to correct this monster.
I switched to a different starting prompt and then to a different image generator (I tried GPT-4o and DALL-E 3) but was still shooting in the dark. Without any understanding of safe boundaries there was no guarantee I wouldn't run into a different hallucination. The obvious concern here is how this affected my productivity. My goal was to quickly generate a decent image conveying my concept, then move on with the rest of my website-building task. Instead I spent the best part of a day experimenting blindly with the image generation tool.
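For readers curious what this trial-and-error loop looks like in practice, here is a minimal sketch of what I was effectively doing by hand, assuming the OpenAI Python SDK and the dall-e-3 model; the second prompt string is a hypothetical tweak for illustration, not my exact wording.

```python
# Minimal sketch of the manual prompt-tweak loop, assuming the OpenAI Python
# SDK (pip install openai) and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Each entry is one refinement step. There is no principled way to know which
# tweak will fix an artifact and which will trigger a new one.
prompts = [
    "show an excited storyteller standing on a pile of ideas",
    "show an excited storyteller standing on a pile of ideas, "
    "realistic human anatomy, two hands",  # hypothetical corrective tweak
]

for attempt, prompt in enumerate(prompts, start=1):
    result = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
    # The only feedback is the image itself; inspect it and tweak again.
    print(f"Attempt {attempt}: {result.data[0].url}")
```

The point of the sketch is what it lacks: no signal from the generator about why an image went wrong, so every iteration is a blind guess.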
Voice control for Apple Music in my new car
I write frequently about automotive tech so it seemed only right that on a recent car purchase I should go for a model with all the latest automation options, allowing me to comment from experience, not just from theory. I’m not going to name the car model, but this is a premium brand, known for excellent quality.
There are many AI features in the car, including advanced driver assistance, which I haven't yet started to explore (maybe in later blogs). Here I just want to talk about controlling music selection through CarPlay using voice control, an important safety feature to minimize driver distraction. The alternative control surface is an inviting center-console touch screen, which is not where I should be looking when I'm driving; I know because I drifted partly out of my lane a couple of times driving back from the dealer. I'm not going to make that mistake again.
Now, I know I should only be using voice controls when driving. But when my voice commands aren't working, it is natural to look at the screen to try to figure out the problem. That was happening to me a lot on the drive back from the dealer. I eventually learned what I was doing wrong, which pointed to more opportunities for improved productization.
First, a few of the tracks stored on my phone are corrupt – no idea why. When Apple Music hit a corrupt track, it stopped playing. I initially guessed something about the app was broken when running through CarPlay. Second, and more importantly, I didn't know what voice commands I could reasonably use with Siri. Chat and voice control engines encourage the illusion that they can understand anything. In fact they do a very good job of mimicking understanding within a bounded range, but they don't always make it apparent when they are crossing from correct interpretation of our intent to guessing, to simply saying they don't understand, or to ignoring the request.
As an infrequent Siri user, the trick I had to learn was first to use voice control exclusively when driving, and second to ask Siri what commands she can understand. I also learned that Apple Music might get stuck on a (presumably corrupt) track but wouldn't tell me why it had stopped. Now I know that if I give a command and don't hear anything, I have to ask it to skip a track.
On the plus side, as a Siri novice I learned that Siri knows a lot about music genres, so I don't have to limit a request to a specific song or album. I can ask for baroque music or rock and it will do something intelligent with that request. Pretty cool.
Takeaways
What does any of this have to do with the larger world of AI applications? The systems used in these two examples are built on AI platforms. The image generation (diffusion) part is different, but interpreting the prompt is regular LLM technology. Voice-based commands also build on LLMs, except that the input is audio rather than text. So my experiences in these simple examples could be indicative of what we might encounter in many AI applications.
You might not consider what I found to be major issues, though for me creating confusion while driving is a potentially big problem. My major takeaway here was that I had to learn more about Siri's capabilities before I could use CarPlay effectively while still being a safe driver. Not quite the trouble-free experience I had expected.
When generating images I hadn't appreciated the productivity hit imposed by trial-and-error exploration through prompts. I will now be a lot more cautious in using image generation, and more skeptical of products that claim to simplify generating exactly the image I want.
What could product builders like my auto manufacturer, smartphone makers, and image generation companies do to help more? My suggestions would be more guard bands to separate accurate interpretation from guesses, more methods to validate and double-check interpretation, and more active feedback when something goes wrong. Improved productization on top of already impressive AI capabilities could limit negative experiences (and accidents) and help AI transition from neat concepts to full production acceptance.
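To make the suggestion concrete, here is a hypothetical sketch of what such guard bands might look like. The Intent class, thresholds, and handle function are all illustrative inventions of mine, not any real Siri or CarPlay interface.

```python
# Hypothetical guard-band pattern: separate confident interpretations from
# guesses, and give active feedback instead of failing silently. All names
# here are illustrative, not drawn from any real voice-assistant API.
from dataclasses import dataclass

@dataclass
class Intent:
    action: str        # e.g. "play_genre:baroque", "skip_track"
    confidence: float  # interpreter's own estimate, 0.0 to 1.0

CONFIRM_THRESHOLD = 0.9  # above this, act without asking
GUESS_THRESHOLD = 0.6    # between the thresholds, confirm before acting

def handle(intent: Intent) -> str:
    if intent.confidence >= CONFIRM_THRESHOLD:
        return f"Executing {intent.action}"
    if intent.confidence >= GUESS_THRESHOLD:
        # Validate the interpretation with the user rather than silently guessing.
        return f"Did you mean {intent.action}? Say yes to continue."
    # Active feedback: admit the failure and say what would help.
    return "Sorry, I didn't understand. Try asking me what commands I know."

print(handle(Intent("play_genre:baroque", 0.95)))  # confident: just do it
print(handle(Intent("skip_track", 0.7)))           # a guess: double-check first
print(handle(Intent("unknown", 0.3)))              # fail loudly, not silently
```

The design point is simply that the system owns the uncertainty: it acts only when confident, confirms when guessing, and explains itself when it fails, rather than leaving the driver staring at a screen.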
One more important area where product builders might contribute is in helping us refine our own understanding of what AI can and cannot do. Unfortunately, too many users still see AI as magical, thinking it can do whatever they want, maybe even better than they can imagine. We need to have drilled into us that AI is a technology like any other: very capable within its limitations, and something we must invest in learning to operate correctly. Then we will avoid disappointment, or worse, dismissing everything AI as hype, when the real problem is our over-inflated expectations.