Consensus forecasts for the smartphone market hover somewhere between slight decline and slight growth, indicating a lack of obvious drivers for more robust growth. As a business opportunity, this unappealing state is somewhat offset by sheer volume ($500B in 2023 according to one source), but we’re already close to peak adoption outside of China, so the real question for phone makers must be “what is the next killer app that could move the needle?”
We consumers are a fickle lot and entertainment seems to rank high on our list of must-haves. Arm is betting on mobile gaming. Another possibility might be generative AI for image creation/manipulation. Qualcomm has already demonstrated a phone-based capability, while others, including Apple, are still focused on large language model apps. For me it’s worth looking closer at the image aspect of generative AI, simply to be a little more knowledgeable if and when this takes off. For fun I generated the image here using Image Creator from Microsoft Bing.
I am going to attempt to explain the concept by comparing with an LLM. LLMs train on text sequences, necessarily linear. Lots of it. And they work on tokenized text, learning when they see a certain sequence of tokens what might commonly follow that sequence. Great for text but not images, which are 2D and generally not tokenizable, so the training approach must be different. In diffusion-based training, noise is first progressively added to training images (forward diffusion), and the network is then trained to denoise those modified images to recover each original image (reverse diffusion). Sounds messy but apparently the denoising method (solving stochastic differential equations) is well-defined and robust. The Stable Diffusion model, as one example, is publicly available.
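To make the forward diffusion step a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the linear noise schedule, the 1,000 steps, the start and end values, and the function names `make_alpha_bar` and `add_noise` are choices I made for the example (they follow a commonly described formulation), not something taken from Stable Diffusion or any other product. The "image" is just a flat list of 64 pixel values.

```python
import math
import random

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step of a linear noise schedule.

    The schedule values are illustrative assumptions, not taken from any product.
    """
    alpha_bar, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta          # each step removes a little more signal
        alpha_bar.append(running)
    return alpha_bar

def add_noise(pixels, t, alpha_bar, rng):
    """Jump directly to noise level t. Returns (noisy pixels, the noise that was added)."""
    signal = math.sqrt(alpha_bar[t])
    sigma = math.sqrt(1.0 - alpha_bar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal * p + sigma * n for p, n in zip(pixels, noise)]
    return noisy, noise

rng = random.Random(0)
pixels = [rng.uniform(-1.0, 1.0) for _ in range(64)]          # stand-in for an 8x8 image
alpha_bar = make_alpha_bar()
early, _ = add_noise(pixels, 10, alpha_bar, rng)              # still close to the original
late, target_noise = add_noise(pixels, 999, alpha_bar, rng)   # almost pure noise
```

During training, a network would be shown a noisy version such as `late` together with the step number and asked to predict the noise that was added (`target_noise` here). That prediction is what makes the reverse, denoising direction possible. The network itself is left out of this sketch.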
It is then possible to generate new images from this trained network, starting from a random noise image. Now you need a method to guide what image you want to generate. DALL-E 2, Midjourney, and Stable Diffusion can all take text prompts. These depend on training taken from text labels provided along with the training images. Inference then includes prompt information in the attention process in the path to inferring a final image. Like LLMs, these systems also use transformers, which means that support for this capability requires new hardware.
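For anyone curious how prompt information can enter the attention process, the toy below sketches cross-attention in plain Python: each image region (a query) looks across the prompt tokens (keys) and pulls in a weighted blend of their values. Everything here is an assumption for illustration. The function names, the two-dimensional vectors, and the hand-picked numbers are mine, and real systems derive queries, keys, and values from learned projections that I have left out.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(image_queries, prompt_keys, prompt_values):
    """For each image region, blend prompt values by query-key similarity."""
    dim = len(prompt_keys[0])
    outputs = []
    for query in image_queries:
        # Scaled dot-product similarity between this region and every prompt token.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in prompt_keys]
        weights = softmax(scores)
        blended = [sum(w * value[j] for w, value in zip(weights, prompt_values))
                   for j in range(len(prompt_values[0]))]
        outputs.append(blended)
    return outputs

# Two hypothetical prompt tokens and two hypothetical image regions.
prompt_keys = [[1.0, 0.0], [0.0, 1.0]]
prompt_values = [[10.0, 0.0], [0.0, 10.0]]
image_queries = [[5.0, 0.0], [0.0, 5.0]]
guided = cross_attention(image_queries, prompt_keys, prompt_values)
```

In this toy, the first region lines up with the first prompt token, so its output is dominated by that token's value, and likewise for the second. That is the sense in which the prompt steers what each part of the image becomes.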
Generation is not limited to creating images from scratch. A technique called inpainting can be used to improve or replace portions of an image. Think of this as an AI-based version of the image editing already popular on smartphones. Not just basic color, light balance, cropping out photobombs, etc., but fixing much more challenging problems or redrafting yourself in cosplay outfits – anything. Now that I can see being very popular.
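The core idea behind inpainting can be sketched very simply: a mask marks which pixels to regenerate, and everything outside the mask is kept from the original. In the sketch below, `generate_pixel` is a hypothetical stand-in for whatever the trained model would produce. A real inpainting pipeline also uses the surrounding pixels to guide what gets generated, which this toy does not attempt.

```python
def inpaint(original, mask, generate_pixel):
    """Keep pixels where the mask is 0; replace pixels where the mask is 1."""
    return [generate_pixel(i) if masked else pixel
            for i, (pixel, masked) in enumerate(zip(original, mask))]

def generate_pixel(index):
    """Hypothetical stand-in for the model's output at a masked position."""
    return 0.3

original = [0.2, 0.4, 0.9, 0.9, 0.5]   # flat list standing in for an image
mask = [0, 0, 1, 1, 0]                 # 1 marks the region to replace
result = inpaint(original, mask, generate_pixel)
```

Only the two masked positions change, and the rest of the "image" passes through untouched.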
Will generative AI move the needle?
I have no idea – see above comment on fickle consumers. Then again, visual stimulus, especially around ourselves, and play appeal to almost everyone. If you can do this on your phone, why not? AI is a fast-moving domain which seems to encourage big bets. I certainly wouldn’t want to bet against this possibility.
I should also mention that generative imaging already has more serious applications, especially in the medical field where it can be used to repair a noisy CAT scan or recover details potentially blocked by bone structure. I can even imagine this technology working its way into the forensics toolkit. We’ve all seen the TV shows – Abby or Angela fill in missing details in a photograph by extrapolating with trained data from what is visible. Generative imaging could make that possible!