Arm Lumex Pushes Further into Standalone GenAI on Mobile
by Bernard Murphy on 09-24-2025 at 6:00 am

When I first heard about GenAI on mobile platforms – from Arm, Qualcomm and others – I confess I was skeptical. Surely there wouldn’t be enough capacity or performance to deliver more than a proof of concept? But Arm, and I’m sure others, have been working hard to demonstrate this is more than a party trick. It doesn’t hurt that foundation models have been slimming down to a few billion parameters, so it now looks very practical to host meaningful chatbots and even agentic AI standalone on a phone, with no need for cloud access. Arm has announced its new Lumex platform in support of this trend, and it may yet turn me into a believer. What I find striking is that GenAI is hosted on the CPU cluster with no need for GPU or NPU support.

Why should we care?

The original theory on mobile and AI was that the mobile device would package up a request and ship it to the cloud, the cloud would do the AI heavy lifting, and the response would be shipped back to the mobile device. That theory fell apart for a litany of reasons. Acceptable performance depends on reliable and robust wireless connections, which are not always certain, especially when traveling. Shipping data back and forth introduces potential security risks and certainly privacy concerns. The inherent latency in connections with the cloud makes real-time interaction impractical, undermining many potentially appealing use cases like chatbot apps. Some mobile apps must support quick on-device learning to refine behavior to user preferences. Finally, neither mobile app developers nor their users want to add a cloud subscription on top of their app subscription.

There may still be cases where cloud-based AI will be a useful complement to mobile, but the general mood now leans to optimizing the on-device experience as much as possible.

Arm Lumex, a new generation platform for on-device AI

All good reasons to make AI native on the phone, but how can this be effective? Arm has gone all-in to make the experience real with its newly announced Lumex platform, emphasizing the CPU cluster as the centerpiece of AI acceleration. I’ll come back to that.

Briefly, Lumex introduces new CPU cores (branded C1-Ultra, C1-Premium and C1-Pro) and a GPU core (branded G1-Ultra), with the performance advances expected of a new release. It continues the CSS philosophy of complete, 3nm-ready subsystems extending to chiplets, all supported by a software stack and ecosystem for fast time-to-market deployment.

It’s the CPU cores that particularly interest me. Arm is boasting that these systems can run meaningful GenAI apps without needing to share the load with the Mali GPU or an NPU. They accomplish this with SME, the Scalable Matrix Extension, now in its second generation with SME2. The claim is backed up by endorsements from the Android development group, the AI partnerships group at Meta and the client engineering group at AliPay.
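
To make that concrete, here is a minimal plain-C sketch (my own illustration, not Arm code and not actual SME2 intrinsics) of the kernel shape SME and SME2 are built to accelerate: accumulating outer products into an output tile, the operation at the heart of the matrix multiplies that dominate transformer inference.

#include <stddef.h>

/* Conceptual only: C (M x N) += A (M x K) * B (K x N), expressed as a
 * sum of K outer products. The caller is assumed to zero-initialize C.
 * SME/SME2 hardware executes the inner two loops as wide outer-product
 * instructions accumulating into dedicated tile storage. */
void matmul_outer_product(const float *A, const float *B, float *C,
                          size_t M, size_t N, size_t K)
{
    for (size_t k = 0; k < K; ++k) {               /* one outer product per step */
        for (size_t i = 0; i < M; ++i) {
            float a = A[i * K + k];                /* element i of column k of A */
            for (size_t j = 0; j < N; ++j) {
                C[i * N + j] += a * B[k * N + j];  /* accumulate into the C tile */
            }
        }
    }
}

Roughly speaking, SME2 collapses the two inner loops into single hardware operations over vector-length-sized tiles, which is why the CPU cluster can take on matrix work that previously seemed to demand a GPU or NPU.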

Benchmarking shows nearly 5X lower latency in speech recognition, nearly a 5X improvement in encode rate for Gemma (the same family as Google’s Gemini) and nearly 3X faster generation time for Stable Audio (from Stability AI, the same people who brought you Stable Diffusion image generation).

Why not add further acceleration by folding in GPUs and NPUs? Geraint North (Fellow, AI and Developer Platforms at Arm) made some interesting points here. GPU and NPU cores may be faster standalone at handling some aspects of a model, but only for data types and operations within their scope; CPUs, on the other hand, can handle anything. Another downside to a mixed-engine solution is that moving data between engines (e.g. CPU/GPU) incurs overhead no matter how well you optimize, whereas a CPU cluster is already highly optimized for minimal latency.

The final nail in the mixed-engine coffin is alignment with what millions of app developers want. They start their work on the CPU, naturally designing and optimizing to that target. Adding considerations for GPU and NPU accelerator cores is pretty alien to how they think. For maximum business opportunity they also need to support a wide range of phones, some of which may have GPU/NPU cores and some of which may not. An implementation based purely on the CPU cluster keeps their plans simple, since the CPUs can handle all data types and operations. Kleidi-based libraries simplify development further by making use of SME/SME2 acceleration transparent, as sketched below.
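
To illustrate what that transparency might look like under the hood, here is a hedged sketch (my own, not Kleidi code) of the kind of runtime check a library could perform on an arm64 Linux or Android system before dispatching to an SME2-optimized kernel. The HWCAP2_SME and HWCAP2_SME2 flags are the arm64 Linux kernel’s hardware-capability bits; whether they are defined depends on your header version, hence the #ifdef guards.

#include <stdio.h>
#include <sys/auxv.h>     /* getauxval, AT_HWCAP2 */
#include <asm/hwcap.h>    /* HWCAP2_* bit definitions on arm64 Linux */

int main(void)
{
    unsigned long hwcap2 = getauxval(AT_HWCAP2);

#ifdef HWCAP2_SME2
    if (hwcap2 & HWCAP2_SME2) {
        puts("SME2 present: dispatch to SME2-optimized matmul kernels");
        return 0;
    }
#endif
#ifdef HWCAP2_SME
    if (hwcap2 & HWCAP2_SME) {
        puts("First-generation SME present: dispatch to SME kernels");
        return 0;
    }
#endif
    puts("No SME reported: fall back to NEON/SVE paths");
    return 0;
}

The point of burying this check inside a library is exactly the one made above: the app developer writes to a single CPU target, and the same code takes the accelerated path on phones that have SME2 and a portable fallback on phones that don’t.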

Maybe a highly targeted implementation for one platform could get higher AI performance using the GPU, but it wouldn’t be this scalable. Or this developer friendly. Lumex offers a simpler development and deployment model: GenAI workloads on-device, across many phone types, without needing to go to the cloud. Very interesting.
