
At the 2025 RISC-V Summit, amid debates over cloud scaling and AI cost, DeepComputing CEO Yuning Liang offered a radical view: the future of intelligence isn’t in the cloud at all — it’s already in your pocket. His lunchtime conversation began with iPhones and ended with the death of the operating system. In between, he sketched a vision of computing that dissolves centralized infrastructure into personal devices, reshaping everything from hardware to human interaction.
Why Do You Need the Cloud?
Yuning waves his phone and laughs. “People keep saying, ‘We need the cloud for AI.’ Why? You’ve already got the compute right here. The only thing missing is space to store your models.” The cloud is no longer the default; it’s just a backup. In Yuning’s framing, bigger storage tiers aren’t luxury upgrades — they’re AI capacity ratings.
He taps his iPhone’s storage line: “See? 64 gigabytes isn’t just storage; it’s intelligence. You want Whisper for voice, a 7B LLM for reasoning, a vision model for your camera? Fine. They all live here.” Thanks to aggressive model quantization, including FP4 and FP8 formats, today’s smartphones can host multiple intelligent agents at once. Storage, not compute, becomes the ceiling on on-device intelligence.
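The arithmetic backs him up. Here is a back-of-envelope sketch of the storage math; only the 7-billion-parameter figure comes from Yuning’s example, and real quantized checkpoints add overhead (tokenizer, embeddings, runtime KV cache) on top of these raw weight sizes:

```c
/* Back-of-envelope footprint of a 7B-parameter model at different
 * weight precisions. Illustrative only; real checkpoints carry extra
 * overhead beyond the raw weights. */
#include <stdio.h>

int main(void) {
    const double params = 7e9;                    /* 7B weights */
    const int    bits[] = { 16, 8, 4 };           /* FP16, FP8, FP4 */
    const char  *name[] = { "FP16", "FP8", "FP4" };

    for (int i = 0; i < 3; i++) {
        double gb = params * bits[i] / 8.0 / 1e9; /* bits -> bytes -> GB */
        printf("%-4s weights: %4.1f GB\n", name[i], gb);
    }
    return 0;
}
```

At FP4 the 7B model’s weights shrink to roughly 3.5 GB, so a speech model, a reasoning model, and a vision model all fit on a 64 GB phone with room to spare.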
William Gibson imagined this in 1988. In Mona Lisa Overdrive, Kumiko carried a pocket-sized Maas-Neotek biochip that projected an AI companion named Colin — visible only to her, contextually aware, able to access local systems and provide guidance. Thirty-seven years later, Yuning is describing essentially the same device. Gibson conjured it decades before we had the silicon to build it.
Today, Yuning argues, correctness is perceptual, not mathematical. If your LLM gets one token wrong out of a million, no one notices. He shakes his head. “In the 20th century, if Intel was off by 0.001 in floating point, it was a crisis. Divide-by-zero? Immediate trap, big exception. That was engineering pride.” He is alluding to the 1994 Pentium FDIV bug, a rare division error that cost Intel a $475 million recall even though most users would never have hit it.
“In RISC-V, you don’t even need to take the divide-by-zero exception,” he says. “Just flag it. Keep running. Why waste cycles pretending to be perfect when the output is statistical anyway?” This mindset shift explains the rise of ultra-low-precision formats like FP4. What matters isn’t IEEE-754 perfection — it’s the human experience.
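He is describing, almost verbatim, how RISC-V floating point actually works: exceptions never trap in the base spec, they only set sticky flag bits in the fcsr register, and integer division by zero likewise returns a defined value (all ones) rather than faulting. A minimal sketch in plain ISO C, nothing vendor-specific assumed, shows the same flag-don’t-trap behavior on any IEEE 754 machine:

```c
/* Divide-by-zero as a flag, not a trap: the program keeps running.
 * Plain ISO C; on RISC-V the underlying FP flags live in fcsr.
 * (Strict fenv conformance may need #pragma STDC FENV_ACCESS ON
 * or a flag like -frounding-math, depending on the compiler.) */
#include <fenv.h>
#include <stdio.h>

int main(void) {
    volatile double zero = 0.0;  /* volatile blocks compile-time folding */
    feclearexcept(FE_ALL_EXCEPT);

    double x = 1.0 / zero;       /* no signal, no crash: x becomes +inf */

    if (fetestexcept(FE_DIVBYZERO))
        printf("divide-by-zero flagged, x = %f, still running\n", x);
    return 0;
}
```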
“The user interface is a relic,” Yuning says. “You won’t open apps in ten years. You’ll forget what a mouse was. You’ll just tell your glasses what you want.” He describes a lightweight AI OS — not an operating system in the old sense, but a runtime that manages models, sensors, and context. One orchestrator that knows who you are, what you’re doing, and loads the right model at the right moment. No menus. No icons. Just interaction.
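Yuning stops at the vision; the plumbing below is a hypothetical sketch, not DeepComputing code, and every name in it (the Route table, the intent strings, the model paths) is invented for illustration. The point is the shape: a tiny dispatcher that maps context to an on-device model, with no app or menu in between.

```c
/* Hypothetical "AI OS" dispatcher: context in, model out.
 * All intents and paths below are invented for illustration. */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *intent;      /* what the user is trying to do    */
    const char *model_path;  /* quantized weights on local flash */
} Route;

static const Route routes[] = {
    { "voice",  "/models/whisper-small-int8.bin" },
    { "reason", "/models/llm-7b-fp4.bin"         },
    { "vision", "/models/camera-vit-fp8.bin"     },
};

/* Pick the model for the current context; NULL if nothing matches. */
static const char *route(const char *intent) {
    for (size_t i = 0; i < sizeof routes / sizeof routes[0]; i++)
        if (strcmp(routes[i].intent, intent) == 0)
            return routes[i].model_path;
    return NULL;
}

int main(void) {
    /* No icons, no menus: speech arrives, the speech model loads. */
    const char *m = route("voice");
    if (m) printf("loading %s\n", m);
    return 0;
}
```

A real runtime would add eviction under memory pressure, sensor fusion, and privacy boundaries, but the dispatch loop is the whole idea: the model, not the app, becomes the unit of computing.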
When the topic turns to silicon, Yuning grows animated. “Everyone’s building SoCs like they’re building cities: giant, expensive, unmanageable,” he says. “I just want to build the brain.” To Yuning, the ideal AI processor is scalar, vector, and matrix compute wrapped around fast memory: on-chip SRAM for the working set, perhaps backed by GDDR6 off-chip. Everything else (I/O, radios, sensors) is the body. Let the OEMs build the hands and eyes. His job is to supply the cortex.
“The chiplet model fits perfectly,” he says. “Each part is an organ. You snap them together.” This modular vision aligns with the Open Chiplet Architecture, which promotes interoperable compute, memory, and I/O units. Yuning never names OCA directly, but the alignment is unmistakable. “Just make sure the interfaces fit.”
“Everyone’s obsessed with 1024-bit vectors and out-of-order cores,” Yuning says, voice rising an octave. “They’re expensive, they’re hot, and they don’t scale. Why do that?”
Instead, he champions deterministic simplicity. “If you can get the same user-perceived performance with half the power and half the cost, you win. Don’t chase benchmark vanity. Build something that feels fast.”
Here, he echoes a rising sentiment within the RISC-V community — seen in startups like Simplex Micro — that favors deterministic scheduling over speculative execution. Elegance, Yuning suggests, is the new performance metric.
How can a startup survive against Nvidia and Apple? “Do everything with ten times less — people, money, time,” he says. “That’s the rule.” Big companies, he argues, have thousands of engineers managing thousands more. Startups survive by constraint and speed. “We can do in a week what they do in a quarter — if we stay hungry.”
And forget the cloud. “Cloud is not for startups,” Yuning says. “It’s complex and slow. Don’t envy Cerebras or Groq — unless you want to raise money and become another big corporate like OpenAI or Anthropic.” He laughs. “Once you have hundreds of people, thousands of people? You lose the speed. You get all the bad odds — wasting money, wasting time, wasting opportunities. Like government.”
The key isn’t to outspend giants. It’s to out-focus them.
Yuning’s closing prediction: “You’ll wear your computer,” he says. “Glasses, earbuds, maybe a pendant. They’ll talk to each other through local models. No screen, no keyboard. The model is the interface.” The core, he insists, will be a small deterministic brain — not a GPU farm.
The Evaporating Cloud
“Fast, private, local. That’s the future. The cloud becomes the teacher; your device becomes the student. Once it learns, the cloud becomes an artifact.” When that happens, Yuning says, computing will finally be personal again.