For more than a decade, progress in artificial intelligence has been framed almost entirely through the lens of compute. Faster GPUs, denser accelerators, and higher TOPS defined each new generation. But as generative and agentic AI enter their next phase, that framing is no longer sufficient. The most advanced AI systems today are not constrained by arithmetic throughput. They are constrained by memory.

That reality was the central theme of “How Memory Technology Is Powering the Next Era of Compute,” a panel session featuring Rambus participants Steven Woo, John Eble, and Nidish Kamath, moderated by Timothy Messegee. Timothy is Senior Director, Solutions Marketing; Steven is a Fellow and Distinguished Inventor; John is Vice President, Product Marketing for Memory Interface Chips; and Nidish is Director, Product Management, Memory Controller IP.

The discussion revealed how modern AI workloads are placing unprecedented demands across the entire memory hierarchy, forcing fundamental changes in system architecture, power delivery, and reliability strategies.
When AI Models Outgrow the Memory Hierarchy
The defining characteristics of today’s AI models include exploding parameter counts, longer context windows, persistent reasoning, and simultaneous multi-user inference. All these characteristics translate directly into dramatically higher memory demands. AI systems now need to move, store, and retain far more data than previous generations of workloads, often for extended periods of time.
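To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. The model size, context length, and user count are illustrative assumptions chosen for the example, not figures from the panel.

```python
# Rough estimate of inference memory demand for a transformer-style model.
# All figures below are illustrative assumptions, not panel data.

def model_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory to hold the weights alone (e.g., FP16/BF16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, users: int, bytes_per_elem: int = 2) -> float:
    """KV cache grows with context length and with every concurrent user."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
    return per_token * context_tokens * users / 1e9

weights = model_memory_gb(params_billion=70)                 # ~140 GB of weights
cache = kv_cache_gb(layers=80, kv_heads=8, head_dim=128,
                    context_tokens=128_000, users=16)        # cache for 16 long-context users
print(f"Weights: {weights:.0f} GB, KV cache: {cache:.0f} GB")
```

Even before activations and framework overhead are counted, weights plus cache for a handful of long-context users can run into hundreds of gigabytes, which is exactly the pressure the memory hierarchy now has to absorb.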
At the same time, scaling limits at the lowest levels of the memory hierarchy are becoming increasingly visible. SRAM no longer scales economically or densely enough to keep pace with AI’s appetite for on-chip data. As a result, pressure shifts upward into DRAM, which must now deliver both higher bandwidth and greater capacity. The traditional memory hierarchy, designed for more balanced and predictable workloads, is struggling to adapt to this imbalance.
Architecture Steps In Where Physics Pushes Back
In server environments, the constraints are especially acute. CPUs can only support a limited number of memory channels due to pin count, packaging, and system form-factor limitations. Simply adding more memory channels is not practical, yet AI workloads demand more bandwidth than ever.
This is where architectural innovations such as MRDIMM, or Multiplexed Rank DIMM, become critical. MRDIMM technology uses on-module logic to multiplex parallel memory ranks into a single CPU channel, effectively doubling usable bandwidth without requiring additional pins or channels. Rather than relying solely on faster DRAM devices, MRDIMM demonstrates how intelligent system design can extend performance beyond traditional physical limits.
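A simple way to see the effect of the multiplexing is to compare peak per-channel bandwidth. The sketch below uses representative speed grades as assumptions; the exact bins are not figures from the panel.

```python
# Illustrative per-channel bandwidth math for a standard RDIMM vs. an MRDIMM.
# Speed grades below are representative examples, not panel figures.

CHANNEL_WIDTH_BYTES = 8  # 64-bit data channel (ECC bits excluded for simplicity)

def channel_bandwidth_gbs(transfers_mt_per_sec: float) -> float:
    """Peak bandwidth of one memory channel in GB/s."""
    return transfers_mt_per_sec * 1e6 * CHANNEL_WIDTH_BYTES / 1e9

rdimm = channel_bandwidth_gbs(6400)    # e.g., DDR5-6400: one rank drives the bus at a time
mrdimm = channel_bandwidth_gbs(12800)  # e.g., MRDIMM-12800: two ranks multiplexed by
                                       # on-module logic onto the same pins

print(f"RDIMM  per channel: {rdimm:.1f} GB/s")   # ~51.2 GB/s
print(f"MRDIMM per channel: {mrdimm:.1f} GB/s")  # ~102.4 GB/s at the same pin count
```

Because the host interface runs at roughly twice the per-rank data rate, the channel delivers about double the bandwidth over the same pins.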
Telemetry: From Debug Tool to Performance Enabler
Another important shift highlighted during the panel is the growing role of telemetry and observability. In earlier generations, memory subsystems were largely static, configured once and rarely revisited. That approach no longer works in AI systems, where workloads evolve rapidly and performance requirements shift continuously.
Modern memory controllers now provide detailed visibility into internal behavior, enabling real-time tuning and long-term optimization. This level of observability allows systems to adapt as AI models change, sustaining performance and efficiency rather than allowing them to degrade over time. In this context, telemetry is no longer just a debugging aid; it has become a core enabler of AI performance.
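As a hedged illustration of what such a feedback loop might look like, the sketch below polls hypothetical controller counters and adjusts a page policy. The counter names, thresholds, and tuning knob are invented for this example and do not correspond to any specific Rambus or vendor API.

```python
# Hypothetical telemetry loop for a memory controller.
# Counter names, thresholds, and the tuning knob are all illustrative;
# real controllers expose vendor-specific registers and policies.

import random
import time

def read_counters() -> dict:
    """Stand-in for reading controller performance counters over a sideband interface."""
    return {
        "row_hit_rate": random.uniform(0.3, 0.9),    # fraction of accesses hitting an open row
        "read_latency_ns": random.uniform(80, 140),  # average loaded read latency
    }

def tune_page_policy(counters: dict) -> str:
    """Pick a page policy based on observed locality (illustrative heuristic only)."""
    return "open-page" if counters["row_hit_rate"] > 0.6 else "closed-page"

for _ in range(3):                      # a real tuning agent would run continuously
    stats = read_counters()
    policy = tune_page_policy(stats)
    print(f"hit rate {stats['row_hit_rate']:.2f} -> policy: {policy}")
    time.sleep(0.1)
```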
Reliability Moves to the Center of AI Infrastructure
As AI deployments scale, reliability has emerged as a defining constraint. Memory-related errors already contribute to significant system downtime in large data centers, driving costly overprovisioning to maintain service levels. For AI workloads, where training and inference cycles are both expensive and time-sensitive, that approach is unsustainable.
The panel emphasized that reliability, availability, and serviceability features must now be designed into memory systems from the outset. Advanced error correction, cyclic redundancy checks, retry mechanisms, and fault isolation are becoming essential for sustaining AI uptime. Performance without reliability is no longer acceptable in large-scale AI infrastructure.
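Conceptually, link-level CRC and retry work like the simplified model below: each transfer carries a checksum, a detected mismatch triggers a bounded replay, and only unrecoverable errors escalate to higher-level RAS handling. This is an illustrative sketch, not how any particular DDR5 or HBM implementation is realized in hardware.

```python
# Simplified model of CRC-protected transfers with bounded retry.
# Real memory links implement CRC and replay in hardware with very
# different framing and timing; this only illustrates the concept.

import random
import zlib

def send_with_crc(payload: bytes) -> tuple[bytes, int]:
    """Attach a CRC32 checksum to the outgoing payload."""
    return payload, zlib.crc32(payload)

def flaky_link(payload: bytes, error_rate: float = 0.3) -> bytes:
    """Occasionally corrupt a byte to emulate a transient link error."""
    if random.random() < error_rate and payload:
        i = random.randrange(len(payload))
        payload = payload[:i] + bytes([payload[i] ^ 0xFF]) + payload[i + 1:]
    return payload

def transfer(payload: bytes, max_retries: int = 3) -> bytes:
    for attempt in range(1, max_retries + 1):
        data, crc = send_with_crc(payload)
        received = flaky_link(data)
        if zlib.crc32(received) == crc:
            return received                           # clean transfer
        print(f"CRC mismatch on attempt {attempt}, retrying")
    raise RuntimeError("unrecoverable link error")    # escalate to RAS handling

print(transfer(b"critical training data"))
```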
Memory Technologies Begin to Converge
One of the most striking themes to emerge from the discussion is the blurring of traditional memory boundaries. Technologies once confined to specific markets are now being reevaluated through the lens of AI workloads.
GDDR7, historically associated with graphics, is increasingly attractive for edge inference. Its use of PAM3 signaling delivers exceptional bandwidth while controlling pin count, and built-in retry mechanisms improve robustness in environments where reliability matters. Meanwhile, LPDDR5X and LPDDR6, long optimized for mobile devices, now offer bandwidth comparable to DDR5 while maintaining superior power efficiency. New modular formats such as LPCAMM2 further extend LPDDR’s reach by combining proximity to the processor with serviceability.
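The bandwidth-per-pin argument for PAM3 can be shown with a short calculation: three signal levels carry 3 bits across every 2 unit intervals, versus 1 bit per interval for conventional NRZ. The symbol rate used below is an illustrative assumption, not a quoted GDDR7 specification.

```python
# Why PAM3 raises bandwidth per pin: bits carried per unit interval (UI).
# PAM3 encodes 3 bits across 2 UIs using three signal levels, versus
# 1 bit per UI for NRZ. The symbol rate below is an illustrative assumption.

SYMBOL_RATE_G_UI_PER_SEC = 32          # assumed symbol rate per pin, in G UI/s

nrz_bits_per_ui = 1.0                  # NRZ: one bit per unit interval
pam3_bits_per_ui = 3 / 2               # PAM3: three bits per two unit intervals

nrz_gbps_per_pin = SYMBOL_RATE_G_UI_PER_SEC * nrz_bits_per_ui
pam3_gbps_per_pin = SYMBOL_RATE_G_UI_PER_SEC * pam3_bits_per_ui

print(f"NRZ : {nrz_gbps_per_pin:.0f} Gb/s per pin")
print(f"PAM3: {pam3_gbps_per_pin:.0f} Gb/s per pin "
      f"({pam3_bits_per_ui / nrz_bits_per_ui:.1f}x at the same pin count)")
```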
As a result, memory selection is becoming less about market segmentation and more about workload fit.
Power Becomes the Dominant Design Constraint
As AI systems grow denser and more powerful, power delivery has become one of the most difficult challenges facing system architects. Future AI data centers are being designed around megawatt-class racks, driven by high-bandwidth memory, dense accelerators, and massive data movement.
A growing share of total system energy is now consumed not by computation, but by moving data between components. To manage this, architectures are shifting toward higher-voltage, lower-current delivery, with power management integrated directly onto memory modules through PMICs. Even fractional improvements in efficiency can translate into enormous savings at scale.
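The case for higher-voltage, lower-current delivery comes down to resistive loss, which scales with the square of the current. A small worked example, using illustrative rack power and path resistance values rather than panel data:

```python
# Why higher-voltage, lower-current delivery wins: resistive loss scales as I^2 * R.
# Rack power, voltages, and path resistance are illustrative assumptions.

def distribution_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """I^2 * R loss in the distribution path for a given delivery voltage."""
    current_a = power_w / voltage_v
    return current_a ** 2 * resistance_ohm

RACK_POWER_W = 120_000        # assumed ~120 kW rack
PATH_RESISTANCE = 0.001       # assumed 1 milliohm of busbar/cable resistance

loss_12v = distribution_loss_w(RACK_POWER_W, 12.0, PATH_RESISTANCE)
loss_48v = distribution_loss_w(RACK_POWER_W, 48.0, PATH_RESISTANCE)

print(f"12 V delivery loss: {loss_12v / 1000:.1f} kW")   # ~100 kW: clearly untenable
print(f"48 V delivery loss: {loss_48v / 1000:.1f} kW")   # ~6.3 kW: 16x lower
```

The same logic motivates placing PMICs on the module itself: the final step-down to DRAM voltages happens as close to the memory as possible, so high current flows over the shortest practical distance.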
These power densities also drive changes in cooling strategies. Liquid cooling is rapidly becoming standard in AI systems, reshaping server design and data center infrastructure. Memory, once a relatively passive consideration, is now deeply intertwined with power and thermal architecture.
Chasing the “Best of Both Worlds”
Looking ahead, the panel pointed toward a future in which memory technologies blend strengths that were once considered mutually exclusive. The goal is to combine the bandwidth and power efficiency of mobile-class memory with the reliability, security, and resilience traditionally associated with server-class systems.
This direction opens the door to innovations such as processing-in-memory, inline memory encryption, and new reliability frameworks tailored specifically for AI workloads. As systems evolve toward agentic and autonomous behavior, memory will play a central role in enabling not just performance and scale, but trust, privacy, and long-term stability.
A New Mandate for Memory
The AI revolution has fundamentally changed what is expected from memory systems. Raw speed alone is no longer sufficient. Memory must now be closer to compute, more efficient, deeply observable, highly reliable, and inherently secure.
AI did not simply expose the limits of the old memory playbook. It is driving its rewrite.
You can watch the entire Rambus panel session here.