In optimizing SoC design for performance, there is so much focus on how fast a CPU core is, or a GPU core, or peripherals, or even the efficiency of the chip-level interconnect. Most designers also understand selecting high performance memory at a cost sweet spot, and optimizing physical layout to clock it as fast as possible within power consumption limits, is imperative.
One can do all of that exactly right and still end up with a poorly performing, and perhaps overdesigned, SoC. But it doesn’t have to end up that way.
Dealing with the nuances of DDR memory controllers and comprehending what actual traffic patterns are in play can make a huge swing in performance. For instance, just getting address mapping right – conversion of AXI addresses to physical memory addresses, matching what the application is really doing – can improve memory subsystem performance by 20% or more. Optimizing clock frequency allows better use of bandwidth at lower speed bins, which can reduce cost and power.
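To make the address mapping point concrete, here is a minimal sketch in Python. It is not taken from the webinar, and the geometry (64-byte bursts, 128 bursts per row, 8 banks), the two bit orderings, and every function name are assumptions chosen for illustration. It decodes a flat bus address into row, bank, and column, then counts row-buffer hits for two interleaved sequential streams under a simple open-page model.

```python
from typing import Callable, List, NamedTuple


class DramAddr(NamedTuple):
    row: int
    bank: int
    col: int


# Hypothetical geometry, for illustration only.
OFFSET_BITS = 6   # 64-byte burst
COL_BITS = 7      # 128 bursts per row (8 KiB page)
BANK_BITS = 3     # 8 banks
ROW_BITS = 14


def _field(value: int, shift: int, bits: int) -> int:
    """Extract `bits` bits of `value` starting at bit position `shift`."""
    return (value >> shift) & ((1 << bits) - 1)


def map_row_bank_col(addr: int) -> DramAddr:
    """Bit order from MSB to LSB: row | bank | column | offset."""
    col = _field(addr, OFFSET_BITS, COL_BITS)
    bank = _field(addr, OFFSET_BITS + COL_BITS, BANK_BITS)
    row = _field(addr, OFFSET_BITS + COL_BITS + BANK_BITS, ROW_BITS)
    return DramAddr(row, bank, col)


def map_bank_row_col(addr: int) -> DramAddr:
    """Bit order from MSB to LSB: bank | row | column | offset."""
    col = _field(addr, OFFSET_BITS, COL_BITS)
    row = _field(addr, OFFSET_BITS + COL_BITS, ROW_BITS)
    bank = _field(addr, OFFSET_BITS + COL_BITS + ROW_BITS, BANK_BITS)
    return DramAddr(row, bank, col)


def row_hit_rate(trace: List[int], mapping: Callable[[int], DramAddr]) -> float:
    """Fraction of accesses that find their row already open in the bank."""
    open_rows = {}  # bank -> currently open row
    hits = 0
    for addr in trace:
        d = mapping(addr)
        if open_rows.get(d.bank) == d.row:
            hits += 1
        open_rows[d.bank] = d.row
    return hits / len(trace) if trace else 0.0


def interleaved_streams(base_a: int, base_b: int, bursts: int) -> List[int]:
    """Two masters reading sequential bursts, strictly alternating."""
    trace = []
    for i in range(bursts):
        trace.append(base_a + i * 64)
        trace.append(base_b + i * 64)
    return trace


trace = interleaved_streams(0x0000, 0x2000, 16)
rbc = row_hit_rate(trace, map_row_bank_col)
brc = row_hit_rate(trace, map_bank_row_col)
print(f"row:bank:col hit rate = {rbc:.2f}")
print(f"bank:row:col hit rate = {brc:.2f}")
```

With these made-up buffers placed 8 KiB apart, the first ordering puts the two streams in different banks, so almost every access is a row hit. The second ordering lands them on different rows of the same bank, so every access forces a new row to be opened. A real controller has far more knobs than this, but the sketch shows why the mapping has to match what the application is really doing.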
It’s the last point raised by Synopsys’ Patrick Sheridan in opening a recent webinar that got my attention: QoS. “Different [DDR memory] masters can have varying and often contradicting requirements.” There is high priority traffic, and so-called low priority traffic, and both can starve, affecting overall system performance. Optimizing a DDR controller isn’t as simple as throwing one switch; a blend of parameters needs to be explored.
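A toy model helps show why both classes of traffic can suffer. The sketch below is my own illustration in Python, not the uMCTL2 arbitration scheme: it assumes a single port serving one request per cycle, two traffic classes, and a hypothetical `lp_timeout` parameter that promotes a low priority request once it has waited too long.

```python
from collections import deque
from typing import Dict, Optional


def simulate(cycles: int, hp_period: int, lp_period: int,
             lp_timeout: Optional[int] = None) -> Dict[str, int]:
    """Return the worst-case wait, in cycles, seen by each traffic class.

    One request is served per cycle. High priority (hp) wins unless the
    oldest low priority (lp) request has waited at least lp_timeout cycles.
    """
    hp, lp = deque(), deque()  # queues hold request arrival times
    worst = {"hp": 0, "lp": 0}
    for t in range(cycles):
        if t % hp_period == 0:
            hp.append(t)
        if t % lp_period == 0:
            lp.append(t)
        lp_expired = (lp_timeout is not None and len(lp) > 0
                      and t - lp[0] >= lp_timeout)
        if lp and (lp_expired or not hp):
            worst["lp"] = max(worst["lp"], t - lp.popleft())
        elif hp:
            worst["hp"] = max(worst["hp"], t - hp.popleft())
    # Requests still queued at the end have waited until the last cycle.
    if hp:
        worst["hp"] = max(worst["hp"], cycles - hp[0])
    if lp:
        worst["lp"] = max(worst["lp"], cycles - lp[0])
    return worst


# High priority master saturates the port; low priority asks every 10 cycles.
strict = simulate(cycles=100, hp_period=1, lp_period=10)
aged = simulate(cycles=100, hp_period=1, lp_period=10, lp_timeout=20)
print("strict priority:", strict)
print("with timeout:   ", aged)
```

Under strict priority the low priority master is never served in this scenario. Adding the timeout bounds its wait, but every promoted request pushes the saturated high priority queue a little further behind. That tension is the reason no single switch settles the matter.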
Synopsys is in a unique position to provide a perspective on this topic. They provide IP, in this case a DesignWare DDR uMCTL2 memory controller block. They also provide tools for optimizing IP in SoC designs, such as Platform Architect MCO with multicore optimization technology. The environment described is a SystemC simulation with appropriate IP models to provide DDR subsystem visibility.
Combining in-depth understanding of DDR memory controller IP via models with workload simulation capability delivers what Synopsys claims is at least a 10x improvement over trying to fight it out with just RTL-level techniques. HDL co-simulation of RTL IP is fully supported. However, I think once viewers see this event, they may re-evaluate their current approach.
One thing I did not appreciate fully before viewing this webinar was just how many parameters are involved in designing around a DDR memory controller. The webinar moves on to take a very detailed look at analyzing the uMCTL2 IP in a mobile SoC application, presented by Tim Kogel.
The use case analysis Kogel presents looks at a mix of traffic from a CPU, a GPU, a camera, and a display in a mobile device. The scenario models 300 us of traffic, with a QoS goal of 200 us for the graphics processing. The approach illustrated defines elastic workloads across the IP blocks, synchronized as necessary, and then projects them all onto a deadline analysis.
Address mapping is explored and optimized with the performance model, with the help of a graphical view of JEDEC commands per interval. “Hot bit” visualization aids exploration, and then the memory clock speeds are optimized – again, using the actual traffic load and the deadline constraints.
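The webinar summary above does not spell out how the “hot bit” view is computed, so treat the following as one plausible reading rather than a description of the tool: count how often each address bit toggles between consecutive requests in a trace. The Python sketch below uses a hypothetical trace of two interleaved 64-byte burst streams placed 8 KiB apart; the function name and trace are assumptions for illustration.

```python
from typing import List


def bit_toggle_counts(trace: List[int], width: int) -> List[int]:
    """Count, per address bit, how often it flips between consecutive accesses."""
    counts = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        diff = prev ^ cur  # bits that changed between the two addresses
        for bit in range(width):
            if (diff >> bit) & 1:
                counts[bit] += 1
    return counts


# Hypothetical trace: two masters alternating, buffers 8 KiB (0x2000) apart.
trace = []
for i in range(16):
    trace.append(0x0000 + i * 64)
    trace.append(0x2000 + i * 64)

counts = bit_toggle_counts(trace, width=16)
hottest = max(range(len(counts)), key=lambda b: counts[b])
for bit, n in enumerate(counts):
    print(f"bit {bit:2d}: {'#' * n}")
print("hottest bit:", hottest)
```

In this trace bit 13 flips on every transition, because that is the bit separating the two buffers. Under the assumption that frequently toggling bits are good candidates for bank selection, and rarely toggling bits for row selection, a histogram like this gives a quick hint about which mapping to try next in simulation.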
That’s just the start of the event. Kogel then goes into a detailed discussion of parameter configuration, including a video showing how Platform Architect MCO can optimize hundreds of parameters in the uMCTL2. A key takeaway: 300 us of real-time traffic is simulated, with all instrumentation and graphical visualization enabled, in about 10 seconds. This makes it super easy to change a parameter and re-simulate almost instantly.
This is a great example of how powerful SystemC modeling can get inside IP quickly and explore complex issues in real-world scenarios. Even if you are not using Synopsys IP, Platform Architect, or SystemC modeling, this is worth your time to see the approach. What you may be overlooking, or spending huge amounts of time solving, could make the difference in your next design.