Optimize Your Interconnect & Design at System Level for Best Results

by Pawan Fangaria on 09-16-2014 at 7:00 am

As SoC designs keep growing in size, complexity, and functionality, with multiple IPs packed together, while design time and time-to-market keep shrinking under tight PPA constraints, there is no alternative to getting the design right the first time if the window of opportunity is not to be missed. That is possible only when major decisions are taken at the system level, using a working prototype optimized over a couple of iterations, which can then be refined through the design flow to reach actual silicon. At the system level, the interconnect IP plays a major role in optimizing the overall design, so it must be versatile enough to be used in an optimal configuration that accounts for system traffic, memory-access bottlenecks, CPU latency limits, routing congestion, and so on.

It was extremely inspiring to attend a webinar, offered jointly by ARM and Carbon Design Systems, on “Pre-silicon Optimization of System Designs using the ARM CoreLink NIC-400 Interconnect”.

It is an impressive network interconnect that can be configured optimally to achieve target performance while keeping power and area within budget. Multiple masters can be dynamically scheduled, according to priority, to access costly shared resources such as DDR bandwidth, with proper data traffic management and without creating routing congestion, thus receiving the best QoS (Quality of Service) at the least cost. The AMBA Designer provides a GUI for easily configuring the interconnect to minimize CPU latency and to set up multiple clock domains for better performance and power savings. Fast timing closure can be achieved through registering options and by configuring Thin Links, which require less wiring, between different NIC-400 switches.

High-bandwidth IPs can often flood the system with traffic, causing congestion that blocks requests from real-time masters such as the LCD (in the above picture) and from low-latency components such as CPUs. Priority masters are then starved of memory access, and system performance degrades. This kind of congestion is controlled by dynamic QoS regulation, in which real-time masters and low-latency connections get priority according to their QoS values. Of course, a high-bandwidth IP can raise its priority if starved. CoreLink QVN-400 goes a step further by providing arbitrated Virtual Networks that give different kinds of traffic assured paths to the memory controller.
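To make the QoS idea more concrete, here is a minimal Python sketch of priority arbitration in which a starved master's priority rises the longer it waits. It is not the NIC-400 or QVN-400 algorithm: the `Master` class, the `arbitrate` and `make_masters` functions, the QoS values and the starvation limits are all hypothetical, and every master is assumed to have a request pending every cycle. The only grounded detail is that AXI QoS values are 4 bits wide (0 to 15).

```python
from dataclasses import dataclass

MAX_QOS = 15  # AXI4 QoS fields are 4 bits wide, so values range from 0 to 15


@dataclass
class Master:
    """Hypothetical bus master that always has a memory request pending."""
    name: str
    base_qos: int      # priority assigned at configuration time
    starve_limit: int  # cycles of waiting tolerated before QoS is raised one step
    waited: int = 0    # cycles spent waiting since the last grant
    grants: int = 0    # number of memory slots won so far

    def effective_qos(self) -> int:
        # Raise priority one step per starve_limit cycles waited, saturating at MAX_QOS.
        return min(MAX_QOS, self.base_qos + self.waited // self.starve_limit)


def arbitrate(masters, cycles):
    """Grant one shared-memory slot per cycle to the highest effective QoS."""
    for _ in range(cycles):
        # max() keeps the first of equal candidates, so ties go to the earlier master.
        winner = max(masters, key=lambda m: m.effective_qos())
        for m in masters:
            if m is winner:
                m.grants += 1
                m.waited = 0
            else:
                m.waited += 1
    return {m.name: m.grants for m in masters}


def make_masters(aging):
    # Illustrative real-time LCD, latency-sensitive CPU and high-bandwidth DMA.
    never = 10**9  # a limit this large effectively disables priority raising
    return [
        Master("LCD", base_qos=12, starve_limit=100 if aging else never),
        Master("CPU", base_qos=10, starve_limit=8 if aging else never),
        Master("DMA", base_qos=2, starve_limit=4 if aging else never),
    ]


print("static :", arbitrate(make_masters(aging=False), 1000))  # LCD wins all 1000 slots
print("dynamic:", arbitrate(make_masters(aging=True), 1000))
```

With fixed priorities the LCD wins every slot and the CPU and DMA starve; letting a waiting master's QoS climb over time spreads grants across all three, at the cost of some bandwidth for the highest-priority master.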

That covers interconnect optimization. Now, how can the optimality of the overall design be assessed pre-silicon, and how can we ensure that the same level of PPA holds further down the design flow? That’s where Carbon’s Virtual Prototyping platform, along with its excellent tools for accurate system prototyping, comes into the picture. After the interconnect is set up, important system components such as CPUs, GPUs, memory controllers, and internal IPs are added, and the system is exercised with bare-metal software. System-level performance is then optimized by booting the OS.

To get 100% accurate results, 100% accurate models of the entire system are needed. Carbon provides the IP Exchange web portal, where a 100% accurate model of an IP can be compiled automatically from its corresponding RTL. For example, a CoreLink AMBA Designer IP-XACT file can be uploaded to the web portal; a 100% accurate model is created from the ARM RTL, and a link to download it is e-mailed to the user. Accurate virtual models from leading IP providers, including ARM, Arteris, Cadence, CEVA, Imagination, Mentor, Netspeed, and others, can be made available from the portal.

To drive the system, analyze traffic, and optimize, the traffic between producers and consumers is parameterized to mimic the modeled components. The system can be iterated over different configurations, simulations, and analyses using parameterized traffic as well as Vector Playback (more realistic, captured from real components) and Programmed Traffic that targets real functions. Any generic component can also be replaced with its real model and the system re-tested for further optimization with the true model.
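The parameterized-traffic idea can be sketched as a tiny discrete-time model. In this Python sketch, each hypothetical `TrafficProfile` stands in for an IP by issuing fixed-size bursts at a fixed period, a single memory port drains requests oldest-first, and a sweep over memory bandwidth shows where latency stays bounded. The names, burst sizes, periods and bandwidths are illustrative assumptions, not Carbon's traffic generators.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficProfile:
    """Hypothetical parameterized traffic source standing in for a real IP."""
    name: str
    burst_bytes: int  # bytes per transaction
    period: int       # cycles between transactions


def simulate(profiles, mem_bytes_per_cycle, cycles):
    """Serve all sources from one oldest-first memory port.

    Returns the average latency (cycles from issue to completion) per source.
    """
    queue = deque()  # entries: [source name, issue cycle, bytes still to move]
    latency = {p.name: [] for p in profiles}
    for t in range(cycles):
        # Each source issues a new transaction every `period` cycles.
        for p in profiles:
            if t % p.period == 0:
                queue.append([p.name, t, p.burst_bytes])
        # The memory moves up to mem_bytes_per_cycle this cycle, oldest request first.
        budget = mem_bytes_per_cycle
        while queue and budget > 0:
            head = queue[0]
            moved = min(budget, head[2])
            head[2] -= moved
            budget -= moved
            if head[2] == 0:
                queue.popleft()
                latency[head[0]].append(t - head[1] + 1)
    return {n: sum(v) / len(v) if v else float("inf") for n, v in latency.items()}


profiles = [
    TrafficProfile("GPU", burst_bytes=256, period=16),      # 16 bytes/cycle average
    TrafficProfile("CPU", burst_bytes=64, period=32),       # 2 bytes/cycle average
    TrafficProfile("DISPLAY", burst_bytes=128, period=32),  # 4 bytes/cycle average
]

# Sweep one configuration parameter, as one would iterate over fabric/memory settings.
for width in (8, 16, 32, 64):
    result = simulate(profiles, width, cycles=10_000)
    print(width, {k: round(v, 1) for k, v in result.items()})
```

With about 22 bytes/cycle of average demand, the 8 and 16 bytes/cycle configurations cannot keep up and their latencies keep growing with simulation length, while 32 and 64 bytes/cycle stay bounded. Feeding a recorded trace instead of a fixed profile would be the rough analogue of Vector Playback.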

Carbon ModelStudio can be used to carbonize a model so that it can be linked with another model at a different transaction level; for example, a cycle-accurate (CA) model of the ARM Mali GPU can be carbonized and combined with any other ARM Fast Model. A system can thus be built from various kinds of models: carbonized models, accurate models obtained from the IP Exchange portal, existing models, or even models written in SystemC or C++.

Carbon’s SoC Designer Plus virtual platform provides a remarkable capability, ‘Swap & Play’, where a system running on loosely timed (LT) models can be switched to 100% CA models at any desired point, providing cycle accuracy without actually booting Linux on the CA models. After Linux or Android is booted quickly, multiple checkpoints (CPs) can be created for firmware and driver engineers, who can then independently debug and validate their code against an accurate system. At the OS level, system benchmarks can execute for many billions of cycles; running from multiple checkpoints in parallel delivers results in hours rather than days, enabling fast and accurate performance analysis.
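The “hours rather than days” benefit comes down to simple arithmetic once the workload is split. This Python sketch assumes, hypothetically, that the benchmark cycles divide evenly into independent segments, one per checkpoint, each run in CA mode on its own host after a short LT boot. The function name `wall_clock_hours`, the 2-billion-cycle benchmark, the 10,000 cycles/s CA speed and the 15-minute boot are placeholders, not Carbon figures.

```python
import math


def wall_clock_hours(total_cycles, ca_cycles_per_sec, checkpoints, lt_boot_hours=0.0):
    """Estimate wall-clock hours to run a benchmark in CA mode.

    Assumes the benchmark splits evenly into independent segments, one per
    checkpoint, each run on its own host in parallel after an LT boot.
    """
    segment_cycles = math.ceil(total_cycles / checkpoints)
    return lt_boot_hours + segment_cycles / ca_cycles_per_sec / 3600


# Placeholder numbers for illustration only.
TOTAL_CYCLES = 2_000_000_000  # hypothetical OS-level benchmark length
CA_SPEED = 10_000             # hypothetical CA simulation speed, cycles per second
BOOT = 0.25                   # hypothetical LT boot time to reach the checkpoints, hours

for n in (1, 10):
    print(f"{n:>2} checkpoint(s): {wall_clock_hours(TOTAL_CYCLES, CA_SPEED, n, BOOT):.1f} h")
# about 55.8 h with one checkpoint vs 5.8 h with ten
```

Real speedups depend on how evenly the workload splits and on how many hosts are available; the even-split assumption here is the best case.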

Carbon offers a large variety of pre-built virtual prototypes, which it calls CPAKs (Carbon Performance Analysis Kits). There are over 100 of them today featuring advanced ARM IP, including 12 that feature ARM’s NIC-400. A CPAK provides a pre-built, extensible virtual prototype with reconfigurable memory and fabric, along with pre-built bare-metal software and OS ports. Carbon provides ‘Swap & Play’-enabled CPAKs, with source code for all software components, downloadable at any time from its web portal.

It was a great learning session, presented at length by William Orme from ARM and Bill Neifert from Carbon, followed by an interesting demo by Eric Sondhi from Carbon that included multiple real-world case studies showing how a system can be brought up quickly and how system-level performance analysis with 100% accurate models can be done reasonably fast. An in-depth Q&A session followed at the end. Check out the online webinar, including the demo, here for detailed insight.

Also read –

https://www.legacy.semiwiki.com/forum/content/3072-how-develop-accurate-yet-high-performance-models.html

https://www.legacy.semiwiki.com/forum/content/3000-taming-interconnect-real-world-socs.html
