How to Develop Accurate Yet High Performance Models

by Pawan Fangaria on 01-13-2014 at 12:00 pm

In today’s semiconductor design environment, SoCs are packed with IP blocks of varied functionality and with multiple integrated processors. It has therefore become necessary to model the system and verify it on a Virtual Platform before committing to actual design and fabrication. That in turn requires modelling each block at the appropriate level of abstraction. Even re-using an IP or an existing design block requires modelling it in the context of the new design in which it is to be used.

Ideally, a model at the Programmer’s View (PV) level, written at the LT (Loosely Timed) or UT (Untimed) level of abstraction, should be fast enough to run software. However, as we move towards the actual hardware of the system, timing accuracy comes into play, ultimately leading to the CA (Cycle Accurate) level of abstraction, which is orders of magnitude slower than the programmer’s view. In practice we need both types of models, depending on the accuracy required for particular blocks. AT (Approximately Timed) models are less prevalent because they are neither 100% accurate nor as fast as LT models, and they are costly to develop and maintain. In comparison, LT models can be developed easily by mapping functionality into software, and CA models can be translated easily from the RTL implementation.

Now the real question is how to get the best of both ends of the spectrum, LT and CA. Running CA models alongside LT models slows the simulation down, which limits the overall speed of the Virtual Prototype. But there is a way out; I am delighted to see this novel approach developed by Carbon and ARM, which exploits the accuracy of CA models and the speed of PV models and lets them complement each other as the system requires in the Virtual Prototype.


[PV and CA Integrated Platform]

In the above arrangement, it is very convenient for a designer to execute the system in LT mode up to a point (such as the booting of an OS) and then change to CA mode for tasks that require more accuracy. This requires each model to have a check point (CP) facility that can be used to swap between the LT and CA modes of execution. The ARM Fast Model system provides the Cycle Accurate Debug Interface (CADI) and ARM ESL APIs, which can be used to create such CPs. Any type of model can use the ARM ESL APIs to make this kind of swapping possible at a CP. Since LT and CA models execute differently, the swap functionality can only be tested by creating multiple random CPs, continuing program execution from each CP until completion, and checking the end result.
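The random-checkpoint test described above can be sketched with a toy model. This is only an illustration under stated assumptions: the `Checkpoint` layout, the five-instruction `PROGRAM`, the `CYCLES` latencies and all function names are hypothetical, and nothing here uses the real CADI or ARM ESL APIs.

```python
import random
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical architectural state captured at a swap point."""
    pc: int   # index of the next instruction to execute
    acc: int  # toy accumulator standing in for registers and memory


# Made-up program and per-instruction latencies, for illustration only.
PROGRAM = [("add", 3), ("mul", 2), ("add", 7), ("mul", 5), ("add", 1)]
CYCLES = {"add": 1, "mul": 3}


def step(acc, instr):
    """Functional behaviour shared by both toy models."""
    op, val = instr
    return acc + val if op == "add" else acc * val


def run_lt(program, stop_at=None):
    """Loosely timed toy model: functional result only, no timing."""
    end = len(program) if stop_at is None else stop_at
    acc = 0
    for pc in range(end):
        acc = step(acc, program[pc])
    return Checkpoint(pc=end, acc=acc)


def run_ca(program, start):
    """Cycle-accurate toy model: resumes from a checkpoint, counts cycles."""
    acc, cycles = start.acc, 0
    for pc in range(start.pc, len(program)):
        acc = step(acc, program[pc])
        cycles += CYCLES[program[pc][0]]
    return acc, cycles


def swap_is_consistent(program, trials=20, seed=0):
    """Swap LT -> CA at random checkpoints and compare the end result."""
    rng = random.Random(seed)
    reference = run_lt(program).acc
    for _ in range(trials):
        cp = run_lt(program, stop_at=rng.randrange(len(program) + 1))
        result, _ = run_ca(program, cp)
        if result != reference:
            return False
    return True
```

Calling `swap_is_consistent(PROGRAM)` runs the fast model to a random point, hands its state to the accurate model, and checks that the final value matches a pure LT run. A real platform would compare much richer state, but the shape of the test is the same.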


[Partitioning the platform between LT and CA for speed and accuracy]

Above is an example of the PV and CA integrated platform, in which an ARM Mali™ GPU was Carbonized (using Carbon Model Studio) and linked with an ARM Fast Model representation of the system. Variations of this exact setup have been deployed at multiple semiconductor design houses. The processor/memory subsystem is sufficient to boot the Linux OS and get to a prompt within 15-20 seconds, irrespective of whether the GPU is present. The speed goes down when the CA model becomes active in processing graphics frames, with each frame taking approximately 90 seconds. The hardware prototype, on the other hand, was much quicker at frame processing but took about 15 minutes to boot Linux; by the time it had booted, the Virtual Prototype had already processed about 10 frames.
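The "about 10 frames" figure follows from simple arithmetic on the numbers quoted above. A minimal sketch, assuming the upper end of the 15-20 second boot time; the constant and function names are illustrative:

```python
# Figures quoted in the article; VP_BOOT_S takes the upper end of 15-20 s.
VP_BOOT_S = 20          # Virtual Prototype: time to reach a Linux prompt
VP_FRAME_S = 90         # Virtual Prototype: time per graphics frame in CA mode
HW_BOOT_S = 15 * 60     # hardware prototype: time to boot Linux


def frames_before_hw_boot(vp_boot_s=VP_BOOT_S,
                          vp_frame_s=VP_FRAME_S,
                          hw_boot_s=HW_BOOT_S):
    """Frames the Virtual Prototype renders while the hardware still boots."""
    return (hw_boot_s - vp_boot_s) / vp_frame_s


print(round(frames_before_hw_boot(), 1))  # prints 9.8
```

Using the 15 second boot time instead gives the same value to one decimal place, so the estimate is not sensitive to that choice.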


[Applying check points (CPs) in swap enabled platform]

A swap-enabled LT system runs like any other Virtual Prototype; however, it can be changed to 100% CA at any point of interest. Typically software breakpoints are chosen, such as the start of various driver code inside the OS kernel, as shown in the above picture. A single Fast Model run can create multiple CPs, which can then be simulated and debugged independently and in parallel by different people, with detailed hardware and software interactions in a 100% accurate environment. The results from these runs can be used to analyse performance, power, etc.
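The fan-out from one fast run to several independent accurate runs can be pictured as below. This is a toy sketch, not the Fast Models or Carbon tooling: the phase names, instruction counts, the cycles-per-instruction figure and every function name are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical boot phases: (driver name, instruction count), in boot order.
PHASES = [("uart_init", 40), ("mmc_probe", 120), ("gpu_driver_init", 300)]
CYCLES_PER_INSTRUCTION = 2  # made-up figure for the toy accurate model


def fast_run_with_checkpoints(phases):
    """Single fast pass that records a checkpoint at the start of each phase."""
    checkpoints, executed = [], 0
    for name, length in phases:
        checkpoints.append({"name": name, "start": executed, "length": length})
        executed += length
    return checkpoints


def accurate_run(checkpoint):
    """Stand-in for a cycle-accurate run resumed from one checkpoint."""
    return checkpoint["name"], checkpoint["length"] * CYCLES_PER_INSTRUCTION


def analyse_in_parallel(phases):
    """Create all checkpoints once, then run each accurate job independently."""
    checkpoints = fast_run_with_checkpoints(phases)
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(accurate_run, checkpoints))
```

Because each accurate run starts from its own saved state, the jobs share nothing and could just as well be handed to different engineers or machines.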

Swap capability from ARM Fast Models to Carbonized ARM models already exists and is in active use at numerous design companies. The functionality is readily available for ARM Cortex-A15, Cortex-A9 and Cortex-A7 processors, along with their peripheral models.

It’s a great, innovative approach to optimizing virtual prototyping: a single virtual prototype lets software be debugged at high speed while retaining the capability to execute at 100% accuracy, as required for architectural exploration, firmware development and system debug. Bill Neifert, CTO and Founder at Carbon Design Systems, and Rob Kaye, Technical Specialist at ARM, have described this process in great detail, along with some future work, in their whitepaper posted on the Carbon website. It’s a great read for system designers and IP developers.
