Multiple processor cores are now a given in SoCs. Grabbing IP blocks and laying them in a multicore design may be the easy part. While verification is extremely important, it is only the start – obtaining real-world performance depends on the combination of multicore hardware and actual application software. What should engineers look for in evaluating a multicore design?
A new white paper series from Mentor Embedded provides a perspective on this question. Compared to the single core, or more accurately core-centric approach, author Manfred Kreutzer suggests a multicore approach must address three issues for success.
The first is obvious: software must capitalize on concurrency and parallelism. Some scenarios may have the luxury of having roughly the same number of threads and cores, but in most cases, threads will likely outnumber cores. If an overwhelming number of tasks vastly outnumbers cores, this kicks off thread migration, with a high degree of time-slicing and likely significant numbers of cache misses. Visualizing what threads are on what core provides clues into how well tasks are partitioned and assigned.
Resource utilization is the second issue. Even in simpler cases, threads are often non-symmetric; this results in some cores being near fully loaded, and some cores less loaded. There can be conflicts when I/O or interprocess communication enter the picture, throwing off an otherwise efficient thread of execution. This also hits at decisions such as core scaling; for instance, are eight cores necessarily better than four? The answer may not be so simple if more cores wait around more often, or fewer cores churn constantly at full power. Core asymmetry, such as ARM big.LITTLE or using GPU or DSP cores to accelerate tasks, may also be a consideration.
That suggests a third issue, which is a rude awakening for many designers: power consumption is highly software dependent. This is a result of a combination of factors, starting with core partitioning and DVFS, leading into caching and memory allocation, and to system issues such as waiting for I/O resources. In short, implementations must be power-aware – in both hardware and software. One of the capabilities needed is to trace thread execution and power consumption together, showing a correlation. Just as earlier generations of software tracing focused on source code constructs that were hogging execution time, newer tools can look for power hogs.
Doesn’t observing more variables mean more overhead? Most IP designs today are instrumented with performance counters, allowing sampling utilities to quickly grab snapshots. Sampling is best for a statistical overview of what is happening, not a detailed sequence of specific events. For more in depth analysis, tracing performs consistent logging of system and user application events with time stamps, without blowing up overhead.
Tracing operates all the way from the hardware performance counters up to full application code, allowing not only a view of what is happening, but exactly why. Analysis based on tracing suggests how to improve the design. For instance, applying kernel tracing – even without any intent of debugging or modifying kernel code – can show how the system interacts with the kernel. User application space tracing can expose issues such as calls to pre-packaged libraries where no source is available.
The conclusion is tracing scenarios in multicore designs need to consider mixed domain data – a combination of elements grabbed from hardware and software, in kernel and application space. Rather than doing orthogonal analysis and trying to connect the dots, tools can provide correlated data from all these domains and illustrate cause and effect. A key here is support for the LTTng open source tracing framework for Linux.
For the complete text of the white paper (one-time Mentor registration required), visit:
Development of Complex Multicore Systems: Tracing Challenges and Concepts (Part One)
Part Two of the white paper series goes deeper into a tracing cycle and use of Mentor Embedded Sourcery Analyzer tools to explore these tracing concepts. Pawan Fangaria has further analysis:
Share this post via:
The Intel Common Platform Foundry Alliance