Analyzing Cortex-A53 octa-core on Linux

by Don Dingee on 11-17-2014 at 3:00 pm

Octa-core sells smartphones and tablets. 64-bit ARM Cortex-A53 implementations are available from Huawei, MediaTek, Qualcomm, Samsung, and now Marvell, with Rockchip and others on the way. Suddenly, almost everyone planning to run Linux is being asked for octa-core designs.

If it were easy, anyone could do it. Adding cores also adds to the number of things that can go wrong in a busy system, and individual cores lose performance when they are forced to wait for something else to happen. At the same time, optimizing an SoC purely for processor core performance can blow up power consumption, design costs, and IP costs. A Cortex-A53 is a terrible thing to waste.

Fortunately, this is exactly the job virtual prototyping was born to do. Tools such as QEMU are helpful for software development, and RTL simulation and emulation help wring problems out of hardware. Getting to the heart of an octa-core Cortex-A53 design requires implementation-accurate virtual prototyping that can handle both the hardware and software aspects of analysis.

Instruction-accurate models are fabulous for software debug, but mostly leave timing considerations out of the equation. They can completely miss issues that arise between IP blocks under concurrent system activity, the kind an operating system and application code generate.

Cycle-accurate models are slow, and booting up an operating system like Linux can literally take days. This is why hardware simulators have speed-rate adapters for external peripherals like USB, SATA, and Ethernet, allowing peripherals to run at least in bursts while the system simulation catches up.

Carbon Swap ‘n Play does both, going beyond the ARM Fast Model technology. In simple terms, Swap ‘n Play technology boots Linux using an instruction-accurate set of models, then switches to a cycle-accurate set of models at a breakpoint to run the region of interest.
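The swap idea can be illustrated with a toy sketch. This is not Carbon's API or implementation: the class names, the fixed 4-byte instruction size, and the assumed cost of two cycles per instruction are all hypothetical, chosen only to show a fast model running up to a breakpoint and a cycle-counting model taking over the same architectural state.

```python
from dataclasses import dataclass


@dataclass
class ArchState:
    """Architectural state shared by both toy models (hypothetical)."""
    pc: int = 0
    instructions: int = 0


class FastModel:
    """Instruction-accurate toy: retires instructions, tracks no timing."""

    def __init__(self, state: ArchState) -> None:
        self.state = state

    def step(self) -> None:
        self.state.pc += 4          # assumed fixed 4-byte instructions
        self.state.instructions += 1


class CycleModel:
    """Cycle-counting toy: every instruction costs an assumed cycle count."""

    def __init__(self, state: ArchState, cycles_per_instruction: int = 2) -> None:
        self.state = state
        self.cycles_per_instruction = cycles_per_instruction
        self.cycles = 0

    def step(self) -> None:
        self.state.pc += 4
        self.state.instructions += 1
        self.cycles += self.cycles_per_instruction


def run_with_swap(breakpoint_pc: int, end_pc: int):
    """Run fast up to the breakpoint, then swap to the cycle-counting model."""
    state = ArchState()

    fast = FastModel(state)
    while state.pc < breakpoint_pc:     # "boot" phase, no timing collected
        fast.step()

    detailed = CycleModel(state)        # swap: same state, different model
    while state.pc < end_pc:            # region of interest, timing collected
        detailed.step()

    return state, detailed.cycles


state, cycles = run_with_swap(breakpoint_pc=0x1000, end_pc=0x1040)
print(f"instructions={state.instructions} timed_cycles={cycles}")
```

The point of the sketch is that only the region after the breakpoint pays the cost of detailed timing, while the long boot phase runs on the cheap model.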

Building on that idea, Carbon Performance Analysis Kits (CPAKs) bring in both model sets for complex processors and the surrounding IP, including memory controllers, cache coherency units, and interrupt controllers. The latest Carbon release is the Cortex-A53 Multi-Cluster Quad Core Linux Swap ‘n Play CPAK.

Most octa-core SoC implementations are really two quad-core clusters bolted together, so that one cluster can run at a different clock frequency or idle to save power. This CPAK is designed with two quad-core Cortex-A53 clusters, an ARM CoreLink CCI-400 cache coherent interconnect, a GIC-400 interrupt controller providing interrupts to all cores, and a system counter linked to the Cortex-A53 generic timers.

A new blog post by Jason Andrews describes the octa-core Cortex-A53 CPAK. He describes many subtle details covered by the models, including these:

  • CLUSTERIDAFF sets the cluster ID, which appears in the affinity fields of the MPIDR register.
  • CNTVALUEB allows the Generic Timers in each Cortex-A53 to have the same values, even if processor frequencies are different.
  • WFE (wait for event) and SEV (send event) are used to coordinate the two clusters during Linux boot using EVENTI and EVENTO signals.
  • SMPEN, a bit in the CPUECTLR register, must be set. The kernel needs it, but the generic Linux boot wrapper does not set it.
  • The Snoop Control Register in the CCI-400 must be set to enable coherency, another step the generic Linux boot wrapper does not perform.
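As a rough illustration of the MPIDR and SMPEN items above, the following sketch works on plain integers rather than real system registers. The field positions are assumptions taken from my reading of the ARMv8-A and Cortex-A53 documentation (Aff0 in bits [7:0] as the core number, Aff1 in bits [15:8] and Aff2 in bits [23:16] driven by the cluster ID inputs, SMPEN as bit 6 of CPUECTLR) and should be checked against the Technical Reference Manual. The helper names are hypothetical.

```python
# Assumed bit positions; verify against the Cortex-A53 TRM before relying on them.
MPIDR_RES1 = 1 << 31        # bit 31 reads as one in MPIDR
MPIDR_AFF0_SHIFT = 0        # core number within the cluster
MPIDR_AFF1_SHIFT = 8        # cluster ID, first affinity level
MPIDR_AFF2_SHIFT = 16       # cluster ID, second affinity level
CPUECTLR_SMPEN = 1 << 6     # assumed position of the SMPEN bit


def make_mpidr(cluster_aff1: int, core: int, cluster_aff2: int = 0) -> int:
    """Compose the MPIDR value a core would report for a given cluster ID."""
    return (
        MPIDR_RES1
        | ((cluster_aff2 & 0xFF) << MPIDR_AFF2_SHIFT)
        | ((cluster_aff1 & 0xFF) << MPIDR_AFF1_SHIFT)
        | ((core & 0xFF) << MPIDR_AFF0_SHIFT)
    )


def decode_mpidr(mpidr: int) -> dict:
    """Split an MPIDR value back into core number and cluster affinity fields."""
    return {
        "core": (mpidr >> MPIDR_AFF0_SHIFT) & 0xFF,
        "aff1": (mpidr >> MPIDR_AFF1_SHIFT) & 0xFF,
        "aff2": (mpidr >> MPIDR_AFF2_SHIFT) & 0xFF,
    }


def enable_smp(cpuectlr: int) -> int:
    """Return the CPUECTLR value with SMPEN set, leaving other bits untouched."""
    return cpuectlr | CPUECTLR_SMPEN


# Two clusters of four cores, as in the octa-core arrangement described above.
for cluster in range(2):
    for core in range(4):
        print(hex(make_mpidr(cluster, core)), decode_mpidr(make_mpidr(cluster, core)))
```

On real hardware these values are read and written with system register accesses in boot code; the sketch only shows how the cluster ID and core number combine so that software can tell the eight cores apart.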


Leveraging a Cortex-A53 CPAK in Carbon SoC Designer Plus for octa-core design puts a powerful tool for Linux performance analysis in the hands of designers.
