How to prevent execution surprises for Cortex-M7 MCU?

by Eric Esteve on 08-06-2015 at 11:00 am

ARM Cortex-A series processor cores (A57, A53) are well known in high-performance market segments such as application processors for smartphones, set-top boxes, and networking. If you look at the electronics market, however, you realize that many applications are cost sensitive and don’t need such a high-performance processor core. We may call this the embedded market, even if the definition is vague. The ARM Cortex-M family was developed to address these numerous market segments, starting with the Cortex-M0 for lowest cost, the Cortex-M3 for the best power/performance balance, and the Cortex-M4 for applications requiring digital signal processing (DSP) capabilities.

For the audio, voice control, object recognition, and complex sensor fusion of automotive and higher-end Internet of Things (IoT) sensing, where complex algorithms are needed for rich audio and visual capabilities, the Cortex-M7 is required. ARM Ltd. offers the processor core as well as the Tightly Coupled Memory (TCM) architecture, but an ARM licensee like Atmel has to implement the memories in such a way that the user can take full advantage of the Cortex-M7 core to meet system performance and latency goals.

In a 65nm embedded Flash process, the Cortex-M7 can achieve a 1500 CoreMark score while running at 300 MHz, and it offers top-class DSP performance thanks to a double-precision floating-point unit and a dual-issue instruction pipeline. But algorithms like FIR, FFT, or biquad filters need to run as deterministically as possible for real-time response or seamless audio and video performance. How best to select and implement the memories needed to support such performance? If you execute from Flash, caching is required (Flash is too slow to keep up with the core), and with a cache comes the risk of cache misses. SRAM is a better choice, as it can easily be embedded on-chip and permits random access at the speed of the processor.

Peripheral data buffers implemented in general-purpose system SRAM are typically loaded by DMA transfers from system peripherals.

The ability to load from a number of possible sources, however, raises the possibility of unnecessary delays and conflicts when multiple DMAs try to access the memory at the same time. In a typical example, we might have three different entities vying for access to the SRAM: the processor (64-bit access, requesting 128 bits in this example) and two separate peripheral DMA requests (DMA0 and DMA1, 32-bit access each). Atmel has gotten around this issue by organizing the SRAM into several banks, as described in the accompanying picture.

For a chip maker designing microcontrollers, licensing an ARM Cortex-M processor core provides numerous advantages. The very first is the ubiquity of the ARM architecture, which has been adopted in multiple market segments to support a variety of applications. If this chip maker wants to win a design with a new customer, the probability that this OEM has already used an ARM-based microcontroller is very high, and it is very important for the OEM to be able to reuse existing code (we know how heavily software development weighs, at 60% to 70% of the overall project cost). But this ubiquity creates a challenge: how to differentiate from the competition when competitors can license exactly the same processor core?

Selecting a more aggressive technology node, which provides better performance at lower cost, is one option, but this advantage can disappear as soon as the competition also moves to that node. Integrating a larger amount of Flash is another option, and a very efficient one if the product is designed on a technology that keeps the price low enough.

If the chip maker has designed on an aggressive technology node that allows higher performance and a larger amount of Flash than the competition, that may be differentiation enough. Adding a smarter memory architecture, unencumbered by cache misses, interrupts, context swaps, and other execution surprises that work against deterministic timing, brings strong differentiation.

If you want to understand more completely how Atmel has designed this SMART memory architecture for the Cortex-M7, I encourage you to read the white paper by Jacko Wilbrink and Lionel Perdigon, “Run Blazingly Fast Algorithms with Cortex-M7 Tightly Coupled Memories” (you will have to register).

This paper describes MCUs integrating an SRAM organized into four banks that can be used as general-purpose SRAM and as TCM, showing one example of a Cortex-M7 MCU as implemented in the Atmel® | SMART SAM S70, SAM E70, and SAM V70/1 families.

By Eric Esteve from IPNEST
