Cortex-M7: 6-stage, cached, 400 MHz MCU

by Don Dingee on 09-30-2014 at 7:00 am

“Who needs a 32-bit MCU?” It was a question asked a million times in the press when ARM introduced the Cortex-M family back in 2004. In fairness, that question predates the Internet of Things, with its wireless sensor networks, open source code, encryption, and the other demands of connected devices.

In the beginning, it was about matching the MCU incumbents – 8051, AVR, HC11, MSP430, PIC, and others. The discussion always seemed to center on package pin counts, sleep currents, sub-$1 volume pricing, and code efficiency. Nobody would ever need 32-bit address space, or faster memory, or floating point, or really fast cores, especially if they drove power consumption in the wrong direction.

Now, all that seems silly. Every tech outlet in the world is writing about the ARM Cortex-M7 announcement and what it is going to do for the IoT. This doesn’t wipe out billions of other MCUs, but it does take MCUs into new territory.

It should have been a foregone conclusion that the roadmap for 32-bit would extend far beyond any 8- or 16-bit MCU architecture. Process shrinks over time would bring more performance in the same space and power. Bigger cores borrowing technology from the ARM Cortex-A and Cortex-R families were bound to show up.

It also should have been very obvious that high-level languages and standardized operating systems would become the norm. These obliterate the need for arcane machine-specific code and hand-stitching to wring every last byte out of an implementation. Software has become much more important than hardware, and reusability outweighs optimization.

So it isn’t surprising that ARM’s next 32-bit MCU move is to pick up where the Cortex-M4F left off, improving processing performance while still maintaining low power operation. By now, folks have likely seen the interior shot of the Cortex-M7.

Lurking in the CPU box is a 6-stage, superscalar pipeline that ARM says is capable of 400 MHz in a 40nm process. This compares to a 3-stage pipeline in the Cortex-M4. Welcome to the world of dual-issue execution and optimizing compilers; the pipeline is still in-order, but instruction scheduling now matters. We’re not likely to see 400 MHz right out of the gate, since most MCU vendors have integrated mixed-signal blocks to worry about and will be using more mature nodes.

Those boxes labeled TCM stand for “tightly-coupled memory”, a low-latency interface that enables real-time response without the penalty of cache misses in some configurations. We’ll also note there are optional data and instruction caches (up to 64KB), which raise a whole bunch of questions about snooping and coherency that MCU devotees have not had to worry about until now. Few technical details exist on the ARM site at this point.
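To make the coherency worry concrete, here is a minimal host-side sketch in C of one write-back cache line sitting in front of RAM that a DMA engine reads and writes directly. It is a toy model, not Cortex-M7 code: every name in it (toy_system, cpu_write, dma_read, cache_clean, cache_invalidate) is hypothetical, and the 32-byte line size is an assumption made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Toy model only: one write-back cache line in front of "RAM".
   All names are hypothetical; LINE_BYTES is an assumed line size. */
#define LINE_BYTES 32u

typedef struct {
    uint8_t ram[LINE_BYTES];   /* what a DMA engine sees */
    uint8_t line[LINE_BYTES];  /* the CPU's cached copy */
    int valid;                 /* line holds a copy of ram */
    int dirty;                 /* line is newer than ram */
} toy_system;

void toy_init(toy_system *s) { memset(s, 0, sizeof *s); }

/* Fill the line from RAM on a miss. */
static void line_fill(toy_system *s) {
    if (!s->valid) {
        memcpy(s->line, s->ram, LINE_BYTES);
        s->valid = 1;
    }
}

/* CPU store: updates the cached copy only (write-back policy). */
void cpu_write(toy_system *s, unsigned off, uint8_t v) {
    line_fill(s);
    s->line[off % LINE_BYTES] = v;
    s->dirty = 1;
}

/* CPU load: served from the cached copy. */
uint8_t cpu_read(toy_system *s, unsigned off) {
    line_fill(s);
    return s->line[off % LINE_BYTES];
}

/* DMA accesses bypass the cache and touch RAM directly. */
uint8_t dma_read(const toy_system *s, unsigned off) {
    return s->ram[off % LINE_BYTES];
}

void dma_write(toy_system *s, unsigned off, uint8_t v) {
    s->ram[off % LINE_BYTES] = v;
}

/* Clean: push a dirty line out to RAM so DMA can see CPU writes. */
void cache_clean(toy_system *s) {
    if (s->valid && s->dirty) {
        memcpy(s->ram, s->line, LINE_BYTES);
        s->dirty = 0;
    }
}

/* Invalidate: drop the cached copy so the CPU re-reads RAM. */
void cache_invalidate(toy_system *s) {
    s->valid = 0;
    s->dirty = 0;
}
```

In this model, a DMA read issued right after cpu_write() still returns the old RAM contents until cache_clean() runs, and a CPU read after dma_write() returns stale data until cache_invalidate() runs. Without hardware snooping, that clean-before-transmit, invalidate-after-receive discipline is the sort of bookkeeping firmware would have to take on, or sidestep by placing DMA buffers in TCM or uncached memory.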

The other options are impressive, with many lifted from higher-performance ARM families. Double-precision floating point appears for the first time, compared to single-precision only in the Cortex-M4. A memory protection unit, a poor-man’s MMU, is available with 8 or 16 regions. The Embedded Trace Macrocell (ETM) offers improved debug capability (expect a new Keil toolset, and others like IAR, to support the Cortex-M7 shortly). There is also a hint of safety-critical use, with an optional safety package – again, few details exist, but we can see the ECC box.
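For a feel of what double precision buys, here is a small sketch in plain C that runs on any host with IEEE 754 arithmetic; nothing in it is specific to the Cortex-M7. The function names count_single and count_double are made up for illustration. Both count up by one, n times. A float carries a 24-bit significand, so the single-precision counter stalls at 2^24 = 16,777,216, where adding 1 rounds back to the same value.

```c
/* Hypothetical helpers: accumulate n increments of 1.0 in each precision. */

/* Single precision: once acc reaches 2^24, acc + 1.0f rounds back to acc. */
float count_single(unsigned long n) {
    float acc = 0.0f;
    for (unsigned long i = 0; i < n; ++i) {
        acc += 1.0f;
    }
    return acc;
}

/* Double precision: a 53-bit significand counts exactly far beyond 2^24. */
double count_double(unsigned long n) {
    double acc = 0.0;
    for (unsigned long i = 0; i < n; ++i) {
        acc += 1.0;
    }
    return acc;
}
```

With n = 20,000,000, count_single() returns 16,777,216 while count_double() returns 20,000,000. This assumes the compiler evaluates float arithmetic in single precision, which is the usual case on targets with a hardware FPU. Long-running accumulators and filters are where this kind of silent loss tends to bite.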

One note of intrigue: with the options in place, the specs of the Cortex-M7 are getting dangerously close to those of the Cortex-R4 (8-stage dual issue, cache, 64-bit AXI, TCM, DP FPU, ECC). There has not been an enhancement to the Cortex-R family in three years. It leads me to wonder if the overwhelming popularity of the Cortex-M and its roadmap and all the vendors and tools supporting it will just keep pushing up and eventually displace the Cortex-R. (Or, if ARM will unveil something this week.)

While I’m at prognostication, we are very likely to see an Atmel announcement at ARM TechCon in the next couple days on their first Cortex-M7 part and everything that goes with it. Even if I had been briefed (which I wasn’t), I couldn’t divulge specifics in honoring an embargo. But, I’m on the hook for a September deadline.

Here are features to look for, based partly on watching the first competitive announcements and partly on where Atmel is headed:

  • A 200-250 MHz core, likely on 65nm
  • Small caches to provide some boost without huge miss penalties
  • Use of the TCM feature
  • Double-precision FPU
  • Emphasis on the ETM for debug, and support in Atmel Studio
  • A crossbar similar to the SAM D21 architecture

The exact mix of peripherals remains to be seen. What shows up on chip for the IoT and wearables will be key; it would make sense to see something like Bluetooth LE integrated in some of the versions, leveraging the Newport Media acquisition. I’d also expect some kind of Arduino board announcement, for maker’s sake, as well as Thread protocol support.

Again, zero inside info here – I may be wrong, for all you know, but I may be right.
