
MIPS CPU and Newton2 Platform for Wearables

by Eric Esteve on 11-19-2014 at 11:00 am

I wrote recently about SmarCoT (Smart Connected Thing), and smartwatches are one of the many smart, connected applications that some still refer to as IoT. Imagination Technologies is working hard to be part of the SmarCoT ecosystem, and Ingenic, an IMG customer, has recently launched a MIPS-based chip (M200) and the Newton2 platform, addressing high-end applications like wearables. Ingenic expects the MIPS 64-b CPU to be integrated in target applications like:

  • Infotainment: smartwatches, augmented reality headsets, smart glasses, smart cameras
  • Healthcare: wearable healthcare monitors
  • Fitness and wellness: fitness bands, activity trackers, smart clothing, sleep sensors

Such products are likely to address the needs of mid- to upper-class customers, able to spend a couple of hundred dollars or more on a gadget. These applications are expected to be a more effective driver for SmarCoT development than your electricity meter: they are more fashionable, they rank you among the early adopters and they immediately confer status on the buyer. Just like listening to an iPod in the 2000’s.

I learned from the Imagination Technologies blog post “Ingenic introduces MIPS-based M200 chip and Newton2 platform for wearables and IoT” that “The new GEAK Watch 2 uses an Ingenic wearable chip and delivers over 15 days of battery life”. This brings us to one of the most important factors when dealing with wearables: power consumption. If the chip maker is responsible for the product’s power consumption, it is clearly better to select the right CPU IP core, one delivering the best power efficiency, and to architect the SoC for ultra-low power from the beginning. IMG has rethought its CPU portfolio with respect to the different markets (high-end or ultra-affordable mobile, smart TV and STB, networking and finally wearables), as you can see in the spider diagram below:

Thanks to the spider diagram, the result is clear: the wearable segment can make do with a lower performance level and fewer features, but needs low power. This translates into implementing a power-saving hardware architecture where a high-performance MIPS CPU clocked at 1.2 GHz tackles most of the heavy lifting, while less demanding tasks are handled by a secondary low-power 300 MHz MIPS CPU. On the multimedia side, the M200 adds a 3D graphics engine that supports OpenGL ES 2.0. The M200 also integrates a dedicated, multi-standard video engine for low-power decoding and encoding of popular codecs like H.264 and VP8 (up to 720p at 30 fps). The chipset also includes an ISP for image pre-processing that supports a range of vital features for camera vision applications. When in full operating mode, the M200 chip consumes only 150 mW, and standby power consumption for Newton2 is less than 3 mW, allowing devices to work for twice as long.
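To put these figures in perspective, here is a rough back-of-the-envelope sketch in Python. The 150 mW active and 3 mW standby numbers come from the text above. The battery capacity (300 mAh at 3.7 V) and the duty cycles are purely illustrative assumptions, not GEAK Watch 2 or Newton2 specifications, and the sketch ignores the display, radios and the other components on the board.

```python
# Figures quoted in the article.
ACTIVE_MW = 150.0    # M200 in full operating mode
STANDBY_MW = 3.0     # upper bound quoted for Newton2 standby

# Illustrative assumptions (not from the article).
BATTERY_MAH = 300.0
BATTERY_V = 3.7


def average_power_mw(duty_cycle: float) -> float:
    """Average power when the chip is active for `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * ACTIVE_MW + (1.0 - duty_cycle) * STANDBY_MW


def battery_life_days(duty_cycle: float) -> float:
    """Battery life in days for the assumed battery and duty cycle."""
    energy_mwh = BATTERY_MAH * BATTERY_V
    return energy_mwh / average_power_mw(duty_cycle) / 24.0


if __name__ == "__main__":
    for duty in (0.0, 0.01, 0.05, 0.25, 1.0):
        print(f"active {duty:5.0%}: {average_power_mw(duty):6.1f} mW average, "
              f"{battery_life_days(duty):5.1f} days")
```

The point of the exercise is that battery life is dominated by how little time the chip spends in full operating mode, which is why the standby figure matters as much as the active one.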

The above Ingenic M200 chip architecture sounds like an application processor design (for a late-2000’s smartphone), as it’s pretty complex, integrating the MIPS flavor of big.LITTLE, a 512 KB L2 cache, a DDR3/LPDDR2 memory controller and PHY, a video PU, a graphics PU, an ISP, an audio codec and plenty of interfaces (MIPI DSI and CSI and a USB OTG, to name a few). Reaching an active power dissipation of 150 mW has certainly been challenging. If you take a look at the complete system, a 15 mm x 30 mm board, it also integrates a PMU IC, a camera IC, a display IC, a GPS and sensor IC, a 9-axis gyroscope, a huge eMCP memory chip and a Broadcom WiFi + Bluetooth 4.1 chip!

As far as I am concerned, I have no problem ranking Newton2 within the IoT category; I would just rank it in the high-end segment, with smart glasses and the like. Thus, we may expect to see sales reaching (several) million units, but the billion-unit step is more questionable. This type of wearable (probably sold for several hundred dollars) could be the flagship product that draws mass-market customers to the SmarCoT concept (even if they don’t buy the above product in volume, for cost reasons), with end users then expected to buy in volume the many SmarCoT products that have yet to be developed…

From Eric Esteve from IPNEST



Atmel, IoT and CryptoAuthentication

by Paul McLellan on 11-19-2014 at 7:00 am

One of the companies that is best positioned to supply components into the IoT market is Atmel. For the time being most designs will be done using standard components, not doing massive integration on an SoC targeted at a specific market. The biggest issue in the early stage of market development will be working out what the customer wants and so the big premium will be on getting to market early and iterating fast, not premature cost optimization for a market that might not be big enough to support the design/NRE of a custom design.

Atmel has microcontrollers, literally over 500 different flavors and in two families, the AVR family and a broad selection of ARM microcontrollers/processors. They have wireless connectivity. They have strong solutions in security.

Indeed last week at Electronica in Germany they announced the latest product in the SmartConnect family, the SAM W2 module. It is the industry’s first fully-integrated FCC-certified Wi-Fi module with a standalone MCU and hardware security from a single source. The module is tiny, not much larger than a penny. The module includes Atmel’s recently-announced 2.4GHz IEEE 802.11 b/g/n Wi-Fi WINC1500, along with an Atmel | SMART SAM D21 ARM Cortex M0+-based MCU and Atmel’s ATECC108A optimized CryptoAuthentication engine with ultra-secure hardware-based key storage for secure connectivity.


That last item is a key component for many IoT designs. Security is going to be a big thing and with so many well-publicized breaches of software security, the algorithms, and particularly the keys, are moving quickly into hardware. That component, the ATECC108A, provides state-of-the-art hardware security including a full turnkey Elliptic Curve Digital Signature Algorithm (ECDSA) engine using key sizes of 256 or 283 bits – appropriate for modern security environments without the long computation delay typical of software solutions. Access to the device is through a standard I²C Interface at speeds up to 1Mb/sec. It is compatible with standard Serial EEPROM I²C Interface specifications. Compared to software, the device is:

  • higher performance (faster encryption)
  • lower power
  • much harder to compromise

Atmel have a new white paper out, Integrating the Internet of Things, Necessary Building Blocks for Broad Market Adoption. Depending on whose numbers you believe, there will be 50 billion IoT edge devices connected by 2020.


As it says in the white paper: On first inspection, the requirements of an IoT edge device appear to be much the same as any other microcontroller (MCU) based development project. You have one or more sensors that are read by an MCU, the data may then be processed locally prior to sending it off to another application or causing another event to occur such as turning on a motor. However, there are decisions to be made regarding how to communicate with these other applications. Wired, wireless, and power line communication (PLC) are the usual options. But, then you have to consider that many IoT devices are going to be battery powered, which means that their power consumption needs to be kept as low as possible to prolong battery life. The complexities deepen when you consider the security implications of a connected device as well. And that’s not just security of data being transferred, but also ensuring your device can’t be cloned and that it does not allow unauthorized applications to run on it.


For almost any application the building blocks for an IoT edge node are the same:

  • Embedded processing
  • Sensors
  • Connectivity
  • Security
  • and while not really a building block, ultra low power especially for always-on applications

My view is that the biggest of these issues will be security. After all, even though Atmel has hundreds of different microcontrollers and microprocessors, there are plenty of other suppliers. The same goes for connectivity solutions. But strong cryptographic solutions implemented in hardware are much less common.

The new IoT white paper is available for download here.




Arteris on a winning streak in 2014

by Don Dingee on 11-19-2014 at 3:00 am

When Arteris sold key network-on-chip intellectual property and most of its human assets to Qualcomm earlier this year, it was big news. We suggested the bigger news after a restaffing effort would be a next-generation NoC release, and a new round of design wins.

Some developments were already in the pipeline.


Simulation and Analysis of Power and Thermal Management Policies

by Daniel Payne on 11-18-2014 at 10:00 pm

Earlier this month I blogged about Power Management Policies for Android Devices, so this blog is part two in the series and delves into the details of using ESL-level tools for simulation and analysis. The motivation behind all of this is to optimize a power management system during the early design phase, instead of waiting until RTL or logic synthesis to estimate power. RTL and gate-level power simulation often comes too late, and its simulation speed and complexity are not amenable to dynamic use case simulation with interactive active power management techniques. Docea Power offers a white paper, demonstrations and additional information on modeling power-thermal management policies. To request a demo or evaluation, just send an email or visit their web site.

The Docea approach to model and simulate a power management algorithm has four concepts:


  • A power model describing how each hardware block consumes power.
  • An RC thermal model of your system, where the coupled power and thermal models are solved with Aceplorer.
  • A set of real-life scenarios written as a set of processing tasks mapped to processing units, or as a sequence of steps over time.
  • Your power and thermal management algorithm.

    Modeling a Power Management Scheme

    With this approach the solver plays the scenario and stops at each timer tick to evaluate the system, then reads the thermal sensor values. Your specific power management algorithm then determines whether the solver should change the scenario resolution using an operating mode defined in your power mode table.
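The tick-driven loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the Aceplorer solver or its API: the mode table values, the first-order thermal update, the 10 ms tick and every name in the code are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Mode:
    freq_mhz: float
    power_mw: float  # illustrative power drawn while in this mode


# Hypothetical power mode table (values invented for the sketch).
MODE_TABLE: Dict[str, Mode] = {
    "high": Mode(1200.0, 600.0),
    "low": Mode(600.0, 200.0),
    "idle": Mode(0.0, 20.0),
}

# A policy maps (processing load, sensor temperature, current mode) to a mode name.
Policy = Callable[[float, float, str], str]


def run_scenario(loads: List[float], policy: Policy, tick_ms: float = 10.0,
                 ambient_c: float = 25.0, c_per_mw: float = 0.1,
                 tau_ms: float = 50.0) -> List[Tuple[float, str, float]]:
    """Play a scenario tick by tick, letting the policy pick the mode each tick."""
    temp_c = ambient_c
    mode = "idle"
    trace = []
    for tick, load in enumerate(loads):
        # Stop at the timer tick: read the "sensor" and ask the policy for a mode.
        mode = policy(load, temp_c, mode)
        power_mw = MODE_TABLE[mode].power_mw
        # Crude first-order thermal response towards the steady-state temperature.
        target_c = ambient_c + c_per_mw * power_mw
        temp_c += (target_c - temp_c) * (tick_ms / tau_ms)
        trace.append((tick * tick_ms, mode, round(temp_c, 2)))
    return trace


def simple_policy(load: float, temp_c: float, mode: str) -> str:
    """Idle when there is no work, throttle above 80 C, otherwise run fast."""
    if load == 0.0:
        return "idle"
    if temp_c > 80.0:
        return "low"
    return "high"


if __name__ == "__main__":
    for time_ms, mode, temp_c in run_scenario([1.0] * 16 + [0.0] * 4, simple_policy):
        print(f"{time_ms:6.1f} ms  {mode:5s}  {temp_c:6.2f} C")
```

Keeping the policy as a plain function makes it easy to swap in a different algorithm and replay the same scenario, which is the kind of comparison the rest of this post describes.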

    Consider an example SoC with the following characteristics:

    • Four cores modeled as Processing Units (PU), each having its own frequency and voltage.
    • Three components (peripherals and memory): USB Controller, SRAM, Standard Definition Display DAC
    • Voltage/frequency sources and interconnect

    System-level Power Model Schematic

    The PUs have parameters controlled in the scenario by tasks, where a task has a processing load and priority level. The Aceplorer tool has a scheduler to prioritize each task and then schedule them during simulation.

    Each PU can start traffic on the interconnect and to the memories. For this simple example there is only one memory instance, however you can still model different memory types and optimize the memory configuration.

    Power management algorithms for this SoC include:

    • CPU power modes depend on their use rates and idle residencies
    • Thermal sensor values drive the thermal throttling algorithm
    • If all initiators are idle, then shared resources are set to low power mode

    The algorithm for use-rate based power management in our example is:


    Use rate and idle residency-based power manager

    In a similar fashion we can define our thermal algorithm as:

    For the memory and interconnect blocks we decided that they go to low power modes when all initiators are in idle mode.

    For each core in the processor cluster a power mode table is defined as:

    Timers are defined where the use rate is checked every 20 ms and the temperature every 10 ms.

    Scenarios are modeled visually in Aceplorer and they have a sequence of tasks where each is mapped to a core (PU). Here’s the initialization step followed by four tasks, each running on its own core:

    You can simulate and compare various power management schemes to see which is best for lowest power, operation within a thermal envelope or best throughput, and to weigh the tradeoffs among battery life, performance, ergonomics, safety, reliability and thermal behavior.

    For comparison, four simulations were generated. The baseline case is Unconstrained: 1.2 GHz (maximum operating frequency) at high voltage (1.2 V), with no power management. The other three apply power-thermal management policies: on-demand governor, hot-plug governor and thermal management. The power for the unconstrained case and each power management simulation is summarized below:

    In the unconstrained case the processing unit core cluster junction temperature could exceed the desired reliability limit resulting in thermal runaway or shut down after a prolonged period.


    CPU Core cluster junction temp (blue) and power (red)

    To evaluate task consumption, which is a proxy for application performance, we can compare the task execution time for each policy: thermal management, on-demand and hot plug power-thermal management schemes.

    In the thermal management algorithm a simple frequency control is used. In the on-demand policy model, frequency control and clock gating are used. In the hot-plug scheme we also power gate cores if the idle residency is > 10 ms. In the thermal management policy we reduce frequency if the junction temperature exceeds 80 °C and increase frequency when it falls below 75 °C, for each core. A broader range and other thresholds can also be applied for sensitivity analysis.
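The two threshold rules above are simple enough to write down directly. The following Python sketch is one possible reading of them, not Docea's implementation: the 80/75 degree thresholds, the 10 ms idle residency limit and the 1.2 GHz maximum come from the text, while the intermediate frequency steps and all names are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical operating points; only the 1.2 GHz maximum appears in the text.
FREQ_STEPS_MHZ = (300, 600, 900, 1200)

THROTTLE_ABOVE_C = 80.0    # reduce frequency above this junction temperature
RESTORE_BELOW_C = 75.0     # raise frequency again below this temperature
POWER_GATE_IDLE_MS = 10.0  # power gate a core idle for longer than this


@dataclass
class CoreState:
    freq_index: int = len(FREQ_STEPS_MHZ) - 1  # start at maximum frequency
    power_gated: bool = False


def thermal_step(core: CoreState, junction_temp_c: float) -> int:
    """Apply the thermal rule once and return the resulting frequency in MHz."""
    if junction_temp_c > THROTTLE_ABOVE_C and core.freq_index > 0:
        core.freq_index -= 1
    elif (junction_temp_c < RESTORE_BELOW_C
          and core.freq_index < len(FREQ_STEPS_MHZ) - 1):
        core.freq_index += 1
    # Between the two thresholds nothing changes (hysteresis band).
    return FREQ_STEPS_MHZ[core.freq_index]


def hot_plug_step(core: CoreState, idle_residency_ms: float) -> bool:
    """Apply the hot-plug rule once and return whether the core is power gated."""
    core.power_gated = idle_residency_ms > POWER_GATE_IDLE_MS
    return core.power_gated
```

The 5 degree gap between the two thresholds acts as hysteresis: a core sitting at, say, 78 °C keeps its current frequency instead of toggling up and down on every temperature check.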

    Use-rate, idle residency power management such as on-demand or hot-plug algorithms can maintain a thermal envelope and may also increase low power state residency to buy back thermal headroom. The on-demand and hot-plug policies may provide the additional advantage of better application-specific performance than simple DVFS or a conservative fixed operating point.

    The figure below shows some of the different report profiles obtained using the two power management algorithms on one of the processing units (core1). Similar differences are seen on the other cores for these types of simulations.

    Conclusions:

    While it may be difficult, it is quite important to evaluate Power Thermal Management (PTM) strategies early in the product development process. It is critical to analyze realistic use case scenarios with fast simulation speed and configurability. Getting meaningful results at a reasonable simulation time is the name of the game:

    • In the early stages of design different HW choices and SW policies may apply: exploration and what if analysis will enable you to optimize the hardware (thermal aware floor planning, power intent definition, packaging and assembly) and the software (governors, drivers).
    • There are a large number of use cases and corner cases to validate before delivering the code to a customer (different ambient temperatures, different process corners, different packaging enclosures)
    • Software changes may occur at a much faster pace and at different development timeline than the hardware. Simulations running for a few hours or days on emulators and low level design tools may be unacceptable to SW developers.

    What keeps current solutions from reaching reasonable power-thermal management simulation speed?

    • Power thermal management policy development should include thermal monitor and sensor data. Computational Fluid Dynamic (CFD) models are large and take too long to run dynamic use cases driven by software execution.
    • Virtual Platforms do not have temperature feedback which is crucial to get any meaningful leakage results for demanding use cases on modern SoCs.

    Why is it possible with the Docea approach?

    • The power behavior of an IP is described in a model-based approach. All the power related information is available: voltage tree, functional clock tree, power states for IPs, current consumption dependencies to temperature, operating points, activity, load or traffic. Even non-linear behavior of voltage regulators efficiencies can be modeled.
    • The thermal model: Docea proposes a solution for automatic generation of compact, multi-source RC thermal networks that can represent the multi-layer stack and the complete assembled module or system (multiple chips on a board or a complete phone).
    • Docea Power’s solver takes into account the coupling between power and thermals. As many thermal management decisions are taken based on the leakage power consumed at a given time, this coupling is a must have for any realistic PTM simulation.
    • PTM strategies are algorithms describing when and how operating conditions (frequencies and voltages) change given the state of the system (use rate of processing cores, temperature of the die or of the case). Scenarios can be described as a sequence of processing loads. The processing depends on the operating conditions and the strategies.
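To see why the coupling between power and temperature matters, consider a minimal single-node sketch in Python. It is not Docea's solver or model format: the one-resistor, one-capacitor thermal network, the "leakage doubles every 25 °C" rule and every parameter value are assumptions chosen only to make the feedback loop visible.

```python
def leakage_w(temp_c: float, leak_ref_w: float = 0.1, ref_c: float = 25.0,
              doubling_c: float = 25.0) -> float:
    """Assumed leakage model: leakage doubles for every `doubling_c` degrees."""
    return leak_ref_w * 2.0 ** ((temp_c - ref_c) / doubling_c)


def simulate(dynamic_w: float, coupled: bool = True, r_th: float = 30.0,
             c_th: float = 0.5, ambient_c: float = 25.0, dt_s: float = 0.01,
             steps: int = 20000):
    """Explicit Euler integration of C*dT/dt = P - (T - T_ambient)/R.

    With coupled=True the leakage term is re-evaluated at the current
    temperature each step; with coupled=False it is frozen at ambient.
    Returns the final (temperature in C, total power in W).
    """
    temp_c = ambient_c
    power_w = dynamic_w + leakage_w(ambient_c)
    for _ in range(steps):
        leak = leakage_w(temp_c if coupled else ambient_c)
        power_w = dynamic_w + leak
        temp_c += dt_s * (power_w - (temp_c - ambient_c) / r_th) / c_th
    return temp_c, power_w


if __name__ == "__main__":
    for coupled in (False, True):
        temp_c, power_w = simulate(1.0, coupled=coupled)
        label = "coupled" if coupled else "uncoupled"
        print(f"{label:9s}: {temp_c:5.1f} C, {power_w:4.2f} W")
```

With these made-up numbers the coupled run settles several degrees hotter and at noticeably higher power than the uncoupled one, which is the kind of gap a simulation without temperature feedback would miss.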

    Intel Quits Mobile
    by Paul McLellan on 11-18-2014 at 5:00 pm

    It happened today. I have predicted for over a year that Intel would not be successful in mobile and would be forced to exit the market. Last quarter they lost $1B on revenues of $1M (as Dave Barry would say, I am not making this up, that M is not a typo). They ship “contra revenue” with their chips for the tablet market, meaning that instead of just showing a huge loss on a reasonable amount of revenue, they take the marketing subsidy off the revenue line, so that there is basically no revenue.

    There has been no formal announcement but an internal Intel email from CEO Krzanich has been leaked that early next year Intel will merge its mobile group into its PC client group. My reading is that they have given up on mobile and are putting tablets together with client PCs as a single business which will be known as the Client Computing Group. They still have their LTE modem unit which they will continue with and technically they still have a group focused on mobile. But it is clear that their heart is no longer in it. ARM and Apple and Qualcomm and Mediatek have won.

    Intel spokesman Chuck Mulloy put some spin on it: The lines are blurring between PCs, tablets, phablets and phones. The idea is to accelerate the implementation and create some efficiency so that we can move even faster.

    At least for tablets and notebooks this is arguably true. But the biggest mobile market, chips for smartphones, doesn’t really have any synergy at all with the PC market. Nobody cares about x86 compatibility; if they care about any compatibility it is with the ARM instruction set. Intel has an ARM license, I assume, from when they acquired Infineon’s wireless unit. And they certainly have one to be able to manufacture Altera FPGAs with embedded ARMs. But Intel’s strategy in mobile has never been to use their manufacturing to compete in the ARM-based application processor market; it has always been to leverage the x86 instruction set and their Atom family to get an “unfair share” of the market. Today it became clear even to them that this was not going to happen.

    The tablet market isn’t really even growing. Even Apple’s iPad sales are flat. It was a market that appeared out of nowhere (and I confess I was one of the people who was dubious when the iPad was announced) and just a few years later is already commoditized. In the smartphone market that is happening too. The winners in the latest market share data are all Chinese manufacturers at very low price points. Apple is making all the profits but its market share is declining slowly (not last quarter when it announced new models but over the last couple of years).

    Samsung is also struggling in mobile and has said that it will transfer a lot of engineering effort away from smartphones, where it is still #1 but with declining share and even faster declining profitability, towards sectors where it anticipates growth.

    We live in interesting times!

    Also Read:


    Intel Results: Spectacular PC, Some Progress in Mobile


    The Two Biggest Misses in Mobile


    Xiaomi Already #3 in Smartphones Behind Samsung and Apple




    Make Semiconductor IP Reuse Successful?

    by Daniel Nenni on 11-18-2014 at 3:00 pm

    As I have mentioned before, Apple has changed the way we live on many different levels (iPod, iTunes, iPhone, iPad, etc…) and the Apple Ax SoC series is no different. You have to ask yourself how is Apple able to churn out a new industry leading SoC EVERY year? I can assure you design reuse is a big part of that answer.

    One of the companies I have had the pleasure of working closely with on SemiWiki since we started 3+ years ago (has it really been that long?) is ClioSoft. You can check their SemiWiki landing page HERE. One of the things I greatly appreciate about ClioSoft is their generosity and this is yet another example.

     

    Just in time for Christmas, ClioSoft will be giving away $25 Amazon gift cards and some amazing prizes to semiconductor professionals that participate in a design reuse survey of sorts. ClioSoft would like your input on what YOU think needs to be done to make IP reuse work and why it needs to be done. What problems have you run into or heard about? What do you think should be done to make design reuse work more efficiently? A simple paragraph will do. They even provide some examples from ClioSoft customers:

    “During IP selection, I should be able to quickly determine the quality of an IP very easily. To gain better confidence, I should also be able to check its lineage to see the SoCs which have taped out using the IP. An insight into the open issues against an IP would be a definite plus.”

    “One of the biggest challenges with using different models of an IP is to ensure all members of your design team – software, verification, RTL developers, physical implementation – are using the same version of the IP. From an SoC integration standpoint, it would be useful to receive updates on an IP and based on the issues fixed, determine whether the new update of the IP should be incorporated into the SoC design.”

    “I think we need to move beyond the current definition of an IP. We should consider existing flow methodologies, regression methods, scripts as IPs as well. If someone in the company has created a working flow for a 16nm technology, why not leverage it in the other design groups.”

    What you need to do to win!
    It is quite simple really!! What problems have you run into or heard about? What do you anticipate? Tell us in your own words what would make design reuse easier and more effective within a company and you will automatically be entered into ALL weekly drawings for a prize. To inspire you to write, we have listed above, some of the things our customers have told us. And to motivate you, we are even offering a $25 Amazon Gift Card for the first 25 valid entries.

    It really is that simple, just enter your response in 200 words or less and get a chance to win one of the prizes pictured above. The contest ends December 22nd so do not delay. SUBMIT YOUR ENTRY HERE!

    Also Read

    Design Collaboration across Multiple Sites

    Webinar: Collaboration Within Dispersed Design Teams

    Leveraging Design Team Energy!


    Invionics: a New EDA Company is Born

    by Paul McLellan on 11-18-2014 at 7:00 am

    Something happened this morning that doesn’t happen too much any more: a new EDA company came out of stealth mode and announced its product line. The company is Invionics, and they are based in Vancouver BC. The CEO is Brad Quinton, who largely funded the company himself, although with some grants from the Canadian government. He previously created Veridae Systems, which he then sold to Tektronix (who have since sold it on to Mentor). The core team at Invionics are the original team from Veridae.


    So what does Invionics do? They are addressing a problem that SoC companies have, which is that it is hard to differentiate yourself from your competition when everyone is using the same tool flows from the same 3 companies, and using the same IP. One way to differentiate is to improve the design process with internal tools that result in more optimal designs. Invionics have a platform, the Invio platform, that makes this straightforward. It contains all the parsers and graphical widgets necessary to build specialized tools quickly. For now at least it is all focused on the front-end, not back-end layout. So they are not really building tools, they are delivering a platform for you to build your own tools.


    For example, the above little piece of code changes synchronous resets to asynchronous (or checks that all resets are correctly specified as asynchronous in the RTL, which is another way of saying the same thing).
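As a rough illustration of the "check that resets are asynchronous" half of that idea, here is a small sketch in plain Python using only regular expressions. It does not use the Invio API, which is not described in this post. The heuristic that reset signals contain "rst" or "reset" in their name, and all function names, are assumptions for the example. A real tool would work on a parsed design rather than on raw text.

```python
import re

# Sensitivity list of an always / always_ff block, e.g. "posedge clk or negedge rst_n".
ALWAYS_RE = re.compile(r"always(?:_ff)?\s*@\s*\(([^)]*)\)")
EDGE_RE = re.compile(r"\b(posedge|negedge)\s+(\w+)")
# Assumed naming convention for reset signals.
RESET_NAME_RE = re.compile(r"rst|reset", re.IGNORECASE)


def find_blocks_without_async_reset(verilog: str) -> list:
    """Return the sensitivity lists of clocked blocks that have no reset edge.

    Blocks with a synchronous reset and blocks with no reset at all are both
    reported, since this text-level heuristic cannot tell them apart.
    """
    flagged = []
    for match in ALWAYS_RE.finditer(verilog):
        sensitivity = match.group(1)
        edges = EDGE_RE.findall(sensitivity)
        if not edges:
            continue  # combinational block such as always @(*)
        if not any(RESET_NAME_RE.search(signal) for _, signal in edges):
            flagged.append(" ".join(sensitivity.split()))
    return flagged


EXAMPLE = """
always @(posedge clk) begin
  if (rst) q <= 0; else q <= d;
end
always @(posedge clk or negedge rst_n) begin
  if (!rst_n) r <= 0; else r <= d;
end
always @(*) y = a & b;
"""

if __name__ == "__main__":
    for sensitivity in find_blocks_without_async_reset(EXAMPLE):
        print("no asynchronous reset in: always @(" + sensitivity + ")")
```

Even this crude version shows why a platform with real parsers is attractive: comments, macros and generate blocks would quickly defeat a regex, whereas a tool built on a parsed netlist or RTL database would not have that problem.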


    The Invio platform plugs in at different points in the design flow depending on what optimization the customer’s tool is intended for, from custom Lint and IP instantiation at the front end to physical partitioning and floorplanning at the back end.

    The key features of the Invio platform are:

    • easy to use Tcl or Python API
    • Verilog, AMS, SystemVerilog (even non-synthesizable)
    • custom GUI builder
    • application packager
    • HDL language agnostic
    • specific modules for RTL modification, netlist modification, verification, SoC assembly

    They already have a number of customers that they are working with, although as is so often the case, most of them aren’t ready to go on the record. One that is on the record is Maxim Integrated, who are using Invio to extract information from complex SystemVerilog verification environments.

    As Brad, the CEO, summarized it in the press release:
    The Invio platform was developed through collaboration with some of the industry’s leading semiconductor companies, which are facing ever increasing pressures to differentiate their designs in shrinking time to market windows. Using Invio, customers have been able to build in-house EDA tools in a fraction of the time typically required, thereby creating highly customized IC design processes.

    Invio is available now. More details on Invionics website here.




    IEDM 2014 Preview

    by Scotten Jones on 11-17-2014 at 8:00 pm

    The International Electron Devices Meeting (IEDM) is one of the premier conferences for the presentation of the latest semiconductor processes and process technologies. IEDM is held every year in December, alternating between San Francisco and Washington DC. This year IEDM will be held at the San Francisco Hilton on December 15th, 16th and 17th.

    My company, IC Knowledge LLC produces the most widely used semiconductor and MEMS cost models in the world today. One of the keys to accurate modeling is building process flows that represent the latest processes. IEDM is a key conference for me to keep up with the state-of-the-art and I will be in San Francisco this year.

    The conference preliminary schedule is out and I wanted to highlight what I thought were some of the most interesting papers.

    Typically my favorite session of the conference each year is the “Advanced CMOS Technology Platform” session. This is a session where the major logic producers square off against each other for bragging rights to the highest performance and/or densest processes. This year’s session will be held Monday afternoon and looks really strong. TSMC is scheduled to present what looks to be their 16FF+ process, Intel will present their 14nm FinFET process and IBM will be presenting their 14nm FinFET on SOI process. This session should give a clearer picture of how the major competing 14/16nm FinFET platforms compare.

    Tuesday morning more advanced CMOS technology will be presented in the “Advanced CMOS Devices for 10nm Node and Beyond” session. ST Micro with CEA-LETI, IBM and SOITEC will present work on a 10nm FDSOI process.

    Tuesday afternoon has a session on “Nano Device Technology – Ge and SiGe Transistors” that will feature additional work from IBM/Global Foundries and ST Micro, CEA-LETI and SOITEC with SiGe PMOS devices on SOI and TSMC’s work on Ge FinFETs. Also on Tuesday afternoon the “Memory Technology – MRAM, DRAM and NAND” session has a Samsung paper on emerging STT-MRAM although, as is often the case at IEDM, it is at the same time as the ST Micro, CEA-LETI and SOITEC paper. IBM will also present their 22nm embedded DRAM technology in this session. One disappointment for me over the last several years, which continues this year, is that the leading memory providers no longer present their latest NAND and DRAM processes.

    Wednesday morning the “Power and Compound Semiconductor Devices – III-V for Logic” session will showcase the latest results in III-V devices for logic.

    Wednesday afternoon will see the “Process and Manufacturing Technology – Advanced Process Modules” session with an IBM paper on 10nm and below interconnect.

    These are just some of the key papers from my perspective that have caught my eye. There is a growing number of MEMS papers at IEDM and I will be filling in with a number of MEMS papers for my MEMS modeling work.

    After IEDM is over I will follow this up with a blog describing the papers.

    About IEDM
    With a history stretching back 60 years, the IEEE International Electron Devices Meeting (IEDM) is the world’s pre-eminent forum for reporting technological breakthroughs in the areas of semiconductor and electronic device technology, design, manufacturing, physics, and modeling. IEDM is the flagship conference for nanometer-scale CMOS transistor technology, advanced memory, displays, sensors, MEMS devices, novel quantum and nano-scale devices and phenomenology, optoelectronics, devices for power and energy harvesting, high-speed devices, as well as process technology and device modeling and simulation. The conference scope not only encompasses devices in silicon, compound and organic semiconductors, but also in emerging material systems. IEDM is truly an international conference, with strong representation from speakers from around the globe.

    In 2014 there is an increased emphasis on circuit and process technology interaction, energy harvesting, biosensors and bioMEMS, power devices, magnetics and spintronics, two dimensional electronics and devices for non-Boolean computing.


    Analyzing Cortex-A53 octa-core on Linux

    by Don Dingee on 11-17-2014 at 3:00 pm

    Octa-core sells smartphones and tablets. 64-bit ARM Cortex-A53 implementations are available from Huawei, MediaTek, Qualcomm, Samsung, and now Marvell, with Rockchip and others on the way. Suddenly, almost everyone planning to run Linux is being asked for octa-core designs.

    If it were easy, anyone could do it. Increasing the number of cores also increases the number of things that can go wrong in a busy system, limiting performance of individual cores forced to wait around for something else to happen. However, what might seem like optimizing an SoC for processor core performance can completely blow up power consumption and design and IP costs. A Cortex-A53 is a terrible thing to waste.

    Fortunately, this is the exact job virtual prototyping was born to do. Tools such as QEMU are helpful for software development, and RTL simulation and emulation help wring problems out of hardware. Getting to the heart of an octa-core Cortex-A53 design requires implementation-accurate virtual prototyping, able to deal with both hardware and software aspects of analysis.

    Instruction-accurate models are fabulous for software debug, but mostly leave timing considerations out of the equation. They can completely miss issues that arise between IP blocks with concurrent system activity, such as an operating system and application code would generate.

    Cycle-accurate models are slow, and booting up an operating system like Linux can literally take days. This is why hardware simulators have speed-rate adapters for external peripherals like USB, SATA, and Ethernet, allowing peripherals to run at least in bursts while the system simulation catches up.

    Carbon Swap ‘n Play does both, going beyond the ARM Fast Model technology. In simple terms, Swap ‘n Play technology boots Linux using an instruction-accurate set of models, then switches to a cycle-accurate set of models at a breakpoint to run the region of interest.
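    The hand-off described above can be pictured with a toy sketch. This is not Carbon's implementation or API: `CpuState`, both `run_*` functions, and the fixed cycles-per-instruction value are hypothetical stand-ins, used only to show a fast functional phase running up to a breakpoint and a slower timed phase continuing from the same state.

```python
from dataclasses import dataclass


@dataclass
class CpuState:
    """Architectural state handed from one model to the other (hypothetical)."""
    pc: int = 0
    instructions: int = 0
    cycles: int = 0  # only counted once the cycle-accurate phase starts


def run_instruction_accurate(state: CpuState, breakpoint_pc: int) -> CpuState:
    """Fast functional phase: advance to the breakpoint, track no timing."""
    if breakpoint_pc < state.pc or (breakpoint_pc - state.pc) % 4:
        raise ValueError("breakpoint must be reachable in 4-byte steps")
    while state.pc != breakpoint_pc:
        state.pc += 4
        state.instructions += 1
    return state


def run_cycle_accurate(state: CpuState, n_instructions: int, cpi: int = 2) -> CpuState:
    """Slow detailed phase: same architectural effect plus a cycle count.

    The fixed cycles-per-instruction value is a placeholder for real timing.
    """
    for _ in range(n_instructions):
        state.pc += 4
        state.instructions += 1
        state.cycles += cpi
    return state


def swap_and_run(breakpoint_pc: int, region_length: int) -> CpuState:
    """Boot quickly to the breakpoint, then swap models for the region of interest."""
    state = run_instruction_accurate(CpuState(), breakpoint_pc)
    return run_cycle_accurate(state, region_length)
```

    The point of the sketch is that only the region of interest pays the cost of cycle counting; everything before the breakpoint runs at functional-model speed.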

    Building on that idea, Carbon Performance Analysis Kits (CPAKs) bring in both model sets for complex processors and the surrounding IP, including memory controllers, cache coherency units, and interrupt controllers. The latest Carbon release is the Cortex-A53 Multi-Cluster Quad Core Linux Swap ‘n Play CPAK.

    Most octa-core SoC implementations are really two quad-core clusters bolted together, such that one cluster can run at a different clock frequency, or idle to save power. This CPAK is designed with two quad-core Cortex-A53 clusters, an ARM CoreLink CCI-400 cache coherent interconnect, a GIC-400 interrupt controller providing interrupts to all cores, and a system counter linked to the Cortex-A53 generic timers.

    A new blog post by Jason Andrews describes the octa-core Cortex-A53 CPAK. He describes many subtle details covered by the models, including these:

    • CLUSTERIDAFF sets the cluster, mapped to the MPIDR register.
    • CNTVALUEB allows the Generic Timers in each Cortex-A53 to have the same values, even if processor frequencies are different.
    • WFE (wait for event) and SEV (send event) are used to coordinate the two clusters during Linux boot using EVENTI and EVENTO signals.
    • SMPEN is set in the CPUECTLR register; the kernel needs this bit, but the generic Linux boot wrapper does not set it.
    • The Snoop Control Register in the CCI-400 is set to enable coherency, another step the generic Linux boot wrapper does not perform.
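    As a rough illustration of the register details in the list above, the sketch below models the bit manipulation in plain Python. The bit positions are assumptions to check against the ARM manuals, not values taken from the blog post: cluster ID in MPIDR bits [15:8], core ID in bits [7:0], SMPEN at bit 6 of CPUECTLR, and the snoop and DVM enables at bits 0 and 1 of the CCI-400 Snoop Control Register. The function names are hypothetical.

```python
# Assumed bit positions; verify against the Cortex-A53 and CCI-400 manuals.
SMPEN_BIT = 1 << 6          # CPUECTLR.SMPEN
CCI_SNOOP_ENABLE = 1 << 0   # Snoop Control Register: enable snoop requests
CCI_DVM_ENABLE = 1 << 1     # Snoop Control Register: enable DVM messages


def make_mpidr_affinity(cluster: int, cpu: int) -> int:
    """Pack the cluster ID (driven by CLUSTERIDAFF) and core ID into affinity fields."""
    if not (0 <= cluster <= 0xFF and 0 <= cpu <= 0xFF):
        raise ValueError("affinity fields are 8 bits wide")
    return (cluster << 8) | cpu


def mpidr_affinity(mpidr: int) -> dict:
    """Recover cluster and core IDs from the low affinity fields of an MPIDR value."""
    return {"cluster": (mpidr >> 8) & 0xFF, "cpu": mpidr & 0xFF}


def enable_smp(cpuectlr: int) -> int:
    """Set SMPEN so the core takes part in coherency, leaving other bits untouched."""
    return cpuectlr | SMPEN_BIT


def enable_cci_coherency(snoop_ctrl: int) -> int:
    """Enable snoop and DVM requests on one CCI-400 slave interface."""
    return snoop_ctrl | CCI_SNOOP_ENABLE | CCI_DVM_ENABLE


# Two quad-core clusters give eight distinct (cluster, cpu) pairs.
ALL_CORES = [mpidr_affinity(make_mpidr_affinity(cl, cpu))
             for cl in range(2) for cpu in range(4)]
```

    In a real boot wrapper these would be system-register and memory-mapped writes in assembly or C, one Snoop Control Register write per cluster's slave interface; the Python form only shows which bits are involved.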


    Leveraging a Cortex-A53 CPAK in Carbon SoC Designer Plus for octa-core design puts a powerful tool for Linux performance analysis in the hands of designers.



    Xilinx Announces SDAccel, Accelerators for the Datacenter

    by Paul McLellan on 11-17-2014 at 9:00 am

    Today Xilinx announced SDAccel, an initiative for the data-center. This is the second in a series of software-defined development initiatives for various markets, the first being SDNet, which is targeted at building networking applications. One challenge that a company like Xilinx faces is that as the scale of design moves up to entire systems, the more they are dealing with software engineers who know little about hardware. They need a design methodology that is targeted at programmers who don’t even know what RTL is and have no intention of finding out.

    SDAccel is aimed at building co-processors to accelerate certain functions in the data-center such as encryption, search, speech recognition, image recognition and so on. The basic architecture has a board of UltraScale FPGAs communicating with the server CPU cores via PCIe, and with access to memory too. The reason for doing this is that you can get a 20-25X improvement in performance per watt compared to using the CPU or GPU directly. There is also a 50-75X reduction in latency versus a pure software solution. Since power is the limiting factor in many data-centers, this is significant.


    Another feature is required to make this workable, because neither the workload nor the algorithms are fixed. The FPGAs are big enough to contain multiple accelerators, so run-time reconfigurability is required: some of the accelerators can be replaced on the FPGA while other accelerators are in use, and without taking down the always-on interfaces such as Ethernet or PCIe. This is totally different from the normal use of an FPGA, which is typically configured just once at system boot.


    SDAccel consists of a software development environment for C, C++ and OpenCL. It has x86 emulation, hardware models and so on. Under the hood it uses Xilinx’s high-level synthesis (HLS) and the Vivado place and route engine, but that is not talked about, since software engineers are not familiar with those tools and are scared off by being forced to learn too much about hardware. The idea is to give the engineer the same experience on FPGAs as they are used to in the CPU/GPU environments.

    Xilinx has also worked with partners to create boards that just plug into the server PCIe slots, since it is not reasonable to expect data-center owners to design their own boards. So this is a complete solution. You plug the boards into the servers, use the software development environment to develop and analyze the accelerators, and then during operation they are dynamically loaded into the FPGAs on demand, rather like paging in virtual memory.
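    To make the paging analogy concrete, here is a minimal sketch of on-demand accelerator loading. It does not use any SDAccel API: the `AcceleratorSlots` class, the fixed number of reconfigurable slots, and the least-recently-used eviction policy are all assumptions chosen for illustration, and the "bitstream" is just a placeholder string.

```python
from collections import OrderedDict


class AcceleratorSlots:
    """Toy model of an FPGA with a fixed number of reconfigurable regions.

    Requesting an accelerator that is not resident "loads" it, evicting the
    least recently used one if every slot is taken (an assumed policy).
    """

    def __init__(self, n_slots: int):
        if n_slots < 1:
            raise ValueError("need at least one slot")
        self.n_slots = n_slots
        self._resident = OrderedDict()  # name -> placeholder bitstream
        self.loads = 0                  # number of reconfigurations performed

    def request(self, name: str) -> bool:
        """Return True if the accelerator was already resident (a hit)."""
        if name in self._resident:
            self._resident.move_to_end(name)  # mark as most recently used
            return True
        if len(self._resident) >= self.n_slots:
            self._resident.popitem(last=False)  # evict least recently used
        self._resident[name] = f"<bitstream for {name}>"
        self.loads += 1
        return False

    def resident(self) -> list:
        """Names currently loaded, least recently used first."""
        return list(self._resident)
```

    With two slots, requesting encryption, search, encryption and then speech recognition would reconfigure three times and leave search evicted, much as a page-replacement policy swaps out a page that has not been touched recently.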


    The results are almost as good as hand-coded RTL which is obviously the golden benchmark. They are also 3-5X better than competitive FPGA OpenCL solutions (and we all know who competitive FPGA means).

    So the summary is that it is a solution that is as easy to program as a CPU or GPU but with much much better performance per watt. SDAccel is available now.


    More information is available here.

