ARMing AI/ML

by Bernard Murphy on 03-24-2017 at 7:00 am

There is huge momentum building behind AI, machine learning (ML) and deep learning; unsurprisingly, ARM has been busy preparing its own contribution to this space. This week they announced a new multi-core micro-architecture called DynamIQ, covering all Cortex-A processors, whose purpose is, in their words, “to redefine flexible multicore and heterogeneous compute to enhance the experience of diverse, increasingly intelligent devices from chip to cloud”. What looks particularly important here is support for heterogeneous clusters, including CPUs in the cluster that may not be from ARM, along with faster links to the accelerators increasingly found in AI applications.

This raises several interesting questions. One is how ARM plays in this space at all – isn’t all this stuff specialized to neural nets? Some of it certainly is, but not all of it. Non-neural ML usually runs on CPUs. And the learning part of deep learning typically runs on GPUs, where NVIDIA has a strong start, but I wouldn’t be surprised if system design teams are thinking about working with ARM on Mali-based solutions.

Equally, at least to this reviewer, whatever AI/ML solutions you use, they can’t exist in isolation. Particularly in mobile and IoT applications (and maybe increasingly in HPC), they must drop into an existing infrastructure which already provides must-have functionality in power management, security, wired or wireless communication, embedded software and debug, cloud access, provisioning and so on. In other words, the ARM ecosystem.

Advanced AI capabilities won’t be compelling on edge nodes unless they are high-performance and low-power, and are safe, private and secure. I’ve talked about vision and speech recognition moving to the edge because it is too expensive, in power and in latency, to perform those functions through round-trips to the cloud (and may be impossible at times if a wireless connection is not available). It’s a no-brainer that privacy, security and safety are best managed locally, with minimal or no need for off-device communication. All of which means you need yet more complex, intelligent software running with low latency on energy-sipping devices already set up to manage those other needs.

To get improved latency, ARM believes that heterogeneous compute engines and accelerators need high-performance access to the compute cluster. This is part of what DynamIQ offers. It starts with a ground-up redesign for multi-core, which they present as an evolution of big.LITTLE. All Cortex-A cores will be upgraded to this new (ARMv8.2) architecture, as will CoreLink. These will be backward-compatible with existing software and other systems, although existing Cortex-A cores and CoreLink will not be forward-compatible. A cluster can support up to 8 cores, and these can be heterogeneous and even non-ARM, as long as they comply with the open v8.2 standards. The new Cortex-A cores include multiple improvements for floating-point and matrix multiply. They didn’t get deep into the nature of the improvements in the press briefing, but they did say these especially target improving AI performance on ARM-based designs by 50x.
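
ARM didn’t spell out the new instructions in the briefing, but one documented ARMv8.2-A addition in this area is native half-precision (FP16) arithmetic – exactly the kind of operation neural-net inference leans on. Here is a minimal sketch of what that enables, assuming a compiler and core with FP16 support (built with something like gcc -march=armv8.2-a+fp16); the function name is mine, purely illustrative:

/* A minimal sketch of the kind of arithmetic ARMv8.2 accelerates: a
   half-precision (FP16) dot product. ARMv8.2-A adds native FP16
   arithmetic; build with something like gcc -march=armv8.2-a+fp16 so
   the compiler can keep the math in 16-bit registers instead of
   widening every operation to float. Names here are illustrative. */
#include <stdio.h>

static float dot_fp16(const _Float16 *a, const _Float16 *b, int n)
{
    _Float16 acc = 0;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];   /* stays in FP16 given hardware support */
    return (float)acc;        /* widen once, at the end */
}

int main(void)
{
    _Float16 a[4] = {1, 2, 3, 4};
    _Float16 b[4] = {0.5, 0.5, 0.5, 0.5};
    printf("dot = %f\n", dot_fp16(a, b, 4));   /* prints 5.000000 */
    return 0;
}

Without the fp16 extension the compiler has to widen every operation to single precision; with it, the loop can stay in 16-bit registers, halving memory traffic for the same vector width – a plausible ingredient in the kind of AI speedup ARM is claiming.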

In power management, DynamIQ provides fine-grained speed control in the CPUs, more control over power-state switching (every core in a cluster can operate in a different power state) and autonomous power management for CPU memories.
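
The hardware manages much of this autonomously, but the per-core operating points it enables are visible to software through standard OS interfaces. A rough sketch of inspecting them, assuming a Linux target with the usual cpufreq sysfs files (none of this is a DynamIQ-specific API):

/* Rough sketch: read each core's current DVFS operating point through
   the standard Linux cpufreq sysfs interface. Whether cores can sit
   at different frequencies depends on the hardware; per-core control
   is what DynamIQ adds. Nothing here is a DynamIQ-specific API. */
#include <stdio.h>

int main(void)
{
    char path[96], buf[32];

    for (int cpu = 0; cpu < 8; cpu++) {   /* one DynamIQ cluster: up to 8 cores */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                      /* core absent, offline, or no cpufreq */
        if (fgets(buf, sizeof buf, f))
            printf("cpu%d: %s", cpu, buf); /* frequency in kHz, newline included */
        fclose(f);
    }
    return 0;
}

On a conventional big.LITTLE system the cores within a cluster typically report the same frequency because they share a clock domain; per-core values that differ are the visible signature of the finer-grained control described above.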

And in safety, ARM is already well-established in ASIL-D automotive and similar industrial applications. I would guess that getting ASIL-D signoff on that great new vision-recognition sub-system will be greatly eased by integration into DynamIQ.

While I’ve stressed edge applications in this article, ARM also expects significant value for DynamIQ in server/cloud applications. Apparently, ISA enhancements have been developed in close cooperation with important partners specifically to help in this domain. ARM also expects the platform will be very attractive in networking applications, thanks to reduced latency within 8-core clusters.

When are we likely to see this technology, both in terms of access for design teams and in end-user products? ARM told us that multiple early-access partners are already working with the technology. I got a mixed message (possibly I wasn’t listening carefully enough) on when the rest of us may get access or see products. I think I heard we would start to see products in 2018, and that we may see access to the new standard and to the technology sometime this year. But don’t take my word for that – check with your ARM rep.

You can learn a little more about DynamIQ HERE.
