On Sep 20th, Synopsys announced an expansion of its DesignWare® ARC® Processor IP portfolio with new 128-bit ARC VPX2 and 256-bit ARC VPX3 DSP Processors targeting low-power embedded SoCs. In 2019, the company had launched a 512-bit ARC VPX5 DSP processor for high-performance signal processing SoCs. Due to the length, format and style, press releases are limited in what they capture. Typically, there is a story behind every product announcement. Learning this story gives us better insights into the announced products.
To gain these insights, I met with Matt Gutierrez, Sr. Director of Marketing, Synopsys Processor Solutions, and Markus Willems, Sr. Product Marketing Manager, ARC VPX DSP Processors. This blog is a synthesis of what we discussed.
Unwavering Focus on Embedded Applications
One thing that has stayed constant from the 1990s to today is ARC technology’s focus on supporting embedded applications. Historically, ARC processors did not target the mobile applications processor segment. The markets for embedded applications have been evolving and ARC processor technology has been transforming accordingly. ARC processors have moved up from handling only simple, dedicated tasks such as power management to running a 64-bit Linux operating system.
After ARC became part of Synopsys in 2010, the burgeoning IoT market gave impetus to build a new generation of embedded ARC processors. A family of very small, highly efficient, low-power processors was needed to support the IoT market. A new architecture and ISA were born. A couple of processors were developed and marketed. Early IoT devices needed only a minimal amount of DSP capability, so some DSP functions were added to the processors to support the IoT requirements.
Fast forward to today: Synopsys offers five different ARC product families, each with an extensive lineup. Each product family of embedded processors addresses the varying and tight requirements of a broad range of applications. The current announcement is about the VPX DSP family of processors for language processing, Radar/LiDAR, sensor fusion and high-end IoT applications.
Focus Drives Highly Efficient ARC Architecture
The instruction set architecture (ISA) has been designed with the embedded market in mind. For example, unique instructions such as compare & control transfer and branch & loop make it easy to efficiently implement common embedded program behaviors. Another example is 16-bit encodings for popular 32-bit instructions. The ARC ISA has many such features for reducing code size as memory space is at a premium on embedded devices.
Every microarchitectural decision is also made with the embedded market in mind. For example, built-in shadow registers are important for real-time embedded applications to enable fast context switching. These kinds of architectural decisions make a big difference for embedded applications, and they are not easily replicated by taking a processor designed for other applications and tweaking it to support embedded ones.
Other important aspects of ARC’s value proposition are the configurability of the design and the extensibility of the instruction set. Configurability enables implementing just the minimum hardware that is needed for a SoC and nothing more. Extensibility enables adding custom instructions to accelerate application code, increase code density and reduce power consumption.
Customers are effectively able to create customized processor hardware, supported by a single, standard MetaWare toolchain, that delivers the optimal PPA and code density for their application needs. The majority of ARC customers extend the instruction set by adding custom instructions for their specific algorithms.
Addressing Expanding Market Requirements
Until the introduction of the VPX family of processors, ARC processors could be categorized as Big CPU, little DSP IP solutions. Embedded workloads such as IoT sensor fusion, Radar and LiDAR processing, voice/speech recognition, and natural language processing call for full-fledged DSP capabilities. As Synopsys saw this rising market need, they launched the VPX line of processors, which uses an extended ARC ISA to implement highly vectorized DSPs.
Product Requirements for these Markets
Floating point support is becoming more important for signal processing applications. The data processing algorithms being developed for these markets use floating point to support a wide dynamic range. Staying in floating point instead of converting to fixed point makes mapping an algorithm to a design architecture quicker. The DSP libraries and linear algebra libraries that are supporting these applications are represented in floating point format. Strong support for programming with vector floating point operations is becoming more of a requirement than in the past.
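To make the fixed-point versus floating-point trade-off concrete, here is a minimal sketch in plain C of the same dot product written both ways. It is an illustration only, not code from Synopsys or the MetaWare libraries; the function names dot_f32 and dot_q15 are hypothetical, and Q15 (16-bit values scaled by 2^15) is assumed as the fixed-point format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floating-point dot product: the code mirrors the math directly,
 * with no scaling decisions left to the programmer. */
float dot_f32(const float *a, const float *b, size_t len)
{
    assert(len == 0 || (a != NULL && b != NULL));
    float acc = 0.0f;
    for (size_t i = 0; i < len; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Q15 dot product (assumed format: int16 value = real value * 2^15).
 * The programmer must pick a wide accumulator, round, rescale and
 * saturate by hand. */
int16_t dot_q15(const int16_t *a, const int16_t *b, size_t len)
{
    assert(len == 0 || (a != NULL && b != NULL));
    int64_t acc = 0;
    for (size_t i = 0; i < len; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];   /* each product is Q30 */

    /* Round to nearest and rescale Q30 -> Q15. Right-shifting a negative
     * value is implementation-defined in C; this sketch assumes the usual
     * arithmetic shift. */
    acc = (acc + (1 << 14)) >> 15;

    /* Saturate to the int16 range instead of wrapping around. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

The floating-point version needs no format choice, rounding step or saturation logic, which is the extra mapping work the paragraph above refers to.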
Efficient execution of AI algorithms is another must-have for any modern DSP. This implies support for short integer data types such as Int8, combined with a dedicated programming environment that allows for a smooth mapping of graphs to the DSP architecture. And of course, the DSP has to come with a rich library of machine learning kernels optimized for the hardware to ease software development.
A dedicated hardware accelerator for linear and non-linear algebra operations significantly speeds up these increasingly used math functions.
Configurability, extensibility and scalability are becoming key requirements as product companies start offering multiple variants. Each variant may be optimized differently for PPA and code density.
VPX Family of DSP IP
With the availability of three different VPX families representing 7 different DSPs, customers now have greater flexibility for implementing specific application requirements. The latest two additions are based on the same VLIW/SIMD architecture as the higher-performance 512-bit ARC VPX5 DSP processor launched two years ago. As the new additions target low-power embedded SoCs, they are designed for smaller vector lengths, resulting in smaller, lower power footprints. As ultra-high floating-point performance is a focus for the VPX DSPs, a Vector Floating Point Unit (VFPU) is offered as an option. The VFPU is implemented with multiple pipelines capable of executing up to 512 FLOPs per clock cycle. Along with the launch of the two new additions, Synopsys has also announced some enhancements to the VPX5 processor.
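As a rough illustration of what the different vector lengths mean for software, the sketch below computes how many elements of a given size fit in one vector. It assumes the simple relation lanes = vector width / element width; the helper name lanes_per_vector is hypothetical, and the actual per-cycle throughput of the VPX processors depends on microarchitectural details not covered here.

```c
#include <assert.h>

/* Number of elements ("lanes") that fit in one vector register,
 * assuming lanes = vector width / element width. */
unsigned lanes_per_vector(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}
```

Under that assumption, a 128-bit vector holds 4 single-precision (32-bit) values or 16 Int8 values, while a 512-bit vector holds 16 single-precision values or 64 Int8 values.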
Easy Migration and Scalability of Products
The ARC VPX processors are supported by the Synopsys ARC MetaWare Development Toolkit, which provides a vector length-agnostic (VLA) software programming model. From a programming perspective, the vector length is identified as “n” and the value for n is specified in a define statement. The MetaWare compiler does the mapping and picks the right set of software libraries for compilation. The compiler also provides an auto-vectorization feature which transforms sequential code into vector operations for maximum throughput.
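The idea of writing code against a vector length "n" fixed by a define can be sketched in plain C as follows. This is not the MetaWare VLA syntax, which the article does not show; the macro VEC_LANES and the function saxpy_vla are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lane count, standing in for the "n" set by a define.
 * Override at build time, e.g. -DVEC_LANES=16. */
#ifndef VEC_LANES
#define VEC_LANES 8
#endif

#if VEC_LANES <= 0
#error "VEC_LANES must be positive"
#endif

/* Computes y[i] = a * x[i] + y[i] in chunks of VEC_LANES elements. */
void saxpy_vla(float a, const float *x, float *y, size_t len)
{
    assert(len == 0 || (x != NULL && y != NULL));
    size_t i = 0;

    /* Main loop: full chunks. The fixed-trip-count inner loop is the
     * kind of pattern a vectorizing compiler can map to vector ops. */
    for (; i + VEC_LANES <= len; i += VEC_LANES)
        for (size_t lane = 0; lane < VEC_LANES; ++lane)
            y[i + lane] += a * x[i + lane];

    /* Tail loop: leftover elements when len is not a multiple of
     * VEC_LANES. */
    for (; i < len; ++i)
        y[i] += a * x[i];
}
```

Because the source never hard-codes the lane count, the same function can be rebuilt for a narrower or wider target by changing one define, which is the property that makes migration across family members straightforward.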
In combination with the DSP, machine learning and linear algebra function software libraries, the MetaWare Development Toolkit delivers a comprehensive programming environment.
Together, the above features enable customers to easily migrate and/or scale their products across all members of the VPX family.
Opportunity for Optimizing Current ARC VPX5-based Designs
With all the attention on VPX2 and VPX3 in the press announcement, the VPX5 enhancements may have gotten lost. Refer to the figure below.
The VPX5 enhancements include double-wide vector load/store, wider AXI interfaces, ISA extensions, and machine learning, DSP and linear algebra libraries that support a VLA-based programming model. These enhancements enable the VPX5 to double its performance compared to the earlier version for common DSP functions such as FFT, dot product and windowing. In many applications, this removes the need for designers to implement a separate external accelerator for these functions.
For the Automotive Market
To satisfy the enhanced safety requirements of the automotive market, Synopsys offers a functional safety (FS) series for its entire portfolio, including the VPX family of processors. The FS series of processors meets random fault detection and systematic functional safety development flow requirements for full ISO 26262 compliance up to ASIL D.
Delivering design efficiencies, optimizing for PPA and maximizing software code density are at the root of what ARC is about. Synopsys’ ARC VPX DSP family of processors provides customers with a full range of scalable solutions to address their varying requirements.