With their current line-up of embeddable and discrete FPGA products, Achronix has made a big impact on their markets. They started with their Speedster FPGA standard products and then essentially created a brand-new market for embeddable FPGA IP cores. They have just announced a new generation of their Speedcore embeddable FPGA IP that targets leading-edge compute applications such as AI/ML. This is more than a process-node advancement: they have made a number of strategic architectural changes to improve performance and to adapt to certain classes of problems.
Yes, as you might expect, this announcement includes a move to the latest process node, TSMC 7nm, with a backport to 16nm coming later in 2019. However, the really interesting part of this announcement is the further improvement of the fabric's already optimized architecture.
I had a chance to speak with Robert Blake, Achronix CEO, at the time of the announcement to gain deeper insight into the specifics. He mentioned that they have successful 7nm validation silicon back that meets their target specifications. Many of the changes in this new generation are motivated by the AI/ML market and by big changes in how FPGA technology is being used.
FPGAs have made a dramatic shift over the decades from glue logic and interface uses to becoming a major element in data processing, such as networking and AI. Microsoft, for example, demonstrated that FPGAs can deliver huge acceleration for compute-intensive applications. Meanwhile, the year-to-year performance gains of classic CPUs have flattened out, which has brought a concomitant growth in the use of specialized processors such as GPUs to fill the gap. FPGAs are an even more flexible tool for implementing computational processing. Achronix likes to point out that CPUs are rapidly becoming FPGA helpers: they handle exceptions but are increasingly absent from the main data path.
The beauty of embeddable FPGA fabric IP is that it avoids the significant overhead of an off-chip resource, including off-chip driver loads, board real estate, and interface speed limits.
The Speedcore 7t, which is built on their Gen4 architecture, provides significant PPA improvements. Robert told me that they see simultaneous gains in performance, power, and area: a 60-300% boost in performance, a 50% decrease in power, and a 65% decrease in area. Any one of these would be noteworthy on its own; getting all three at once is a combined win. Robert walked me through some of the changes that contribute to these numbers.
Based on the needs of several important applications, Achronix has added or enhanced certain logic blocks. For instance, there is an 8-to-1 mux, which is critical for networking applications. Another is an 8-bit ALU, which is heavily used for AI/ML. Robert also talked about their bus max function, dedicated shift registers, and LUT changes, all of which improve the compute power of their FPGA fabric.
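To see why a dedicated 8-to-1 mux is worth having, count inputs: a 4-to-1 mux has 4 data inputs plus 2 select bits, which fits in a single 6-input LUT, while an 8-to-1 mux has 8 + 3 = 11 inputs and must be assembled from smaller pieces in a generic LUT fabric. The Python sketch below is a behavioral model only. The function names are hypothetical, and the 6-input LUT comparison is an assumption about a generic fabric, not a description of Speedcore's LUTs.

```python
# Behavioral sketch only: models the logic function of an 8:1 mux, not
# Achronix's circuit. Function names are hypothetical.

def mux_inputs(n_data):
    """Total inputs (data + select bits) of an n:1 mux, n a power of two."""
    select_bits = n_data.bit_length() - 1
    return n_data + select_bits

def mux8(data, sel):
    """Dedicated 8:1 mux: return data[sel] using a 3-bit select."""
    if len(data) != 8 or not 0 <= sel < 8:
        raise ValueError("need 8 data inputs and a 3-bit select")
    return data[sel]

def mux8_from_4to1(data, sel):
    """Same function built the generic way: two 4:1 muxes (each small
    enough for one 6-input LUT, by assumption) plus a final 2:1 stage."""
    low = data[0:4][sel & 0b11]           # first 4:1 mux, select bits [1:0]
    high = data[4:8][sel & 0b11]          # second 4:1 mux, same select bits
    return high if sel & 0b100 else low   # 2:1 stage on select bit 2

if __name__ == "__main__":
    print(mux_inputs(4), mux_inputs(8))   # 6 11
    bits = [0, 1, 1, 0, 1, 0, 0, 1]
    assert all(mux8(bits, s) == mux8_from_4to1(bits, s) for s in range(8))
```

Both functions compute the same result; what a dedicated block changes is how many LUTs and routing hops that result costs in silicon, which a behavioral model cannot show.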
Robert talked about numerous other additions, such as their programmable bus routing. This 4-to-1 bus routing capability can be cascaded to create wider buses. It saves LUT resources and offers a 2X performance improvement.
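The article does not describe how the bus routing is built, so the Python sketch below takes one reading of "cascaded": 4-to-1 bus muxes arranged in a tree, with each mux switching an entire multi-bit word on a 2-bit select rather than spending logic on every bit. The function names and the restriction to power-of-4 source counts are hypothetical choices for illustration.

```python
# Hypothetical behavioral model of cascadable 4:1 bus multiplexers.
# Each bus is an int carrying a whole multi-bit word; one 4:1 bus mux
# picks one of four buses with a 2-bit select, regardless of word width.

def bus_mux4(buses, sel):
    """Select one of four buses with a 2-bit select."""
    if len(buses) != 4:
        raise ValueError("a 4:1 bus mux takes exactly 4 buses")
    return buses[sel & 0b11]

def cascaded_bus_mux(buses, sel):
    """Build an N:1 bus mux (N a power of 4) as a tree of 4:1 bus muxes.

    Returns (selected bus, number of 4:1 muxes used).
    """
    n = len(buses)
    # power of 4: exactly one set bit, at an even bit position
    if n < 4 or n & (n - 1) or (n.bit_length() - 1) % 2:
        raise ValueError("number of buses must be 4, 16, 64, ...")
    level = list(buses)
    shift = 0
    muxes_used = 0
    while len(level) > 1:
        digit = (sel >> shift) & 0b11     # 2 select bits per tree level
        level = [bus_mux4(level[i:i + 4], digit)
                 for i in range(0, len(level), 4)]
        muxes_used += len(level)
        shift += 2
    return level[0], muxes_used

if __name__ == "__main__":
    buses = [0x1000 + i for i in range(16)]
    print(cascaded_bus_mux(buses, 9))     # (4105, 5), i.e. 0x1009 via 5 muxes
```

Selecting among 16 buses takes five 4-to-1 stages in two levels. With bus-wide muxes that count does not depend on the word width, whereas a LUT-per-bit implementation would grow with it, which is presumably where the LUT savings come from.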
Going one step further, they have added a new compute block: a Machine Learning Processor (MLP). It is optimized for neural network (NN) matrix-vector multiplication. It is clocked at 750 MHz and is flexible in the number formats it can handle: fixed point, bfloat16, 16-bit half-precision floating point, 24-bit floating point, and block floating point. The ability to vary its configuration allows customization for different NN algorithms. It also provides future-proofing, because the programmable array can be altered as NN algorithms advance.
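The article lists the formats but not how the MLP implements them. The Python sketch below illustrates two of them under stated assumptions: bfloat16 as the top 16 bits of a float32 with round-to-nearest-even, and block floating point as a group of integer mantissas that share one exponent. It then uses block floating point for a small matrix-vector product, where the inner loop becomes pure integer multiply-accumulate. The 8-bit mantissa width, the one-exponent-per-row blocking, and all function names are hypothetical choices, not Achronix's design.

```python
# Sketch of two MLP number formats plus a block-FP matrix-vector product.
# Assumptions (not from the article): 8-bit signed mantissas and one shared
# exponent per matrix row and per input vector. Names are hypothetical.
import math
import struct

def to_bfloat16(x):
    """Round x to bfloat16 (round-to-nearest-even), returned as a float.

    bfloat16 keeps float32's sign and 8-bit exponent but only 7 mantissa
    bits, i.e. the upper 16 bits of the float32 encoding. NaN/inf ignored.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)         # ties go to the even result
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_block_fp(values, mantissa_bits=8):
    """Quantize floats to signed int mantissas sharing one exponent.

    Returns (mantissas, exponent) with value ~= mantissa * 2**exponent.
    """
    max_mag = max(abs(v) for v in values)
    if max_mag == 0:
        return [0] * len(values), 0
    limit = 2 ** (mantissa_bits - 1) - 1       # 127 for 8-bit mantissas
    # exponent large enough that the largest magnitude fits within +/-limit
    exponent = math.frexp(max_mag / limit)[1]
    scale = 2.0 ** exponent
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, exponent

def block_fp_matvec(matrix, vector, mantissa_bits=8):
    """Matrix-vector product with integer MACs on block-FP mantissas."""
    v_mant, v_exp = to_block_fp(vector, mantissa_bits)
    result = []
    for row in matrix:
        r_mant, r_exp = to_block_fp(row, mantissa_bits)
        acc = sum(a * b for a, b in zip(r_mant, v_mant))  # integer dot product
        result.append(acc * 2.0 ** (r_exp + v_exp))       # one rescale per row
    return result

if __name__ == "__main__":
    print(to_bfloat16(math.pi))                                  # 3.140625
    print(block_fp_matvec([[1.0, 2.0], [0.5, -0.25]], [4.0, 1.0]))  # [6.0, 1.75]
```

Sharing one exponent across a block is what lets the multiply-accumulate loop run on narrow integers; the price is lost precision for small values that sit in the same block as a large one.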
There is so much in this announcement that I suggest referring to the Achronix website for all the details. However, it is clear that Achronix intends to maintain its technical and business advantage in this space through a wide range of targeted technical improvements. Rather than resting on their laurels, they are using their experience to meet the emerging computational requirements of AI/ML, which is poised to become pervasive.