The race to add artificial intelligence (AI) to complex applications is surfacing a new trend. There’s a sense these applications need a lot of AI inference operations, but very few architects can say precisely what those operations will do. Self-driving may be the best example, where AI model research and discovery proceed at a frantic pace. What should a compute environment look like when the AI inference models themselves are uncertain?
Software adds AI inference flexibility but at a cost
A familiar reflex in the face of uncertainty is opting for software programmability. This dynamic has dominated large-scale CPU core development for generations: faster processors debut, programmers write software until it consumes all the new-found capacity, then another round of even faster processors appears. But there’s a mismatch between a bulked-up CPU core and the fine-grained, highly parallel workloads of AI inference, and the inefficiency becomes overwhelming.
Then GPUs showed up at the AI inference party with many smaller, parallelized cores and multithreading. On the surface, scaling up a software-programmable field of fast GPU cores seems a better fit for the fine-grained inference workload. If one has room for a rack of GPU-based hardware, it’s possible to pack a lot of TOPS in a system. But bigger GPUs start presenting other issues for AI inference in the form of sub-optimal interconnects and memory access. Hardware utilization isn’t great, and determinism and latency are suspect. Power consumption and cooling also head in the wrong direction.
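The utilization point above can be made concrete with a little arithmetic. A minimal sketch, using purely hypothetical peak-TOPS and utilization figures (not measurements of any specific GPU or FPGA):

```python
# Illustrative numbers only: peak TOPS and utilization values are hypothetical,
# not measurements of any real GPU or FPGA product.
def effective_tops(peak_tops: float, utilization: float) -> float:
    """Sustained throughput is peak throughput scaled by hardware utilization."""
    return peak_tops * utilization

# A big GPU system with poor utilization on a fine-grained inference workload...
gpu = effective_tops(peak_tops=1000.0, utilization=0.25)
# ...can deliver no more sustained work than a smaller, better-matched engine.
matched = effective_tops(peak_tops=300.0, utilization=0.85)

print(gpu, matched)  # 250.0 255.0
```

The headline TOPS figure matters less than how much of it the workload can actually keep busy.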
Hardware could optimize around a known workload
If this is starting to sound like the case for a workload optimized custom SoC, that’s because it is. Design high-performance execution units, optimize memory access and interconnects, and organize them around running an AI inference model.
We’re seeing off-the-shelf AI inference SoCs popping up all over, primarily targeting one specific class of AI inference problem. There are SoCs designed to run YOLO-family models for vision tasks such as face detection. Others optimize for driver-assistance functions like lane keeping or emergency braking. AI inference is also gaining traction in areas like pharmaceutical research. If the AI inference models are well-defined, optimizing the workload in hardware is achievable.
But different AI inference models do not map onto layers or execution units the same way. Hardware optimized around one model can be utterly inefficient running another. Making matters worse, some of the more complex problems call for running different types of AI inference models concurrently on separate parts of the problem.
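The mapping mismatch is easy to see in a compute profile. A minimal sketch with hypothetical per-layer MAC counts (illustrative, not benchmarks of any real network): a CNN concentrates its work in convolutions, while a transformer spends it on attention and feed-forward matrix multiplies.

```python
# Hypothetical per-layer MAC counts (in millions) for two model styles; the
# numbers are illustrative, not benchmarks of any real network.
cnn_layers = {"conv3x3": 900, "conv1x1": 300, "dense": 50}
transformer_layers = {"attention": 600, "ffn_matmul": 700, "layernorm": 5}

def share(layers: dict, kind: str) -> float:
    """Fraction of a model's total MACs consumed by one layer type."""
    return layers.get(kind, 0) / sum(layers.values())

# An execution-unit mix tuned for 3x3 convolutions covers 72% of the CNN's
# compute but none of the transformer's, so a fixed hardware mapping that
# serves one model well can serve the other poorly.
print(round(share(cnn_layers, "conv3x3"), 2))          # 0.72
print(round(share(transformer_layers, "conv3x3"), 2))  # 0.0
```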
Niching down a custom SoC too tightly can result in lock-in, possibly preventing an enhanced AI inference model from running efficiently without a hardware redesign. That’s terrible news for a long-life-cycle project where the breakthrough AI inference innovations are yet to happen. It’s also unhealthy for return on investment if volumes on a custom SoC are too low.
If only there were fast, programmable AI inference hardware
Several IP vendors are working on the specifics of reconfigurable AI inference engines with higher utilization and efficiency. Most pursue a co-design premise: look at the AI inference models at hand, then decide how to configure the engine to run them best.
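That co-design loop can be sketched in a few lines. Everything here is hypothetical: the `EngineConfig` fields and the crude cost model are stand-ins for what a real reconfigurable-IP flow would evaluate, not any vendor’s actual tooling.

```python
# A minimal sketch of a co-design loop: score candidate engine configurations
# against a model's compute and memory footprint, then keep the cheapest.
# EngineConfig fields and the cost model are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class EngineConfig:
    mac_units: int  # parallel multiply-accumulate units
    sram_kb: int    # on-chip buffer capacity

def cycles(model_macs: int, weights_kb: int, cfg: EngineConfig) -> float:
    """Crude cost model: compute cycles, doubled when weights spill off-chip."""
    compute = model_macs / cfg.mac_units
    spill_penalty = 2.0 if weights_kb > cfg.sram_kb else 1.0
    return compute * spill_penalty

candidates = [EngineConfig(256, 512), EngineConfig(512, 256), EngineConfig(512, 1024)]
best = min(candidates, key=lambda cfg: cycles(1_000_000, 800, cfg))
print(best)  # EngineConfig(mac_units=512, sram_kb=1024)
```

When the models change, the loop simply runs again with new profiles, which is exactly the flexibility a fixed SoC gives up.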
To recap: we don’t know what the best hardware solution looks like when the project starts. We need a platform to explore combinations of IP quickly and change the design accordingly, maybe many times during development. We must respond promptly to state-of-the-art AI inference methods emerging from new research. And if we’re heading toward a custom SoC, we need an inexpensive platform for software development before silicon is available.
Before thinking about designing a workload-optimized SoC, or instead of one entirely if volumes are low, we should be thinking about an FPGA-based solution. The fact that an application may depend on AI inference models still in flux reinforces that choice.
Against that backdrop comes the Achronix VectorPath Accelerator Card, jointly designed with BittWare and now in general availability. It carries the Achronix Speedster 7t1500 FPGA, with its unique multi-fracturable MAC array matched to high-performance LRAM and BRAM. Much of the attention on this design focuses on its blazing Ethernet connectivity for applications like high-frequency trading. But it is also an 86 TOPS engine with a two-dimensional NoC for optimizing IP interconnects, plus 4 Tbps of bandwidth to GDDR6 memory. Sensor data can arrive via those Ethernet ports, over MCIO lanes at PCIe Gen5 data rates, or on legacy interfaces over GPIO.
In short, it’s a powerful platform for AI inference, whether starting with third-party IP or designing it in-house. It drops into a host system easily with its PCIe form factor. More importantly, it allows designers to cope with projects starting when AI inference models are uncertain. We expect AI inference software and IP vendors to adopt Achronix into their ecosystems soon, and we’ll watch for future developments.
Visit the Achronix VectorPath Accelerator Card page for videos, datasheets, white papers, and more information on how this can help your AI inference project.