Colin Alexander (Director of Product Marketing at Achronix) recently released a webinar on this topic. At only 20 minutes, the webinar is an easy watch and a useful update on data traffic and implementation options. Downloads are still dominated by video (over 50% for Facebook), which now depends heavily on caching at or close to the edge. Which of these applies depends on your definition of “edge”. The IoT world sees itself as the edge; the cloud and infrastructure world apparently sees the last compute node in the infrastructure, before those leaf devices, as the edge. Potato, potahto. In any event, the infrastructure view of the edge is where you will find video caching, to serve the most popular downloads as efficiently and as quickly as possible.
Compute options at the edge (and in the cloud)
Colin talks initially about the infrastructure edge, where some horsepower is required in compute and in AI. He presents the standard options: CPU, GPU, ASIC or FPGA. A CPU-based solution has the greatest flexibility because your solution will be entirely software-based. For the same reason, it will also generally be the slowest, most power-hungry and highest-latency option (for the round trip to leaf nodes, I assume). GPUs are somewhat better on performance and power, with a bit less flexibility than CPUs. An ASIC (custom hardware) will be fastest, lowest power and lowest latency, though in concept least flexible (all the smarts are in hardware, which can’t be changed).
He presents FPGA (or embedded FPGA/eFPGA) as a good compromise between these extremes: better on performance, power and latency than CPU or GPU, somewhere between a CPU and a GPU on flexibility, and much better than an ASIC on flexibility because an FPGA can be reprogrammed. Which all makes sense to me as far as it goes, though I think the story should have been completed by adding DSPs to the platform lineup. These can have AI-specific hardware advantages (vectorization, MAC arrays, etc.) which benefit performance, power and latency while retaining software flexibility. The other important consideration is cost. This is always a sensitive topic of course, but AI-capable CPUs, GPUs and FPGA devices can be pricey, a concern for the bill of materials of an edge node.
Colin’s argument makes most sense to me at the edge for eFPGA embedded in a larger SoC. In a cloud application, constraints are different. A smart network interface card is probably not as price sensitive and there may be a performance advantage in an FPGA-based solution versus a software-based solution.
Supporting AI applications at the compute edge through an eFPGA looks like an option worth investigating further. Further out towards the leaf nodes, the picture is fuzzier for me. A logistics tracker or a soil moisture sensor certainly won’t host significant compute, but what about a voice-activated TV remote? Or a smart microwave? Both need AI but neither needs a lot of horsepower. The microwave has wired power, but a TV remote or remote smart speaker runs on batteries. It would be interesting to know the eFPGA tradeoffs here.
eFPGA capabilities for AI
Per the datasheet, Speedster 7t offers fully fracturable integer MACs, flexible floating point, native support for bfloat and efficient matrix multiplications. I couldn’t find any data on TOPS or TOPS/Watt. I’m sure that depends on implementation but examples would be useful. Even at the edge, some applications are very performance sensitive – smart surveillance and forward-facing object detection in cars for example. It would be interesting to know where eFPGA might fit in such applications.
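For readers less familiar with the bfloat format mentioned above: bfloat16 keeps the sign bit and the full 8-bit exponent of a 32-bit float but only the top 7 mantissa bits, so it covers the same range with much less precision. The sketch below illustrates this in plain Python. It is not taken from the webinar or the datasheet and says nothing about how the Speedster 7t implements the format; the function names are made up for illustration, and it uses simple truncation where real hardware would typically round.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumption: plain truncation of the low 16 bits of the float32
    encoding; hardware usually applies round-to-nearest-even instead.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits) and top 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float (hypothetical helper)."""
    # Pad the dropped mantissa bits with zeros to rebuild a float32 encoding.
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def through_bfloat16(x: float) -> float:
    """Show what value survives a trip through bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))


if __name__ == "__main__":
    for value in (1.0, -2.0, 3.14159):
        print(value, "->", through_bfloat16(value))
```

Values such as 1.0 and -2.0 survive unchanged, while 3.14159 loses its lower digits. That loss of precision is generally tolerable in neural network workloads, which is why the format shows up in AI-oriented hardware.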
Thought-provoking webinar. You can watch it HERE.