While many tough computing problems have been solved over the years, vision processing remains challenging in many ways. Cheng Wang, Co-Founder and CTO of FlexLogix Technologies, gave a talk on edge vision processing at Linley’s Spring 2022 conference. During that talk he referenced how Gerald Sussman took the early steps in computer vision processing back in 1966. Sussman, then a first-year undergraduate student working under the guidance of MIT AI Lab co-founder Marvin Minsky, tried to link a camera to a computer. Much progress has been made since then. Of course, the requirements and the markets for computer vision haven’t stayed static during this time.
The early era of computer vision processing focused on industrial-grade computing equipment, where large form factors and high solution costs were tolerated. Over the most recent decade, neural network models and GPUs have played critical roles in advancing vision processing capabilities. But delivering solutions in smaller form factors and at low cost is still a challenge. In his talk, Cheng discussed the reasons behind these challenges and FlexLogix’s solution for edge vision processing, which is based on dynamically reconfigurable TPU technology. The following are some excerpts from his presentation.
Performance, Efficiency and Flexibility
Edge computer vision requires an extreme amount of processing, at TeraOPS rates. Vision solutions also need to demonstrate high accuracy at low latency, operate at low power and be available at low cost points. While GPUs can deliver the performance, they are large, expensive and power hungry, and thus not a good match for edge compute devices. GPUs also count on a huge amount of memory bandwidth via DDR-type interfaces. On top of these challenges, neural network models are evolving fast. Not only are new models emerging at a rapid rate, but existing models also undergo frequent incremental changes. Refer to the Figure below to see how frequently the popular model YOLOv5 is going through changes.
The processing of neural network models is very different from general-purpose processing when it comes to compute workload and memory access patterns. Each layer may require a different computational load relative to the memory bandwidth that layer requires, and this ratio changes dynamically as different layers are processed. So an optimal approach to solving these challenges counts on memory efficiency and future-proofing for changing models. Graph streaming helps reduce DRAM requirements, but matching bandwidth to a varying load is difficult.
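To make the point about per-layer variation concrete, the following is a minimal Python sketch that estimates the ratio of compute to memory traffic (arithmetic intensity, in MACs per byte) for a few convolution layers. The layer shapes are hypothetical and are not taken from YOLOv5 or from the talk. The sketch also assumes INT8 values (one byte each), stride 1 with "same" padding, and ideal reuse, so every input, weight and output is moved exactly once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical convolution layer; stride 1 and 'same' padding assumed."""
    name: str
    height: int   # output (and input) feature-map height
    width: int    # output (and input) feature-map width
    c_in: int     # input channels
    c_out: int    # output channels
    kernel: int   # square kernel size

    def macs(self) -> int:
        # One multiply-accumulate per output element, input channel and kernel tap.
        return self.height * self.width * self.c_out * self.c_in * self.kernel ** 2

    def bytes_moved(self, bytes_per_value: int = 1) -> int:
        # Idealized traffic: inputs, weights and outputs each touched once.
        inputs = self.height * self.width * self.c_in
        weights = self.kernel ** 2 * self.c_in * self.c_out
        outputs = self.height * self.width * self.c_out
        return (inputs + weights + outputs) * bytes_per_value

    def arithmetic_intensity(self, bytes_per_value: int = 1) -> float:
        # MACs per byte: higher means compute-heavy, lower means bandwidth-heavy.
        return self.macs() / self.bytes_moved(bytes_per_value)


# Illustrative shapes only (assumptions, not measurements from a real model).
LAYERS = [
    ConvLayer("early_3x3", height=320, width=320, c_in=3, c_out=32, kernel=3),
    ConvLayer("mid_1x1", height=40, width=40, c_in=256, c_out=256, kernel=1),
    ConvLayer("late_3x3", height=20, width=20, c_in=512, c_out=512, kernel=3),
]

if __name__ == "__main__":
    for layer in LAYERS:
        print(f"{layer.name:10s} MACs={layer.macs():>13,d} "
              f"bytes={layer.bytes_moved():>10,d} "
              f"MACs/byte={layer.arithmetic_intensity():7.1f}")
```

Under these assumptions the early layer performs far fewer MACs per byte than the late layer, which illustrates why a fixed ratio of compute to memory bandwidth is unlikely to suit every layer of a model.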
FlexLogix’s Dynamic TPU
FlexLogix’s Dynamic TPU offers a flexible, load-balanced, memory-efficient solution for edge vision processing applications.
The Dynamic TPU is implemented using Tensor Processor Arrays (ALUs) and EFLX logic. The architecture enables very efficient layer processing across multiple Tensor Processor Arrays, which communicate via FlexLogix’s XFLX InterConnect and access L2 SRAM for memory efficiency. Because the TPU uses EFLX cores, the control and data paths are future-proofed against changes in activation functions and operators. Streaming data at a sub-graph level makes more efficient bandwidth matching possible. Refer to the Figure below.
While a GPU-based edge vision processing solution may consume power in the 75W-300W range, a Dynamic TPU-based solution will consume power in the 6W-10W range. Whereas a GPU-based solution predominantly relies on GDDR, a Dynamic TPU-based solution relies on local connections, XFLX connections, flexible L2 memories and LPDDR.
The FlexLogix solution includes the InferX SDK, which directly converts a TensorFlow graph model to a dynamic InferX hardware instance. A Dynamic TPU-based solution will yield much higher efficiency on the Inference/Watt and Inference/$ metrics compared to a GPU- or CPU-based solution. All in all, the Dynamic TPU offers superior performance along with software flexibility and future-proofing that ASIC solutions lack.
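As a rough illustration of the Inference/Watt metric, the Python sketch below uses only the power ranges quoted in the talk (75W-300W for a GPU-based solution, 6W-10W for a Dynamic TPU-based solution) to compute the break-even point: the fraction of a GPU's throughput that a lower-power accelerator must deliver to match the GPU on inferences per watt. The function names are hypothetical, and no throughput figures are assumed because the talk excerpts do not provide any.

```python
def inferences_per_watt(throughput_ips: float, power_w: float) -> float:
    """Inferences per second divided by power draw in watts."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return throughput_ips / power_w


def break_even_throughput_fraction(accel_power_w: float, gpu_power_w: float) -> float:
    """Fraction of GPU throughput the accelerator needs to tie on inferences/W.

    Setting accel_ips / accel_power_w == gpu_ips / gpu_power_w and solving
    for accel_ips / gpu_ips gives accel_power_w / gpu_power_w.
    """
    if accel_power_w <= 0 or gpu_power_w <= 0:
        raise ValueError("power values must be positive")
    return accel_power_w / gpu_power_w


if __name__ == "__main__":
    # Power ranges as quoted in the talk; the pairings are illustrative.
    least_favorable = break_even_throughput_fraction(accel_power_w=10, gpu_power_w=75)
    most_favorable = break_even_throughput_fraction(accel_power_w=6, gpu_power_w=300)
    print(f"10 W vs 75 W GPU: break-even at {least_favorable:.1%} of GPU throughput")
    print(f"6 W vs 300 W GPU: break-even at {most_favorable:.1%} of GPU throughput")
```

With these power figures, a 10W solution needs only about 13% of a 75W GPU's throughput to tie on inferences per watt, and a 6W solution needs 2% of a 300W GPU's throughput. Any throughput above the break-even point translates into an Inference/Watt advantage.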
On-Demand Access to Cheng’s talk and presentation