Despite their recent rise to prominence, the fundamentals of AI, specifically neural networks and deep learning, were established as far back as the late 1950s and early 1960s. The first neural network, the Perceptron, had a single layer and was good at certain types of recognition. However, the Perceptron was unable to learn XOR operations, because a single layer can only separate classes that are divisible by a straight line. What eventually followed were multi-layer neural networks that performed much better at recognition tasks but required more effort to train. Until the early 2000s, the field was held back by limitations that can be traced to insufficient computing resources and training data.
All this changed as chip speeds increased and the internet provided a rich set of images for use in training. ImageNet was one of the first really significant sources of labeled images, the type needed for higher-quality training. Nevertheless, the theoretical underpinnings were established decades ago. Multilayer networks proved much more effective at recognition tasks, and with them came additional processing requirements. So today we have so-called deep learning, which boasts many layers of processing.
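The Perceptron's XOR limitation mentioned above is easy to reproduce. The following is a minimal sketch, not historical code: the function name `train_perceptron`, the learning rate and the epoch limit are all assumptions chosen for illustration. It applies the classic perceptron learning rule to a two-input, single-layer unit and reports whether training ever reaches an error-free pass over the data.

```python
def train_perceptron(samples, epochs=100, lr=1):
    """Train a single-layer, two-input perceptron (illustrative sketch).

    samples: list of ((x1, x2), target) pairs with 0/1 targets.
    Returns (weights, bias, converged), where converged is True only if
    one full pass over the samples produced no misclassification.
    """
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            # Step activation: fire only if the weighted sum is positive.
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - out
            if delta != 0:
                # Perceptron rule: nudge the weights toward the target.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND_DATA)
_, _, xor_converged = train_perceptron(XOR_DATA)

print(and_converged)  # True: AND is linearly separable
print(xor_converged)  # False: no single line separates the XOR classes
```

Raising the epoch limit does not help in the XOR case, since no setting of two weights and a bias classifies all four points correctly. Adding a hidden layer removes the restriction, which is why multi-layer networks followed.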
While neural networks provide a general-purpose method of solving problems that does not require formal coding, many architectural choices must still be made to produce an optimal network for a given class of problems. Neural networks have relied on general-purpose CPUs, GPUs or custom ASICs. CPUs have the advantage of flexibility, but this comes at the cost of lower throughput. Loading and storing of operands and results creates significant overhead. Likewise, GPUs are often optimized to use local memory and perform floating-point operations, which together do not always best serve deep learning requirements.
The ideal neural network implementation is a systolic one, where data is moved directly from processing element to processing element rather than through shared memory. Also, deep learning has become very efficient with low-precision integer operations. So it seems that ASICs might be the better vehicle. However, as the architectures of neural networks themselves evolve, an ASIC might prematurely lock in an architecture and prevent optimization based on real-world experience.
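To make the systolic idea concrete, here is a toy software model of an output-stationary systolic array multiplying two small integer matrices. It is a sketch under stated assumptions, not a description of any Achronix or ASIC design: the function names, the skewed input schedule and the int8 value range are all illustrative choices. Each processing element (PE) holds one accumulator, receives an operand from its left neighbor and one from the neighbor above, and performs a single integer multiply-accumulate per cycle.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle model of an M x N output-stationary systolic array.

    A is M x K and B is K x N, both lists of lists of small integers.
    Rows of A enter from the left and columns of B enter from the top,
    each delayed by its row or column index so matching operands meet.
    """
    M, K, N = len(A), len(B), len(B[0])
    a_reg = [[0] * N for _ in range(M)]  # operand moving left to right
    b_reg = [[0] * N for _ in range(M)]  # operand moving top to bottom
    acc = [[0] * N for _ in range(M)]    # one wide accumulator per PE

    for t in range(M + N + K - 2):
        # Shift A operands one PE to the right, then feed the left edge.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i  # row i is skewed by i cycles
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        # Shift B operands one PE down, then feed the top edge.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j  # column j is skewed by j cycles
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Every PE does one integer multiply-accumulate per cycle.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


def reference_matmul(A, B):
    """Plain triple-loop matrix multiply, used only to check the model."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1, 2], [3, 4]]  # values chosen to fit comfortably in int8
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

Note that no operand is ever written back to a shared memory between steps: each value is read once at the array edge and then handed from neighbor to neighbor, which is the property that avoids the load and store overhead described for CPUs.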
It turns out that FPGAs are a nice fit for this problem. A recent white paper by Achronix points out the advantages that FPGAs bring to deep learning. The white paper, entitled “The Ideal Solution for AI Applications — Speedcore eFPGAs”, goes further to suggest that embedded FPGA is even more aptly suited to this class of problems. The paper starts out with an easily readable introduction to the history and underpinnings of deep learning, then moves on to the specifics of how processing power has created the revolution we are now witnessing.
Yet conventional FPGA devices introduce their own problems. In many cases they are not optimally configured for specific applications. Designers must accept the resource allocation available in commercially available parts. There is also the perennial problem of off-chip communication. Conventional FPGAs require moving the data through IOs onto board traces and then back onto the other chip. The round trip can be prohibitively expensive from a power and performance perspective.
Achronix now offers embeddable FPGA fabric, which they call eFPGA. Because it is completely configurable, only the necessary LUTs, memories, DSPs, interfaces, etc. need to be included. And, of course, communication with other elements of the system is through direct bus interconnection or an on-chip NoC. This reduces the silicon that is needed for IOs on both ends.
The techniques and architectures used for neural networks are rapidly evolving. Experimentation and evolution call for design approaches that provide maximum flexibility. Having the ability to modify the architecture can be crucial. Embedded FPGAs definitely have a role to play in this rapidly growing and evolving segment. The Achronix white paper is available on their website for engineers who want to look deeper into this approach.
Read more about Achronix on SemiWiki.com