As von Neumann computing architectures become increasingly constrained by data-movement overheads, researchers have begun exploring in-memory computing (IMC) techniques to offset these costs.
We present a survey of 90+ papers on in-memory computing using SRAM. We review...
What is the difference between processing-in-memory, computing-in-memory, and logic-in-memory? Or are they the same? Can someone please give a definitive answer, considering various memories such as SRAM/memristor/spintronic.
Sometimes, research papers confuse the terms, so I am not able to come to...
Recent years have witnessed significant interest in “generative adversarial networks” (GANs) due to their ability to generate high-fidelity data. GANs have high compute and memory requirements. Also, since they involve both convolution and deconvolution operations, they do not map well...
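One reason deconvolution maps poorly onto conventional CNN accelerators is that a transposed (de)convolution can be expressed as a regular convolution over an input dilated with zeros, and those inserted zeros yield many ineffectual multiplications. A minimal 1-D NumPy sketch (hypothetical function names, not from the paper):

```python
import numpy as np

def conv1d(x, w):
    """Valid 1-D convolution (cross-correlation, as used in CNNs)."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def deconv1d(x, w, stride=2):
    """Transposed convolution: insert (stride-1) zeros between input
    elements, fully pad, then run an ordinary convolution. The inserted
    zeros cause ineffectual (zero-operand) multiplications on a
    conventional accelerator, which is why the dataflow differs."""
    z = np.zeros(stride * (len(x) - 1) + 1)
    z[::stride] = x                   # dilate the input with zeros
    z = np.pad(z, len(w) - 1)         # "full" padding for upsampling
    return conv1d(z, w)

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 1.0, 1.0])
print(deconv1d(x, w))  # [1. 1. 3. 2. 5. 3. 3.]
```

Over half of the multiply operands in the dilated input are zero here, illustrating the mapping inefficiency the abstract alludes to.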
As “deep neural networks” (DNNs) achieve increasing accuracy, they are being employed in increasingly diverse applications, including security-critical ones such as medicine and defense. The worldwide revenue produced from the deployment of AI is expected to reach $190.6 billion by...
3D convolutional neural networks (CNNs) have shown excellent predictive performance on tasks such as action recognition from videos, weather forecasting, detecting action similarity between two video clips, video captioning, labeling, and surveillance. They are also used for performing object...
Intermittent computing (ImC) refers to the scenario where periods of program execution are separated by reboots. This computing paradigm is common in some IoT devices. ImC systems are generally powered by energy-harvesting devices: they start executing a program when the accumulated energy...
RNNs have shown remarkable effectiveness in several tasks such as music generation, speech recognition and machine translation. RNN computations involve both intra-timestep and inter-timestep dependencies. Due to these features, hardware acceleration of RNNs is more challenging than that of...
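The intra- and inter-timestep dependencies mentioned above are visible in a vanilla RNN forward pass: the input projections for all timesteps are mutually independent, but each hidden state depends on the previous one, forcing serial execution over time. A minimal NumPy sketch (illustrative only, not from the survey):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b):
    """Vanilla RNN: h_t = tanh(x_t @ W_xh + h_{t-1} @ W_hh + b).
    The x_t @ W_xh products are independent across t (intra-timestep
    parallelism), but h_{t-1} @ W_hh creates a serial inter-timestep
    dependency that limits hardware parallelism."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x in xs:               # this loop cannot be parallelized over t
        h = np.tanh(x @ W_xh + h @ W_hh + b)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
T, d_in, d_h = 4, 3, 5
hs = rnn_forward(rng.normal(size=(T, d_in)),
                 rng.normal(size=(d_in, d_h)),
                 rng.normal(size=(d_h, d_h)),
                 np.zeros(d_h))
print(hs.shape)                # (4, 5)
```

An accelerator can batch the input projections ahead of time, but the recurrent term must still be computed one timestep at a time, which is why RNN acceleration is harder than CNN acceleration.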
Intel's Xeon Phi (having "many-integrated core" or MIC micro-architecture) combines the parallel processing power of a many-core accelerator with the programming ease of CPUs. In this paper, we survey 100+ works that study the architecture of Phi and use it as an accelerator for a broad range...
CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL) workloads in systems ranging from mobile to extreme-end servers.
We review 140+ papers focused on optimizing DL applications on CPUs. We include the methods proposed for both inference and training and...
As DNNs become common in mission-critical applications, ensuring their reliable operation has become crucial. Conventional resilience techniques fail to account for the unique characteristics of DNN algorithms/accelerators, and hence, they are infeasible or ineffective.
Our paper...
The rise of deep learning (DL) has been fuelled by improvements in accelerators. The GPU remains the most widely used accelerator for DL applications. We present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review 75+ techniques...
Sorry, Arthur. I have no idea about the business aspect. Technically, as an academician, I can say that the effectiveness of Automata execution depends a lot on the memory technology. If 3D Xpoint can provide larger fan-in/fan-out, then it will be helpful for modeling complex automata which have a...
Micron has stopped developing AP. http://naturalsemi.com and https://engineering.virginia.edu/center-automata-processing-cap are now leading the development of AP.
Problems from a wide variety of application domains can be modeled as "nondeterministic finite automata" (NFAs), and hence, efficient execution of NFAs can improve the performance of several key applications. Since traditional architectures such as CPUs and GPUs are not inherently suited for...
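A standard way to execute an NFA in software is subset simulation, tracking the set of currently active states; the irregular, data-dependent set expansion at each input symbol is what makes this workload a poor fit for CPUs and GPUs. A toy sketch (hypothetical example automaton, not from the paper):

```python
def nfa_accepts(transitions, start, accepting, s):
    """Simulate an NFA by tracking the set of active states.
    transitions: dict mapping (state, symbol) -> set of next states."""
    states = {start}
    for ch in s:
        # Each symbol fans out to an unpredictable set of successor
        # states -- the irregular memory access pattern that hurts
        # CPU/GPU efficiency.
        states = set().union(*(transitions.get((q, ch), set())
                               for q in states))
        if not states:
            return False        # all paths died; reject early
    return bool(states & accepting)

# Toy NFA over {a, b} accepting strings that end in "ab".
T = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
}
print(nfa_accepts(T, 0, {2}, "abab"))  # True
print(nfa_accepts(T, 0, {2}, "abba"))  # False
```

Spatial architectures such as automata processors evaluate all active states in parallel per symbol, avoiding this serial set-expansion loop.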
Intel’s Xeon Phi combines the parallel processing power of a many-core accelerator with the programming ease of CPUs. We survey ~100 works that study the architecture of Phi and use it as an accelerator for a broad range of applications. We discuss the strengths and limitations of Phi. We...
Design of hardware accelerators for neural network (NN) applications involves walking a tightrope amidst the constraints of low power, high accuracy, and high throughput. NVIDIA's Jetson is a promising platform for embedded machine learning which seeks to achieve a balance between the above...
Mobile web browsing (MWB) can well be termed the confluence of two major revolutions: the mobile (smartphone) revolution and the internet revolution. Mobile web traffic has now surpassed desktop web traffic and has become the primary means for service providers to reach out to the billions of...
Spintronic memories such as STT-RAM (spin transfer torque RAM), SOT-RAM (spin orbit torque RAM) and DWM (domain wall memory) facilitate efficient implementation of PIM (processing-in-memory) approach and NN (neural network) accelerators and offer several advantages over conventional memories...
Data movement consumes two orders of magnitude more energy than a floating-point operation, and hence, data movement is becoming the primary bottleneck in scaling the performance of modern processors within a fixed power budget. Accelerators for deep neural networks have huge memory...
CNNs (convolutional neural networks) have recently been applied successfully to a wide range of cognitive challenges. Given the high computational demands of CNNs, custom hardware accelerators are vital for boosting their performance. The high energy efficiency, computing capabilities, and...