
Survey paper on Deep Learning on GPUs


The rise of deep learning (DL) has been fuelled by improvements in accelerators, and the GPU remains the most widely used accelerator for DL applications. We present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review more than 75 techniques, covering both inference and training, on both single-GPU and distributed multi-GPU systems. The survey covers techniques for pruning, tiling, and batching; the impact of data layouts; data-reuse schemes; and convolution strategies (FFT, direct, GEMM, and Winograd). It also covers techniques for offloading data to CPU memory to avoid GPU-memory bottlenecks during training.
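To illustrate one of the convolution strategies the survey covers, here is a minimal sketch of GEMM-based convolution: the input is unrolled into a patch matrix (the classic im2col transformation), after which the convolution reduces to a single matrix product. This is a simplified single-channel, single-filter NumPy version for illustration only; the function names are my own, not from the paper.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll all kh x kw patches of a 2-D input into rows of a matrix."""
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_gemm(x, k):
    """2-D valid convolution (cross-correlation) via im2col + matrix product."""
    kh, kw = k.shape
    cols = im2col(x, kh, kw)
    out = cols @ k.ravel()  # the GEMM step (a GEMV here, with one filter)
    oh = x.shape[0] - kh + 1
    return out.reshape(oh, -1)
```

Libraries such as cuDNN use the same idea but batch many filters and channels into one large GEMM, trading extra memory for the high arithmetic efficiency of matrix-multiply kernels on GPUs.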

The paper, accepted in the Journal of Systems Architecture (2019), is available here.