
Survey paper on Deep Learning on GPUs


The rise of deep learning (DL) has been fuelled by improvements in accelerators, and the GPU remains the most widely used accelerator for DL applications. We present a survey of architecture- and system-level techniques for optimizing DL applications on GPUs. We review more than 75 techniques, covering both inference and training, on both single-GPU systems and distributed systems with multiple GPUs. The survey covers techniques for pruning, tiling and batching, the impact of data layouts, data-reuse schemes, and convolution strategies (FFT, direct, GEMM and Winograd), among others. It also covers techniques for offloading data to CPU memory to avoid GPU-memory bottlenecks during training.
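To make the difference between the "direct" and "GEMM" convolution strategies concrete: the GEMM approach lowers a convolution to a single matrix multiply by unrolling input patches into columns (im2col). Below is a minimal pure-Python sketch, assuming one input image, stride 1 and no padding. The function names are illustrative only and are not taken from the paper or from any library. On a GPU the matrix multiply would be handed to a tuned GEMM kernel; here it is a plain loop so that the result can be checked against direct convolution.

```python
# Minimal sketch (assumptions: single image, stride 1, no padding,
# cross-correlation as used by DL frameworks). Names are hypothetical.

def conv2d_direct(x, w):
    """Direct convolution. x: [C][H][W], w: [K][C][R][S] -> [K][P][Q]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][p + r][q + s] * w[k][c][r][s]
                out[k][p][q] = acc
    return out

def im2col(x, R, S):
    """Unroll every R x S input patch into a (C*R*S) x (P*Q) matrix."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    P, Q = H - R + 1, W - S + 1
    cols = []
    for c in range(C):
        for r in range(R):
            for s in range(S):
                cols.append([x[c][p + r][q + s] for p in range(P) for q in range(Q)])
    return cols

def matmul(a, b):
    """Plain GEMM: (M x N) times (N x L)."""
    return [[sum(a[i][n] * b[n][j] for n in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conv2d_gemm(x, w):
    """GEMM-based convolution via im2col."""
    K, C, R, S = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    H, W = len(x[0]), len(x[0][0])
    P, Q = H - R + 1, W - S + 1
    # Flatten each filter into one row, using the same (c, r, s) order as im2col.
    w_mat = [[w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
             for k in range(K)]
    y = matmul(w_mat, im2col(x, R, S))  # K x (P*Q)
    # Reshape each output row back into a P x Q feature map.
    return [[y[k][p * Q:(p + 1) * Q] for p in range(P)] for k in range(K)]

if __name__ == "__main__":
    x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]  # 1 x 3 x 3
    w = [[[[1.0, 1.0], [1.0, 1.0]]]]                            # 1 x 1 x 2 x 2
    print(conv2d_direct(x, w))  # [[[12.0, 16.0], [24.0, 28.0]]]
    print(conv2d_gemm(x, w))    # [[[12.0, 16.0], [24.0, 28.0]]]
```

The trade-off this illustrates is that im2col duplicates each input element up to R*S times, costing extra memory, in exchange for turning the convolution into one large, highly optimized matrix multiply.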

The paper, accepted in the Journal of Systems Architecture (2019), is available here.