January 1, 2018 by Patrick Moorhead

IBM Plays With The AI Giants With New, Scalable And Distributed Deep Learning Software

by Patrick Moorhead on 01-01-2018 at 11:00 am
Categories: AI

I’ve been following IBM’s AI efforts with interest for quite a while now. In my opinion, the company jump-started the current cycle of AI with the introduction of Watson back in the 2000s and has steadily been ramping up its efforts since then. Most recently, I wrote about the launch of PowerAI, IBM’s software toolkit for OpenPOWER systems, aimed at enterprises that don’t want to develop their AI solutions entirely from scratch but still want to customize them to fit their specific deep learning needs. Today, IBM Research announced a new breakthrough that will only serve to further enhance PowerAI and its other AI offerings: groundbreaking Distributed Deep Learning (DDL) software, which is one of the biggest announcements I’ve tracked in this space in the past six months.

Getting rid of the single-node bottleneck

Anyone who has been paying attention knows that deep learning has really taken off in the last several years. It’s powering hundreds of applications, in consumer as well as business realms, and continues to grow. One of the biggest problems holding back the further proliferation of deep learning, however, is the issue of scalability. Most AI servers today are single systems, not multiple systems combined. The most popular open-source deep learning software frameworks simply don’t perform well across multiple servers, creating a time-consuming bottleneck. In other words, while many data scientists have access to servers with four to eight GPUs, they can’t scale beyond that single node. At the end of the day, the software just wasn’t designed for it.

Enter the IBM DDL library: a library built with IBM Research’s unique clustering methods that links into leading open-source AI frameworks (such as TensorFlow, Caffe, Torch, and Chainer). With DDL, these frameworks can be scaled to tens of IBM servers, taking advantage of hundreds of GPUs, a night and day difference from the old model of doing things. To paint a picture, when IBM initially tried to train a model with the ImageNet-22K data set, using a ResNet-101 model, it took 16 days on a single Power “Minsky” server, using four NVIDIA P100 GPU accelerators. A 16-day training run means a significant delay in time to insight, and can seriously hinder productivity.
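IBM has not detailed DDL’s internals here, so the following is not IBM’s method. It is a minimal sketch of the generic idea behind scaling training across many GPUs: synchronous data parallelism, where each worker computes a gradient on its own shard of the batch and the results are averaged, as an all-reduce step would do. The toy model, the function names, and the data are all assumptions made up for illustration.

```python
import math


def mse_gradient(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_gradient(w, xs, ys, num_workers):
    """Split the batch into equal shards, compute one gradient per
    simulated worker, then average them (what an all-reduce would do)."""
    if len(xs) % num_workers != 0:
        raise ValueError("batch size must divide evenly across workers")
    shard = len(xs) // num_workers
    grads = [
        mse_gradient(
            w,
            xs[i * shard:(i + 1) * shard],
            ys[i * shard:(i + 1) * shard],
        )
        for i in range(num_workers)
    ]
    return sum(grads) / num_workers


# Hypothetical toy data: the true relationship is y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]
w = 0.5

single = mse_gradient(w, xs, ys)
parallel = data_parallel_gradient(w, xs, ys, num_workers=4)
print(single, parallel)
```

With equal-sized shards, the averaged gradient matches the single-node gradient, so the math of training is unchanged. The hard part in practice is the communication needed to do that averaging quickly across servers, which is the bottleneck a library like DDL is meant to address.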

IBM is calling DDL “the jet engine of deep learning”—a catchy moniker that honestly isn’t too far off the mark in my opinion. Using DDL techniques, IBM says it was able to cut down that same process to a mere 7 hours, on 64 Power “Minsky” servers, with a total of 256 NVIDIA P100 GPU accelerators. Let me reiterate that: 16 days, down to 7 hours. If these results are accurate, which I think they are, it’s clear why IBM thinks it has a real game-changer on its hands. IBM’s new image recognition record of 33.8% accuracy in 7 hours handily surpasses the previous industry record set by Microsoft—29.9% accuracy in 10 days. To top it all off, IBM says DDL scales efficiently—across up to 256 GPUs, with up to 95% efficiency on the Caffe deep learning framework.
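For a rough sense of what those numbers imply, here is a back-of-the-envelope calculation using only the figures quoted above (16 days on one server, 7 hours on 64 servers). The function name is my own, and the result is an estimate from rounded public numbers, not an IBM-reported metric. It also need not match the 95% figure, which IBM quotes for the Caffe framework specifically.

```python
def scaling_efficiency(baseline_hours, scaled_hours, baseline_units, scaled_units):
    """Return (speedup, efficiency) for a scaled-out training run.

    speedup    = baseline time / scaled time
    efficiency = speedup / (scaled units / baseline units)
    """
    speedup = baseline_hours / scaled_hours
    ideal_speedup = scaled_units / baseline_units
    return speedup, speedup / ideal_speedup


# Figures from the article: 16 days on 1 server vs. 7 hours on 64 servers.
speedup, efficiency = scaling_efficiency(
    baseline_hours=16 * 24,
    scaled_hours=7,
    baseline_units=1,
    scaled_units=64,
)
print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.0%}")
# prints "speedup: 54.9x, efficiency: 86%"
```

In other words, 64 times the hardware delivered roughly a 55-fold reduction in training time, which is a strong result for a workload spread across that many machines.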

Now available in beta

Developers won’t have to wait to try out this new technology. IBM Research is delivering a beta version of DDL to IBM Systems, which is available now in the newly announced 4th revision of IBM’s PowerAI (for TensorFlow and Caffe, with Torch and Chainer to follow soon). I think this will be a great addition to IBM’s Power systems, which I’ve called the “Swiss Army knives of acceleration”: standard PCI Express, CAPI, and NVLink, all wrapped up in one platform.

Another notable thing about DDL is that it will be available not only on-prem but also through the cloud, via a cloud provider called Nimbix. In today’s hybrid environment, this flexibility is obviously a plus. Developers can try out the beta version now on Nimbix, or on an IBM Power Systems server.

Wrapping up

One of the most interesting things for me is that this new technology is coming from IBM, not one of the flashier, louder AI proponents like Google or Facebook. If IBM can continue to bring “firsts” to the table, it is shaping up to be a major player not just in the enterprise but in deep learning overall. DDL and OpenPOWER are the secret sauce that I think will give IBM the edge it needs by significantly cutting down training times and improving accuracy and efficiency. I’ll continue to watch with interest, but I think by getting rid of this bottleneck, DDL has the potential to really open the deep learning floodgates. It could be a real game-changer for IBM, PowerAI, and OpenPOWER.