Gustafson on Parallel Algorithms
by Paul McLellan on 11-05-2012 at 4:54 pm

At the keynote for ICCAD this morning, John Gustafson of AMD (where he is Chief Graphics Product Architect as well as a Fellow) talked about parallel algorithms. Like Gene Amdahl, whose law states that parallel speedup is limited by the part of the program that cannot be parallelized (if 10% is serial, then even if the rest runs in zero time, the maximum speedup is 10X), Gustafson has a law named after him. It basically says Amdahl is wrong: there is no limit to the speedup you can get, as long as you increase the size of the problem along with the number of cores. So his talk was a look at whether there are embarrassingly serial problems, problems that are not open to being parallelized.
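To make the contrast concrete, here is a minimal sketch (my illustration, not from the talk) of the two laws, using the 10% serial fraction from the example above: Amdahl's speedup saturates near 10X, while Gustafson's scaled speedup keeps growing with the core count.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Fixed problem size: speedup is capped at 1 / serial_fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

def gustafson_speedup(serial_fraction: float, cores: int) -> float:
    """Problem size scales with cores, so speedup keeps growing."""
    return serial_fraction + (1.0 - serial_fraction) * cores

for cores in (10, 100, 1000):
    print(f"{cores:5d} cores: Amdahl {amdahl_speedup(0.1, cores):7.2f}X, "
          f"Gustafson {gustafson_speedup(0.1, cores):8.1f}X")
```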

For example, at first glance, calculating the Fibonacci series looks like one. Each term depends on the previous two, so how can you bring a million servers to bear on the problem? But, as anyone who has done any advanced math knows, there is a closed-form formula (curiously involving the golden ratio), so it is straightforward to calculate as many terms as desired in parallel.
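The closed form is Binet's formula, and since every term is computed independently, a pool of workers can churn them out in parallel. A minimal sketch (my illustration; accurate in double precision only up to roughly the 70th term):

```python
import math
from concurrent.futures import ProcessPoolExecutor

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

def fib(n: int) -> int:
    # Binet's formula: the neglected psi**n / sqrt(5) term is below 0.5
    # for all n >= 0, so rounding phi**n / sqrt(5) gives the exact integer.
    return round(PHI ** n / math.sqrt(5))

if __name__ == "__main__":
    # No term depends on any other, so the work is embarrassingly parallel.
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(fib, range(1, 21))))
```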


By 2018 we should have million-server systems, each delivering teraflops through highly parallel operations running on GPUs. The big challenge is the memory wall. For operations that involve a high ratio of work to decision making, this sort of SIMD (single instruction, multiple data) execution can significantly reduce wattage per teraflop.
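As a rough illustration of "work versus decision making" (my example, not John's), the first operation below applies the same arithmetic uniformly to a million elements and maps cleanly onto SIMD lanes; the second embeds a per-element decision, which SIMD hardware typically handles by computing both sides and selecting, so the decision costs real work:

```python
import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)

# High work-to-decision ratio: one instruction stream over all elements.
y = np.sqrt(x) * 2.0 + 1.0

# A data-dependent choice per element: both branches are evaluated
# and the result is selected element by element.
z = np.where(x > 0.5, np.sqrt(x), x * x)
```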


Throwaway line of the day: with great power comes great responsibility…and some really big heatsinks!

An instruction issue consumes around 30 times more power than a basic multiply-add operation, and a memory access consumes much more power than that. Memory transfers will soon account for half the power consumed, and processors are already power-constrained. Part of the problem is that hardware caches are very wasteful, designed to make programming easy rather than to keep power down. They minimize miss rates at the cost of low utilization (around 20%). Even more surprisingly, only 20% of the data written back out of the cache is ever accessed again, so the rest didn't really need to be written back at all. John felt that, at least for low-level programmers, we need a programming environment that makes memory placement visible and explicit (as it apparently was on the Cray-2).
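Explicit placement in the Cray-2 sense means software-managed local memory rather than a transparent cache. As a loose analogy (my sketch, not from the talk), cache blocking lets the programmer state the working set directly instead of hoping the cache guesses it:

```python
import numpy as np

def tiled_matmul(A, B, tile=64):
    """Blocked matrix multiply: each pair of tiles is reused for a full
    tile-product before moving on, so the working set is explicit and
    small rather than left for the hardware cache to discover."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(tiled_matmul(A, B), A @ B)
```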


There are two ways to associate a SIMD GPU with a processor: on-chip, or as a separate off-chip device. On-chip seems to work best for problems where data re-use is 10-100X (such as FFT and sparse-matrix operations), while an off-chip device works best for data re-use in the 1000s, such as dense matrix operations and many-body dynamics.
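The flop and word counts below are the usual back-of-envelope estimates (my sketch, not figures from the talk), but they show why the two workloads fall on opposite sides of the split: FFT re-use grows only logarithmically with problem size, while dense matrix multiply re-use grows linearly.

```python
import math

def dense_matmul_reuse(n: int) -> float:
    # ~2n^3 flops over ~3n^2 words moved: each word is reused O(n) times.
    return 2 * n**3 / (3 * n**2)

def fft_reuse(n: int) -> float:
    # ~5n*log2(n) flops over ~2n words moved: reuse grows only as O(log n).
    return 5 * n * math.log2(n) / (2 * n)

print(f"FFT, n = 2**20:         {fft_reuse(2**20):6.0f}x reuse")
print(f"dense matmul, n = 4096: {dense_matmul_reuse(4096):6.0f}x reuse")
```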

We also need better arithmetic. Most programmers have never studied numerical analysis and so have no idea how many bits of precision they have or how to calculate it. A specific problem is that accumulating results (by adding) needs much more precision than is used to calculate the numbers being added: eventually you are adding small numbers to a total so large that it doesn't change. John had a few examples where he was using 8-bit floating point (yes, really: 1 sign bit, 3 bits of exponent and 4 bits of mantissa) yet still doing accurate analysis.
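The accumulation problem is easy to demonstrate, and compensated (Kahan) summation is one standard remedy; both are my illustrations rather than examples from the talk:

```python
import numpy as np

# Once a float32 accumulator reaches 2**24, adding 1.0 no longer changes
# it: the 24-bit significand cannot hold the small addend at that magnitude.
big = np.float32(2**24)
print(big + np.float32(1.0) == big)   # True: the addition is silently lost

def kahan_sum(values, dtype=np.float32):
    """Compensated summation: carry the rounded-off low-order bits in a
    separate correction term, so a narrow accumulator acts like a wide one."""
    total = dtype(0.0)
    comp = dtype(0.0)
    for v in values:
        y = dtype(v) - comp
        t = total + y
        comp = (t - total) - y        # recovers what the add just lost
        total = t
    return total

xs = [0.1] * 100_000                  # exact sum is 10000.0
naive = np.float32(0.0)
for v in xs:
    naive += np.float32(v)
print(naive, kahan_sum(xs))           # the naive sum drifts; Kahan stays close
```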


John's final conclusion: if we really cherish every bit moved to and from main RAM, then we can get better arithmetic answers (with provable bounds), help with the memory wall dilemma as a side effect, and always have a use for massive parallelism.
