Instance

Array
(
    [title] => Recent Forum Threads
    [title_url] => 
    [ignore_sticky] => 0
    [exclude_current] => 0
    [limit] => 10
    [sluglist] => ["jobs-dashboard"]
    [rw_opt] => Array
        (
            [widget_select] => 1
            [pageid_281769] => 1
            [pageid_281772] => 1
        )

    [display_widget_mobile] => 
    [rw_opt_exclude] => Array
        (
            [pageid_274493] => 1
            [cpt_podcast] => 1
            [cpta_podcast] => 1
            [category_16613] => 1
            [category_16631] => 1
            [taxonomy_series] => 1
            [pageid_354254] => 1
        )

    [node_id] => Array
        (
            [0] => 2
        )

)

Threads

Recent Article Comments

TSMC N3 Process Technology Wiki
Hmm - what's the source for 0.015-0.016? -- this thread shows 0.0199 (N3B) and 0.021 (N3E) https://semiwiki.com/forum/threads/tsmc-officially-halts-sram-scaling.17223/ Perhaps this source…

— Xebec on July 14, 2025
Moore’s Law Wiki
Are these AI Generated? :)

— Xebec on July 14, 2025
TSMC N3 Process Technology Wiki
It should be 25-30% smaller? Process Node Typical SRAM Cell Size Density Improvement TSMC N5 ~0.021 µm² — TSMC N3…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
~1.6x denser vs. N5 SRAM I thought the scaling was more like 1.05X? (Various threads here on 'SRAM scaling dead…

— Xebec on July 14, 2025
Facing the Quantum Nature of EUV Lithography
This presentation considers 5 nm Gaussian acid blur: https://www.youtube.com/watch?v=MYLdE69RDBg

— Fred Chen on July 7, 2025
Flynn Was Right: How a 2003 Warning Foretold Today’s Architectural Pivot
Appreciate your take, Rahul. You’re absolutely right that market scale drives architectural investment—scalar dominated when desktop and enterprise ruled, and…

— Jonah McLeod on June 29, 2025
Flynn Was Right: How a 2003 Warning Foretold Today’s Architectural Pivot
Well.. I found this to be a funny article. Flynn's critique is fine and good...but not really the driving factor…

— Rahul Razdan on June 29, 2025
Reachability in Analog and AMS. Innovation in Verification
Apologies for that slip-up on our part. Failing memories!

— Bernard Murphy on June 27, 2025
Reachability in Analog and AMS. Innovation in Verification
swka: This is true, I worked with MunEDA up until the Cadence acquisition. Before that I worked with Solido up…

— Daniel Nenni on June 26, 2025
Reachability in Analog and AMS. Innovation in Verification
One quick correction. WiCkeD was MunEDA tool, which was acquired by Cadence. So it is never part of Synopsys. Synopsy…

— swka on June 26, 2025

WP_Term Object
(
    [term_id] => 50
    [name] => Events
    [slug] => events
    [term_group] => 0
    [term_taxonomy_id] => 50
    [taxonomy] => category
    [description] => 
    [parent] => 0
    [count] => 1380
    [filter] => raw
    [cat_ID] => 50
    [category_count] => 1380
    [category_description] => 
    [cat_name] => Events
    [category_nicename] => events
    [category_parent] => 0
)

November 5, 2012June 14, 2019 by Paul McLellan

Gustafson on Parallel Algorithms

Name: Semitracks Course: Silicon Photonics Technology and Applications
Start: 2025-07-16T00:00:00-07:00
End: 2025-07-17T23:59:59-07:00
Location: San Jose, CA

Gustafson on Parallel Algorithms
by Paul McLellan on 11-05-2012 at 4:54 pm
Categories: EDA, Events

At the keynote for ICCAD this morning, John Gustafson of AMD (where he is Chief Graphics Product Architect as well as a Fellow) talked about parallel algorithms. Like Gene Amdahl, whose law states that parallel algorithms are limited by the part that cannot be parallelized (if 10% is serial, then even if the other part takes place in zero time, the maximum speedup is 10X), Gustafson has a law named after him. It basically says Amdahl is wrong, that there is no limit to the speedup you can get as long as you increase the size of the problem along with the number of cores. So his talk was a look at whether there are embarrassingly serial problems, problems that are not open to being parallelized.

For example, at first glance, calculating the Fibonacci series look like one. Each term depends on the previous two so how can you bring a millions servers to bear on the problem. But, as anyone who has done any advanced math knows, there is a formula (curiously involving the golden ratio) so it is straightforward to calculate as many terms as desired in parallel.

By 2018 we should have million server systems each doing teraflops through highly parallel operations running on GPUs. The big challenge is the memory wall. For operations that involve a high ratio of work to decision making, this sort of SIMD (single instruction, multiple data) can significantly reduce wattage per teraflop.

Throwaway line of the day: with great power comes great responsibility…and some really big heatsinks!

An instruction issue consumes around 30 times more power than basic mutiply-add operations and a memory access much more power than that. Memory transfers will soon be half the power consumed and processors are already power-constrained. Part of the problem is that hardware caches are very wasteful, designed to make programming easy rather than keep power down. They minimize miss-rates at the cost of low utilization (around 20%). Even more surprisingly, only 20% of the data written back out of the cache is ever accessed again so didn’t really need to be written back at all. John felt that at least for low-level programmers we need a programming environment that makes memory placement visible and explicit (as it apparently was on the Cray-2).

There are two ways to associate a SIMD GPU with a processor: on-chip and a separate off-chip device. On chip seems to work best for problems where data re-use is 10-100 (such as FFT and sparse-matrix operations) and an off-chip device works best for data re-use in the 1000s, such as dense matrix and many body dynamics.

We also need better arithmetic. Most programmers have never studied numerical analysis and so have no idea how many bits of precision there are or how to calculate it. A specific problem is that accumulating results (by adding) needs much more precision that is used to calculate the numbers to add. Eventually you are adding small numbers to a number that is so large that it doesn’t change. John had a few examples where he was using 8 bit floating point (yes, really. 1 sign bit, 3 bits of exponent and 4 bits of mantissa) but doing accurate analysis.

John’s final conclusion: if we really cherish every bit moved to and from main RAM then we can get better arithmetic answers (provable bounds) and as a side-effect help the memory wall dilemma and always have a use for massive parallelism.

Share this post via:

Comments

0 Replies to “Gustafson on Parallel Algorithms”

You must register or log in to view/post comments.

TSMC N3 Process Technology Wiki
Hmm - what's the source for 0.015-0.016? -- this thread shows 0.0199 (N3B) and 0.021 (N3E) https://semiwiki.com/forum/threads/tsmc-officially-halts-sram-scaling.17223/ Perhaps this source…

— Xebec on July 14, 2025
Moore’s Law Wiki
Are these AI Generated? :)

— Xebec on July 14, 2025
TSMC N3 Process Technology Wiki
It should be 25-30% smaller? Process Node Typical SRAM Cell Size Density Improvement TSMC N5 ~0.021 µm² — TSMC N3…

— Daniel Nenni on July 14, 2025
TSMC N3 Process Technology Wiki
~1.6x denser vs. N5 SRAM I thought the scaling was more like 1.05X? (Various threads here on 'SRAM scaling dead…

— Xebec on July 14, 2025
Facing the Quantum Nature of EUV Lithography
This presentation considers 5 nm Gaussian acid blur: https://www.youtube.com/watch?v=MYLdE69RDBg

— Fred Chen on July 7, 2025
Flynn Was Right: How a 2003 Warning Foretold Today’s Architectural Pivot
Appreciate your take, Rahul. You’re absolutely right that market scale drives architectural investment—scalar dominated when desktop and enterprise ruled, and…

— Jonah McLeod on June 29, 2025
Flynn Was Right: How a 2003 Warning Foretold Today’s Architectural Pivot
Well.. I found this to be a funny article. Flynn's critique is fine and good...but not really the driving factor…

— Rahul Razdan on June 29, 2025
Reachability in Analog and AMS. Innovation in Verification
Apologies for that slip-up on our part. Failing memories!

— Bernard Murphy on June 27, 2025

Search Semiwiki

Recent Forum Threads

Recent Article Comments

Recent Podcast Episodes

Comments

0 Replies to “Gustafson on Parallel Algorithms”

Recent Forum Threads

Recent Article Comments