Instance

Array
(
    [title] => Recent Forum Threads
    [title_url] => 
    [ignore_sticky] => 0
    [exclude_current] => 0
    [limit] => 10
    [sluglist] => ["jobs-dashboard"]
    [rw_opt] => Array
        (
            [widget_select] => 1
            [pageid_281769] => 1
            [pageid_281772] => 1
        )

    [display_widget_mobile] => 
    [rw_opt_exclude] => Array
        (
            [pageid_274493] => 1
            [cpt_podcast] => 1
            [cpta_podcast] => 1
            [category_16613] => 1
            [category_16631] => 1
            [taxonomy_series] => 1
            [pageid_354254] => 1
        )

    [node_id] => Array
        (
            [0] => 2
        )

)

Threads

Recent Article Comments

Consolidation and Competition: Who is Winning the $4.5 Billion Interface IP Race?
HPC can be Chiplet. Wondering why UCIe is not considered. Internally AMBA neither

— chiro.lentz on July 11, 2026
The Packaging PDK Is the Missing Layer for Co-Packaged Optics
Thank you to Daniel Nenni and SemiWiki for publishing my latest article: The Packaging PDK Is the Missing Layer for…

— moh.kolb on July 8, 2026
The Packaging PDK Is the Missing Layer for Co-Packaged Optics
Very interesting. Thanks.

— U235 on July 8, 2026
Why Huawei Says It Will Match TSMC’s Most Advanced Chips by 2031
N+3 is denser than N6: https://newsletter.semianalysis.com/p/steel-smic-n3-teardown?open=false

— Fred Chen on July 5, 2026
Why Huawei Says It Will Match TSMC’s Most Advanced Chips by 2031
Fixed, thank you.

— Daniel Nenni on July 4, 2026
Why Huawei Says It Will Match TSMC’s Most Advanced Chips by 2031
The article is not correct. EUV equipment is not primarily produced by ASML. It is only produced by ASML. It…

— AndyG on July 4, 2026
Intel 18A vs Intel 18A-P: What Is the Difference and Why Does It Matter?
Nice writeup

— Rahul Razdan on June 27, 2026
Available Is Not In Control: Balancing Output, Quality, and Risk in High-Volume Fabs
In a DoD centric III-V fab I had wafers run in a few decades ago, yield was miserable, but adequate…

— PBealo on June 27, 2026
Available Is Not In Control: Balancing Output, Quality, and Risk in High-Volume Fabs
Another thing that can help improve availability is a very old but often overlooked basic bedrock: Having good SPC, that…

— benb on June 24, 2026
Available Is Not In Control: Balancing Output, Quality, and Risk in High-Volume Fabs
Thanks, Ben , both points land. The single spread tool that takes the whole line down is exactly the bottleneck…

— Boris Shteinberg on June 23, 2026

WP_Term Object
(
    [term_id] => 3611
    [name] => IoT
    [slug] => iot-internet-of-things
    [term_group] => 0
    [term_taxonomy_id] => 3611
    [taxonomy] => category
    [description] => Internet of Things
    [parent] => 0
    [count] => 552
    [filter] => raw
    [cat_ID] => 3611
    [category_count] => 552
    [category_description] => Internet of Things
    [cat_name] => IoT
    [category_nicename] => iot-internet-of-things
    [category_parent] => 0
)

June 26, 2016 by Bernard Murphy

Getting Maximum Performance Bang for Your Buck through Parallelism

Getting Maximum Performance Bang for Your Buck through Parallelism
by Bernard Murphy on 06-26-2016 at 12:00 pm
Categories: General, IoT

Finding a way to optimally parallelize linear code for multi-processor platforms has been a holy grail of computer science for many years. The challenge is that we think linearly and design algorithms in the same way, but then want to speed up our analysis by adding parallelism to the algorithms we have already designed.

But the “first design, then parallelize” approach is intrinsically hard because you’re trying to impress parallel structure onto sequential code, which still leaves algorithm designers with the burden of deciding where and how best to do that. Which they do with varying degrees of success, facing risks in missing potential race conditions and bigger and further opportunities to parallelize.

This is not to say that there hasn’t been progress in at least simplifying the description task. OpenMP is an approach using a directive-based style, where pragmas are defined on top of existing code. This works in as far as it provides that simplification of describing where and how you want to parallelize. Pragmas can be used to separate functions which can be run in parallel, which is certainly simpler than hand-written threading. But you still have to be sure that underlying functions are thread-safe and intuitive understanding of the algorithm often becomes significantly harder as you move pieces around to accommodate those pragmas.

For many applications this may be good enough. But no-one would claim this comes anywhere near the aspirational goal – describe what you want in some kind of algorithm and let the compiler take care of optimizing for maximum parallelism with race-free safety. For many applications, best possible performance is a non-negotiable top-priority. Finite Element Analysis for stress, thermal, flow and EMI problems are good examples. Higher performance means more accurate results and highest possible accuracy/quality of result is the only thing that matters.

That means algorithm designers are often willing to re-write core algorithms, even in new languages, especially when code needs to be re-factored anyway. And that opens up opportunities to consider very different approaches to coding, including switching from directive-based programming (describing what calculation to perform and how to perform it) to declarative programming (describing what calculation to perform and letting the compiler figure out the best way to perform it).

One such approach, designed originally by Texas Tech in partnership with NASA, subsequently transferred through and now marketed by Texas Multicore Technologies (TMT), is based on a language called SequenceL™. The product and language don’t aim to be yet another general purpose language. This is designed for serious math and science algorithms. The compiler optimizes from SequenceL into C++ (with optional OpenCL for GPU targeting), which can coexist with algorithms for other purposes built through more pedestrian paths.

As a very simple example you can multiply 2 matrices in a single statement. Parallelism is inferred from this structure.
MatrixMultiply(A(2),B(2))[i,j]:= sum( A[i,all] * B[all,j] );

This illustrates the objective to express mathematical intent and not have it become entangled and made opaque by implementation details, which in SequenceL are left to the compiler.

How well this works is illustrated by multiple customer results. One industrial application was Emerson Process Management’s need to improve software for building network graphs for plants and oilfields. Their existing Java-based solution was estimated to require unreasonable runtimes to build a graph for 1000 nodes. Worse still, after 5 months of redesign the authors of the original code failed to improve the Java implementation sufficiently to get within target performance. Then they reached out to TMT who completed a SequenceL solution in 3 weeks, a solution which also happened to beat target performance by 10X.

Another customer was able to get a 26X speed up in a core Fortran-based computational fluid dynamics solver with 25% less code, delivering a solution that used to take 2 weeks, now completing in overnight runs. Yet another customer, very experienced in parallelizing code, commented that SequenceL was like MatLab on steroids. Pretty high praise. A lot of this is apparently due not just to automating obvious parallelism but also to being able to find and automate finer-grained opportunities to optimize that would be beyond human patience (and schedules) to explore. And throughout to do so with guaranteed safety against race conditions.

As the IoT takes off, problems like this are going to become increasingly important. It isn’t all going to be about Big Data analytics. It’s also going to be about hard science and engineering analytics. I suspect you’re going to be hearing more about TMT in the near future. You can learn more about TMT and SequenceL HERE.