Is this software advance significant?

Arthur Hanson

Well-known member

Parallel processing has been done for a long time. Is this truly an advance, or just a new approach with older technologies? If it is an advance, what do readers think the impact will be, and which companies stand to benefit the most?
 
This is the research paper referred to by the newatlas article:


I'll start with an editorial comment: I despise papers written in this style. It is difficult to read, incredibly dense, and requires a high level of expertise to be able to judge the efficacy of the concepts, which means this paper isn't easy to use as a learning aid. I suppose the internet and search engines make it possible to get substantial value from a paper like this, but it is tiresome. The paper has 111 cited references. Seriously? Also, there are several grammatical errors in the paper, which leads one to wonder how carefully it was peer-reviewed. End of rant.

That said, heterogeneous computing has been an active area of research and implementation for a long time; two examples are OpenCL and SYCL. These approaches use the so-called "magic compiler" strategy to produce code objects which deliver (supposedly) equivalent computational results on different instruction set architectures in the underlying hardware, including RTL for FPGA execution. This paper does not consider FPGAs, only processors which have formal instruction sets, though one could define instruction execution macros in FPGAs with similar functionality, which I suppose could be accommodated in the magic compiler.
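
To make the "magic compiler" idea concrete, here is a minimal single-source sketch in SYCL 2020 (my own illustration, not from the paper): one kernel written in standard C++, which the toolchain compiles for whatever device the runtime selects, CPU, GPU, or an FPGA emulator, without changing the source.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    // Let the runtime pick a device (CPU, GPU, or FPGA emulator); the kernel
    // source below is identical regardless of which one is chosen.
    sycl::queue q{sycl::default_selector_v};
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    constexpr size_t N = 1024;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);
    {
        // Buffers let the runtime move data to whichever device runs the kernel.
        sycl::buffer<float> A{a.data(), sycl::range<1>{N}};
        sycl::buffer<float> B{b.data(), sycl::range<1>{N}};
        sycl::buffer<float> C{c.data(), sycl::range<1>{N}};
        q.submit([&](sycl::handler& h) {
            sycl::accessor ra{A, h, sycl::read_only};
            sycl::accessor rb{B, h, sycl::read_only};
            sycl::accessor wc{C, h, sycl::write_only};
            // The "magic" part: this one kernel is compiled for each target ISA.
            h.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];
            });
        });
    }   // Buffer destructors copy the result back to the host vector c.
    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
}
```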

The difference between the previous strategies, such as the SYCL/OpenCL approaches, and the paper's version of SHMT is that the paper describes an instruction-processing strategy reminiscent of the internals of many modern CPUs: instructions are decoded in parallel into micro-operations (micro-ops), which are then scheduled and executed by multiple microcode engines in parallel pipelines, and the results are reassembled to match the CPU's externally visible instructions. I like the innovation, but it appears to me that the SHMT run-time layer has to own all of the hardware in the system to make this strategy function. If that is correct, it would mean the applications are limited to dedicated machines, such as subsystems within a larger system configuration. (This was a popular strategy for enterprise database appliances many years ago.)
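
To illustrate that decode/schedule/reassemble flow, here is a toy C++ sketch (my own, not the paper's runtime): a large GEMM is "decoded" into independent row-block sub-operations, each dispatched to a worker, and the result is only valid once every sub-op has been reassembled. Plain CPU threads stand in for the heterogeneous accelerators an SHMT-style runtime would actually target.

```cpp
#include <future>
#include <iostream>
#include <vector>

constexpr int N = 256;
using Matrix = std::vector<float>;  // row-major N x N

// One "sub-operation": compute a block of output rows of C = A * B.
void gemm_rows(const Matrix& A, const Matrix& B, Matrix& C, int row_begin, int row_end) {
    for (int i = row_begin; i < row_end; ++i)
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

int main() {
    Matrix A(N * N, 1.0f), B(N * N, 1.0f), C(N * N, 0.0f);

    // "Decode" the large operation into four sub-ops, one per pretend device.
    const int n_devices = 4, rows_per = N / n_devices;
    std::vector<std::future<void>> subops;
    for (int d = 0; d < n_devices; ++d)
        subops.push_back(std::async(std::launch::async, gemm_rows,
                                    std::cref(A), std::cref(B), std::ref(C),
                                    d * rows_per, (d + 1) * rows_per));

    // Reassemble: the external result exists only after every sub-op completes.
    for (auto& f : subops) f.get();
    std::cout << "C[0] = " << C[0] << "\n";  // expect 256 for all-ones inputs
}
```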

There is significant innovative thinking in this paper, but my impression is that SHMT won't be a high-impact technology for many years. SYCL, a simpler approach I think, has been under development for about 10 years and has had significant support from Intel, but it hasn't really changed the world much yet.
 
Last edited:
 
The problem with non-conventional approaches is 'software'. If we have a super-great compiler and runtime capable of breaking large operations into smaller ops which perfectly fit each accelerator (whatever the accelerators are), removing all possible bubbles, then it will work. But can we? This is the hardest part of the paper. Massive cache-memory subsystems and queueing in each device (CPU, GPU, accelerator...) make prediction extremely difficult. Even in the example in the paper (4 x GEMMs), no one really knows when each command will complete. And if error states are included, it gets even more difficult (what if a certain accelerator fails or times out when operations are separated into smaller ones, etc.).

This is what Intel was trying with OneAPI, but these days people have simply decided to use tons of GPUs or TPUs (Google).
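
Here is a small illustration of that scheduling problem (hypothetical latencies, plain CPU threads standing in for accelerators): four sub-operations complete out of order, and the runtime has to decide what to do when one of them misses its budget.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <random>
#include <thread>
#include <vector>

int main() {
    // Random latency stands in for the cache, queueing, and contention effects
    // that make per-device completion times hard to predict.
    std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> latency_ms(10, 300);

    std::vector<std::future<int>> subops;
    for (int id = 0; id < 4; ++id) {                 // think: the paper's 4 GEMMs
        int delay = latency_ms(rng);
        subops.push_back(std::async(std::launch::async, [id, delay] {
            std::this_thread::sleep_for(std::chrono::milliseconds(delay));
            return id;
        }));
    }

    // Give each sub-op a 150 ms budget; anything slower must be treated as a
    // timeout and re-dispatched or failed, which is the hard part for a real runtime.
    for (auto& f : subops) {
        if (f.wait_for(std::chrono::milliseconds(150)) == std::future_status::ready)
            std::cout << "sub-op " << f.get() << " finished within budget\n";
        else
            std::cout << "a sub-op missed its budget; the runtime must recover\n";
    }
}
```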
 
OneAPI's value is that you can develop and test code in data-parallel C++ on a PC and execute it on a GPU or an FPGA without source code modifications for the target hardware. Using large populations of GPUs and TPUs is different, and Intel uses PyTorch as an additional layer to enable those applications. What I don't know is how well the entire end-to-end software chain works. Intel has a customer testimonials page with 313 articles, but few are for OneAPI. It is difficult to gauge how magical the magic compilers actually are.
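
For what it's worth, the retargeting claim is easy to see from the runtime's point of view. This short sketch (assuming a SYCL 2020 toolchain such as Intel's DPC++) just lists the devices the runtime can see; the kernel source never changes, only the device the queue is built on, whether chosen in code or via the toolchain's device-selector environment variable.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // Enumerate every platform/device the SYCL runtime can target. A single
    // data-parallel C++ binary can be aimed at any of these without editing
    // the kernel source.
    for (const auto& platform : sycl::platform::get_platforms()) {
        std::cout << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto& device : platform.get_devices()) {
            std::cout << "  " << device.get_info<sycl::info::device::name>()
                      << (device.is_gpu() ? " [GPU]"
                          : device.is_cpu() ? " [CPU]"
                                            : " [accelerator/other]")
                      << "\n";
        }
    }
}
```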

 