Compiler Tuning for Simulator Speedup. Innovation in Verification
by Bernard Murphy on 11-27-2024 at 6:00 am

Key Takeaways

  • The paper discusses the use of Bayesian optimization methods for compiler autotuning to improve performance in logic simulation.
  • The research shows a significant 20% average speed-up over the GCC -O3 flag by employing a modified Bayesian algorithm and evaluating multiple C program benchmarks.
  • The findings propose that these optimization techniques can enhance various aspects of hardware design, particularly simulation and potentially synthesis processes.

Modern simulators map logic designs into software that is compiled for native execution on the target hardware. Can this compile step be further optimized? Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and now Silvaco CTO) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick is Efficient Compiler Autotuning via Bayesian Optimization. This was published in the 2021 IEEE/ACM International Conference on Software Engineering and has 22 citations. The authors are from Tianjin University, China, and Newcastle University, Australia.

Compiled-code simulation is the standard for meeting performance needs in software-based simulation, so it should benefit from advances in compiler technology from the software world. The GCC and LLVM compilers already support many optimization options. For ease of use, best-case sequences of options, determined by averaging over large codebases and workloads, are bundled into the -O1/-O2/-O3 flags to improve application runtime. An obvious question is whether a different sequence might deliver even better performance for a specific application.

This is an active area of research in software engineering, looking not only at which compiler options to select (e.g. function inlining) but also at the order in which those options should appear in the sequence, since options are not necessarily independent (A before B might deliver different performance than B before A).
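
As a concrete, hypothetical illustration of the underlying measurement, the Python sketch below times one benchmark under -O3 and under a hand-picked flag selection; the source file name and the flag list are placeholders, not taken from the paper.

    # Hypothetical sketch (not from the paper): compare -O3 against one candidate
    # GCC flag selection for a single benchmark. "bench.c" and the candidate flag
    # list are illustrative placeholders.
    import subprocess, time

    CANDIDATE = ["-O2", "-finline-functions", "-funroll-loops", "-ftree-vectorize"]

    def run_time(cflags, reps=5):
        # Compile bench.c with the given flags, then return the best wall-clock runtime.
        subprocess.run(["gcc", "bench.c", "-o", "bench"] + cflags, check=True)
        best = float("inf")
        for _ in range(reps):
            start = time.perf_counter()
            subprocess.run(["./bench"], check=True)
            best = min(best, time.perf_counter() - start)
        return best

    print(f"speed-up over -O3: {run_time(['-O3']) / run_time(CANDIDATE):.2f}x")

With the dozens of switches GCC exposes, the space of selections (let alone orderings) is far too large to enumerate by brute force, which is what motivates the guided search the paper proposes.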

Paul’s view

Using machine learning to pick tool switches in Place & Route to improve PPA is one of the best examples of commercially deployed AI in the EDA industry today. A similar concept can be applied to picking compiler switches in logic simulation to try and improve performance. Here, there are clear similarities to picking C/C++ compiler switches, known as “compiler autotuning” in academic circles.

In this month’s paper, the authors use a modified Bayesian algorithm to try to beat GCC’s -O3. They use a benchmark suite of 20 small (~1k-line) C programs (matrix math operations, image processing, file compression, hashing) and consider about 70 different low-level GCC switches. The key innovation in the paper is to use tree-based models (random forests) rather than a Gaussian process as the Bayesian predictor, and during training to quickly narrow down to 8 “important” switches and heavily explore combinations of these 8 switches.

Overall, their method achieves an average 20% speed-up over -O3. Compared to other state-of-the-art methods, this 20% speed-up is achieved with about 2.5x less training compute. Unfortunately, all their results use a very old version of GCC from 12 years ago, which the authors acknowledge at the end of their paper, along with a comment that they did try a more recent version of GCC and were able to achieve only a 5% speed-up over -O3. Still, a nice paper, and I do think the general area of compiler autotuning can be applied to improve logic simulation performance.

Raúl’s view

Our penultimate paper for 2024 addresses setting optimization flags in compilers to achieve the fastest code execution (presumably other objective functions, like code size or energy expended during computation, could have been used). The compilers studied, GCC and LLVM, expose 71 and 64 optimization flags respectively, so the optimization spaces are vast: 2^71 and 2^64 possible selections. Previous approaches use random iterative optimization, genetic algorithms, and Irace (tuning of parameters by finding the most appropriate settings given a set of instances of an optimization problem, i.e., “learning”). The authors’ system is called BOCA.

This paper uses Bayesian optimization, an iterative method that optimizes an objective function by using the knowledge accumulated in the already-sampled part of the search space to guide sampling in the remaining area and home in on the optimal sample. It builds a surrogate model that can be evaluated quickly, typically a Gaussian Process (GP, not explained in the paper), which doesn’t scale to high dimensionality (the number of flags). BOCA uses a Random Forest instead (RF, also not explained in the paper). To further improve the search, optimizations are ranked as “impactful” or “less impactful”, using Gini importance to measure the impact of each optimization. Less impactful optimizations are considered in only a limited number of iterations, i.e., they “decay”.
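
To make the flow concrete, below is a minimal Python sketch of a BOCA-style loop under simplifying assumptions: a synthetic runtime function stands in for real compile-and-run timings, the surrogate’s predicted minimum stands in for a proper acquisition function, and the decay schedule for less impactful flags is omitted. None of the names or numbers come from the paper beyond the ~70 flags and 8 impactful ones.

    # Minimal BOCA-style sketch: a random-forest surrogate plus Gini importance to
    # focus the search on the most impactful flags. The runtime model is synthetic;
    # in practice each evaluation would compile and time a benchmark as above.
    import numpy as np
    from itertools import product
    from sklearn.ensemble import RandomForestRegressor

    N_FLAGS, N_IMPACTFUL = 70, 8
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=N_FLAGS)          # synthetic stand-in for real timings

    def measure_runtime(bits):
        return 10.0 + bits @ true_w + rng.normal(scale=0.1)

    # Seed the surrogate with a few random flag selections (one 0/1 bit per flag).
    X = rng.integers(0, 2, size=(10, N_FLAGS))
    y = np.array([measure_runtime(x) for x in X])

    for _ in range(30):
        surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        impactful = np.argsort(surrogate.feature_importances_)[-N_IMPACTFUL:]  # Gini importance

        # Enumerate all 2^8 settings of the impactful flags around the best point so far;
        # a full implementation would also perturb ("decay") the less impactful flags.
        cand = np.tile(X[np.argmin(y)], (2 ** N_IMPACTFUL, 1))
        cand[:, impactful] = np.array(list(product([0, 1], repeat=N_IMPACTFUL)))
        nxt = cand[np.argmin(surrogate.predict(cand))]  # stands in for an acquisition function

        X = np.vstack([X, nxt])
        y = np.append(y, measure_runtime(nxt))

    best = X[np.argmin(y)]
    print("flags enabled in best selection:", np.flatnonzero(best), "runtime:", y.min())

The sketch only illustrates the mechanics; the paper’s reported gains come from running this kind of loop against real compile-and-run timings on its benchmark suite.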

The authors benchmark the two compilers on 20 benchmarks against other state-of-the-art approaches, listing the results for 30 to 60 iterations. BOCA reaches a given desired speed-up in 43%-78% less time. Against the highest optimization setting of the compilers (-O3), BOCA achieves a speed-up of 1.25x for GCC and 1.13x for LLVM. Notably, using around 8 impactful optimizations works best, as more can slow BOCA down. The speed-up is more limited with recent GCC versions: 1.04-1.06x.

These techniques yield incremental improvements. They would certainly be significant in hardware design, where they can be used for simulation and perhaps for setting optimization flags during synthesis and layout, areas where AI approaches are now being adopted by EDA vendors. Time will tell.

Also Read:

Cadence Paints a Broad Canvas in Automotive

Analog IC Migration using AI

The Next LLM Architecture? Innovation in Verification

Emerging Growth Opportunity for Women in AI

 
