
Is AI-Based RTL Generation Ready for Prime Time?
by Bernard Murphy on 10-02-2024 at 6:00 am

In semiconductor design there has been much fascination with the idea of using large language models (LLMs) for RTL generation; Copilot provides one example. Based on a Google Scholar scan, a little over 100 papers were published on the topic in 2023, jumping to 310 in 2024. This is not surprising. If it works, automated design creation could be a powerful productivity advantage for designers (not a replacement, as some would claim). But we know that AI claims have a tendency to run ahead of reality in some areas. Where does RTL generation sit on this spectrum?


Benchmarking

The field has moved beyond the early enthusiasm of existence proofs (“look at the RTL my generator built”) to somewhat more robust analysis. A good example is a paper published recently on arXiv, Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks, with a majority of authors from Nvidia and one from Cornell. A pretty authoritative source.

The authors have extended VerilogEval, a benchmark they built in 2023 to evaluate LLM-based Verilog generators. The original work studied code-completion tasks; in this paper they go further, generating block-level RTL from natural-language specifications. They also describe a mechanism for prompt tuning through in-context learning (additional guidance in the prompt). Importantly, for both completion and spec-to-RTL they provide a method to classify failures by type, which I think could be helpful in guiding prompt tuning.
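
To make the two prompting styles concrete, here is a minimal sketch in Python of how a spec-to-RTL prompt might be assembled, 0-shot versus 1-shot. The prompt wording and the tiny worked example are my own invention for illustration; the paper's actual prompts differ.

    # Sketch of 0-shot vs 1-shot spec-to-RTL prompt assembly.
    # The wording and the worked example are invented for illustration,
    # not taken from the VerilogEval benchmark.

    SPEC = ("Implement module 'top_module' with inputs a, b and output out. "
            "out is high when exactly one of a, b is high (XOR).")

    # One worked example supplied as in-context guidance in the 1-shot case.
    ICL_EXAMPLE = """Specification: a 2-input AND gate named 'top_module'.
    Answer:
    module top_module(input a, input b, output out);
      assign out = a & b;
    endmodule"""

    def build_prompt(spec, shots=0):
        """Assemble the prompt, optionally prepending an in-context example."""
        parts = ["You are a Verilog designer. Write synthesizable Verilog only."]
        if shots >= 1:
            parts.append(ICL_EXAMPLE)
        parts.append("Specification: " + spec + "\nAnswer:")
        return "\n\n".join(parts)

    # The returned string is what would be sent to the model under test.
    print(build_prompt(SPEC, shots=1))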

Although there is no mention of simulation testbenches, the authors clearly used a simulator (Icarus Verilog) and discuss Verilog compile-time and run-time errors, so I assume the benchmark suite contains a human-developed testbench for each test.
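
Assuming that is the flow, a compile-versus-run failure classification is straightforward to script. Below is a minimal sketch around Icarus Verilog; the file names and the "PASS" print convention are placeholder assumptions, and the paper's actual harness and failure taxonomy are certainly more detailed.

    import subprocess

    # Sketch: classify a generated design as compile-time error, run-time
    # error, functional mismatch, or pass, using Icarus Verilog. File names
    # and the testbench's PASS/FAIL print convention are assumptions.

    def classify(design="candidate.v", tb="testbench.v"):
        # Compile: a non-zero exit here is a compile-time error.
        compile_res = subprocess.run(["iverilog", "-o", "sim.out", design, tb],
                                     capture_output=True, text=True)
        if compile_res.returncode != 0:
            return ("compile-time error", compile_res.stderr)

        # Simulate: the testbench is assumed to print a pass/fail verdict.
        run_res = subprocess.run(["vvp", "sim.out"],
                                 capture_output=True, text=True)
        if run_res.returncode != 0:
            return ("run-time error", run_res.stderr)
        if "PASS" not in run_res.stdout:     # assumed testbench convention
            return ("functional mismatch", run_res.stdout)
        return ("pass", run_res.stdout)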

Analysis

The authors compare performance across a wide range of LLMs, from GPT-4 models to Mistral, Llama, CodeGemma, DeepSeek Coder and RTLCoder DeepSeek. A small point of initial confusion for this engineer/physicist: they talk about temperature settings in a few places. This is a randomization factor in LLM sampling, nothing to do with physical temperature.
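
For readers new to the term: temperature divides the model's logits before the softmax, so low values make sampling nearly deterministic and high values flatten the distribution, increasing randomness. A toy sketch:

    import numpy as np

    # Temperature rescales next-token logits before the softmax:
    # T -> 0 approaches greedy decoding (always the top token),
    # larger T flattens the distribution and increases randomness.

    def sample_token(logits, temperature):
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
        probs = np.exp(scaled - scaled.max())   # numerically stable softmax
        probs /= probs.sum()
        return np.random.default_rng().choice(len(probs), p=probs)

    logits = [2.0, 1.0, 0.1]                    # toy 3-token vocabulary
    print([sample_token(logits, t) for t in (0.01, 0.8, 2.0)])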

First, a little background on scoring generated code. The usual method to measure machine-generated text is a score called BLEU (BiLingual Evaluation Understudy), an n-gram overlap measure intended to correlate with human judgments of quality/similarity. While appropriate for natural-language translation, BLEU is not ideal for measuring code generation: code that looks similar to a reference can still be functionally wrong. Functional correctness, as measured in simulation, is a better starting point.
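
A toy illustration of why: the sketch below (using NLTK's sentence-level BLEU and naive whitespace tokenization, both simplifications) scores a candidate that differs from the reference by a single operator. The n-gram overlap, and hence BLEU, stays sizable even though the candidate computes AND instead of XOR.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # One wrong operator leaves most n-grams intact, so BLEU stays
    # respectable while the candidate is functionally wrong (AND vs XOR).

    reference = "assign out = a ^ b ;".split()
    candidate = "assign out = a & b ;".split()

    score = sentence_bleu([reference], candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU = {score:.2f}")   # sizable score despite the bug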

The graphs/tables in the paper measure pass rate against the benchmark suite of tests, allowing one RTL generation attempt per test (pass@1), so there is no allowance for iterated improvement beyond the difference between 0-shot and 1-shot prompting. 0-shot measures generation from an initial prompt; 1-shot measures generation from the initial prompt augmented with further guidance. The parameter 'n' in the tables is a wrinkle to manage variance in this estimate: higher n, lower variance.
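
For reference, the standard way to compute such numbers is the unbiased pass@k estimator introduced with OpenAI's HumanEval/Codex work, which I assume is what the paper follows: draw n samples per problem, count the c that pass, and average the quantity below over problems.

    from math import comb

    # Unbiased pass@k estimator (Chen et al., "Evaluating Large Language
    # Models Trained on Code"): with n samples of which c pass, the chance
    # that at least one of k drawn samples passes is 1 - C(n-c,k)/C(n,k).
    # Larger n tightens the estimate, hence lower variance.

    def pass_at_k(n, c, k):
        if n - c < k:        # too few failing samples: a pass is guaranteed
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # e.g. 20 samples per problem, 7 passing, estimating pass@1
    print(pass_at_k(n=20, c=7, k=1))   # 0.35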

Quality, measured through test pass rates within the benchmark suite, ranges from below 10% to as high as 60% in some cases. Unsurprisingly, smaller models don't do as well as bigger models. Best rates are for GPT-4 Turbo with ~1T parameters and Llama 3.1 with 405B parameters. Within any given model, success rates for code-completion and spec-to-RTL tests are roughly comparable. In many cases in-context learning/refined prompts improve quality, though for GPT-4 Turbo on spec-to-RTL and for Llama 3 70B, prompt engineering actually degrades quality.

Takeaways

Whether for code completion or spec-to-RTL, these accuracy rates suggest that RTL code generation is still a work in progress. I would be curious to know how an entry-level RTL designer would perform against these standards.

Also, in this paper I see no mention of tests for synthesizability or PPA. (A different though smaller benchmark, RTLLM, also looks at these factors; there, I believe PPA is determined through physical synthesis, though again the paper is short on details.)
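
As one suggestion (mine, not a flow either paper describes), a basic synthesizability smoke test is easy to script with the open-source Yosys tool; PPA would additionally require a technology library and a physical flow on top of this.

    import subprocess

    # Suggested synthesizability smoke test using Yosys (my illustration,
    # not a flow described in either paper): if the generic 'synth' pass
    # completes, the RTL at least elaborates and maps to generic gates.

    def is_synthesizable(design="candidate.v"):
        res = subprocess.run(["yosys", "-p", f"read_verilog {design}; synth"],
                             capture_output=True, text=True)
        return res.returncode == 0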

More generally, I also wonder about readability and debuggability. Perhaps some modified version of the BLEU metric, compared against expert-written code, could be a useful supplement to these scores.

Nevertheless, interesting to see how this area is progressing.
