Prompt Engineering for Security. Innovation in Verification
by Bernard Murphy on 07-30-2025 at 6:00 am

We have a shortage of reference designs to test detection of security vulnerabilities. An LLM-based method demonstrates how to fix that problem with structured prompt engineering. Paul Cunningham (GM, Verification at Cadence), Raúl Camposano (Silicon Catalyst, entrepreneur, former Synopsys CTO and lecturer at Stanford, EE292A) and I continue our series on research ideas. As always, feedback welcome.

The Innovation

This month’s pick, Empowering Hardware Security with LLM: The Development of a Vulnerable Hardware Database, was published at the 2024 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST) and has 12 citations. The authors are from the University of Florida, Gainesville.

The authors use LLMs to create a large database (Vul-FSM) of FSM designs vulnerable to a set of 16 weaknesses, documented either in the MITRE CWE database or in separate guidelines, by inserting these weaknesses into base designs. The intent is to use this dataset as a reference for security analysis tools or security mitigations; the dataset is available on GitHub. They also provide an LLM-based mechanism to detect such vulnerabilities.

The core of the method is a structured approach to prompt engineering to generate (they claim) high-integrity test cases and methods for detection. Their prompt engineering techniques, such as in-context learning, appear relevant to a broader set of verification problems.

Paul’s view

Hardware security verification is still a somewhat niche market today, but it is clearly on the rise. Open databases to check for known vulnerabilities are making good progress – for example, CWE (cwe.mitre.org) is often used by our customers. However, availability of good benchmark suites of labeled testcases with known vulnerabilities is limited, which in turn limits our ability to develop good EDA tools to check for them.

This month’s paper uses LLM prompt engineering with GPT-3.5 through OpenAI’s APIs to create a labeled benchmark suite of 10k Verilog designs for simple control-circuit state machines with 3 to 10 states. Each of these designs contains at least one of 16 different known vulnerabilities and has been created from a base set of 400 control circuits that do not contain any vulnerabilities. The paper also describes an LLM-based vulnerability detection system for these same 16 vulnerabilities, again using prompt engineering, which is surprisingly effective: 80% likely on average to detect the vulnerability.
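
To make the generation flow concrete, here is a minimal sketch of what such a loop might look like using the OpenAI Python client; the model name, prompt wording, helper names, and the example weakness descriptions are my own illustrative assumptions, not the authors' code.

    # Minimal sketch of an LLM-driven vulnerability-insertion loop (not the authors' code).
    # The prompt wording, weakness descriptions, and model choice are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    WEAKNESSES = [
        "missing default state, allowing a deadlock",   # hypothetical CWE-style description
        "unreachable or bypassable reset state",        # hypothetical CWE-style description
        # ...the real benchmark covers 16 weakness types
    ]

    def insert_weakness(base_rtl: str, weakness: str) -> str:
        """Ask the LLM to inject one named weakness into a clean Verilog FSM."""
        prompt = (
            "You are a hardware security expert.\n"
            f"Modify the Verilog FSM below so it contains this weakness: {weakness}.\n"
            "Keep all other behavior unchanged and return only the modified Verilog.\n\n"
            + base_rtl
        )
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
        )
        return resp.choices[0].message.content

    # benchmark = [insert_weakness(rtl, w) for rtl in base_designs for w in WEAKNESSES]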

One of the best parts of the paper is Figure 6, which shows an example of an actual complete LLM prompt clearly divided into sections showing chain-of-thought (giving the LLM step-by-step instructions on how to solve the problem), reflexive verification (giving the LLM instructions on how to check that its response is correct), and exemplary demonstration (giving the LLM an example of a solution to the problem for another circuit). There are some decent charts elsewhere in the paper showing how much these prompt engineering techniques improve the quality of the LLM’s response: about 10-20%, depending on the vulnerability.
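
As a rough illustration of that structure (the wording below is mine, paraphrased rather than copied from Figure 6), a detection prompt assembled from those three sections might look like this:

    # Illustrative prompt skeleton combining the three techniques described above.
    # Section wording is paraphrased; it is not the paper's actual Figure 6 text.
    def build_detection_prompt(rtl: str, example_rtl: str, example_answer: str) -> str:
        return "\n".join([
            # Chain-of-thought: step-by-step instructions for the task
            "Step 1: List every state and transition in the FSM below.",
            "Step 2: Check each state and transition against the weakness definitions.",
            "Step 3: Report the weakness type and the offending lines.",
            # Reflexive verification: ask the model to check its own answer
            "Before answering, verify that every reported line exists in the design",
            "and explain how your answer follows the instructions above.",
            # Exemplary demonstration: a worked example for another circuit
            "Example design:", example_rtl,
            "Example answer:", example_answer,
            # The design under analysis
            "Design to analyze:", rtl,
        ])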

I’m grateful to the authors for their contribution to the security verification community here!

Raúl’s view

This paper introduces SecRT-LLM, a novel framework that leverages large language models (LLMs) to generate and detect security vulnerabilities in hardware designs, specifically finite state machines (FSMs). SecRT-LLM uses vulnerability insertion to create a benchmark of 10,000 small RTL FSM designs with 16 types of embedded vulnerabilities (Table II), many based on CWE (Common Weakness Enumeration) classes. It also performs vulnerability detection, identifying security issues in RTL on this benchmark.

One of the key contributions is the integration of prompt engineering, LLM inference, and fidelity checking. The prompting strategies in particular are quite elaborate, aimed at guiding the LLM to perform the target task. Six tailored prompt strategies greatly improve LLM performance:

  • Reflexive Verification Prompting (self-scrutiny, e.g., indicate where and how the instructions in the prompt have been followed)
  • Sequential Integration Prompting (chain-of-thought, dividing a task into sub-tasks)
  • Exemplary Demonstration Prompting (example designs)
  • Contextual Security Prompting (inserting and identifying security vulnerabilities and weaknesses)
  • Focused Assessment Prompting (emphasize detailed examination of a specific design element such as a deadlock)
  • Structured Data Prompting (systematic arrangement of extensive data, for example as a table).

A prompt example is given in Fig. 6.

Experimental validation shows high accuracy in both insertion (~82% pass@1 and ~97% pass@5) and detection (~80% pass@1 and ~99% pass@5) of vulnerabilities. Automating this process drastically reduces time and cost compared to manual effort.
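
For readers less familiar with the metric: pass@k is the probability that at least one of k sampled LLM responses is correct, commonly estimated with the unbiased formula from the code-generation literature; whether the paper uses exactly this estimator is my assumption.

    # Standard unbiased pass@k estimator (as popularized in code-generation
    # benchmarks): given n samples of which c are correct, the chance that at
    # least one of k samples passes. Its use here is an assumption, not a claim
    # about the paper's exact methodology.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:  # too few failures to fill k draws: success guaranteed
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # e.g. 10 attempts with 8 correct: pass@1 = 0.80, pass@5 = 1.00
    print(pass_at_k(10, 8, 1), pass_at_k(10, 8, 5))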

The paper applies AI capabilities to hardware security needs. Two major contributions are generating a benchmark of FSMs with embedded vulnerabilities, which serves as a resource for training and evaluating vulnerability detection tools, and using prompt engineering to guide LLMs on security-centric tasks. Most commercial tools today focus on verification, threat modeling, and formal methods, but do not yet deeply leverage LLMs for RTL vulnerability tasks. Research such as SecRT-LLM addresses this gap and may influence future commercialization of AI in this field.
