While there are big ambitions for virtual engineers and other self-guiding agentic applications, today's estimates suggest that 83-90% of AI inferences serve internet searches. On a related note, chatbots are now said to account for nearly 60% of internet traffic. Search and support are the biggest market drivers for automation and have unquestionably improved through AI. Search gets closer to what you want in one pass, and chatbots depend on retrieving domain-specific information in response to a question. In either case, RAG – retrieval augmented generation – plays an important role in finding the most relevant sources for a search or chat response.

Or so you hope. My experience is that the RAG results I get from a basic search/question are most useful for simple (one-part) questions in areas where I have no expertise. The more expertise I have, or the more complex my question, the less useful I find the response. I can do somewhat better by adding context (You are an expert in … My question is …). Asking for citations also helps. But even these tricks don’t always work. The problem is that RAG as originally conceived (2020) has limitations. I thought it would be interesting to look at advances in this field, viewing them for variety from both a business perspective and a healthcare perspective. In AI, it seems our needs and priorities are not so different.
A business perspective from Elastic and Cohere
Target applications here cover a wide range in business: finance, public sector, energy, media, etc. I found this webinar, which presents a combination of these two technologies with particular emphasis on RAG: the basics, challenges, and advances.
First, a quick note on RAG. LLMs are trained on publicly accessible corpora. RAG training derives information from separate and typically internal proprietary sources: PDFs, spreadsheets, images, etc. This information is chunked in some manner (e.g. paragraphs in PDF text) and each chunk is encoded as a vector, so that similarity (scalar products of vectors) places related objects close together and unrelated objects far apart. Chunks in training data must be expert (human) labeled.
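To make the chunk-and-embed step concrete, here is a minimal sketch in Python. The toy_embed function is only a hashed bag-of-words stand-in for whatever embedding model a real deployment would call (a Cohere or open-source embedding model, for instance), so the example runs on its own; the document text is invented for illustration.

```python
# Minimal sketch of chunking and embedding. toy_embed is a hashed bag-of-words
# stand-in for a real embedding model; swap in your model's embedding call.
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size vector, then normalize."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk_paragraphs(text: str) -> list[str]:
    """One simple chunking strategy: split on blank lines (paragraphs)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

document = "Our return policy allows 30 days.\n\nShipping takes 5-7 business days."
index = [(chunk, toy_embed(chunk)) for chunk in chunk_paragraphs(document)]

# At query time the question is embedded the same way and compared to each chunk;
# the dot product of normalized vectors is the cosine similarity.
query = toy_embed("how long does shipping take?")
ranked = sorted(index, key=lambda item: float(query @ item[1]), reverse=True)
print(ranked[0][0])   # the shipping paragraph scores highest
```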
Retrieval then uses a mix of keyword matching and similarity-based search to develop a top-ranked set of responses to your question. RAG is more accurate in retrieval than a general-purpose LLM because it can exploit semantic understanding based on similarity matching between a query and labeled training data.
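As a sketch of how the keyword and similarity rankings might be merged, reciprocal rank fusion is one common approach (not necessarily the exact method the webinar demonstrates); the chunk IDs and rankings below are made up for illustration.

```python
# Hedged sketch: fuse a keyword (BM25-style) ranking and a vector-similarity
# ranking with reciprocal rank fusion. Chunk IDs and rankings are illustrative.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Combine several ranked lists of chunk IDs into a single fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

keyword_hits = ["chunk_12", "chunk_03", "chunk_27"]    # from keyword matching
vector_hits = ["chunk_03", "chunk_27", "chunk_45"]     # from embedding similarity
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# chunk_03 and chunk_27, found by both methods, rise to the top of the fused list.
```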
So far this is naive RAG, with known limitations. It struggles when the answer requires a wider understanding of a source or of multiple sources, or when the question has multiple clauses and demands sequential reasoning.
You know what’s coming next: agentic RAG, also called advanced RAG. To address these limitations a system must develop a plan of attack, perform multiple hops of reasoning, and self-reflect/verify after each step, potentially triggering rework. This is what the agentic approach adds. As soon as a question/request becomes even moderately complex, resolution must turn agentic, even in RAG. Tools used to support such agentic flows in business applications might include Microsoft Office, CRM systems, or SQL databases.
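A minimal sketch of such an agentic loop follows, showing the plan / retrieve / verify / retry structure. The llm() and retrieve() functions are placeholders to be wired to your model and search stack, not real APIs; only the control flow matters here.

```python
# Hedged sketch of an agentic RAG loop: plan sub-questions, retrieve evidence
# for each, self-check the draft answer, and retry with a rewritten query.
# llm() and retrieve() are placeholders, not calls to any real library.
def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM")

def retrieve(query: str) -> list[str]:
    raise NotImplementedError("wire this to your hybrid retriever")

def agentic_answer(question: str, max_retries: int = 2) -> str:
    plan = llm(f"Break this into ordered sub-questions, one per line: {question}")
    findings = []
    for sub_q in plan.splitlines():
        for _ in range(max_retries + 1):
            evidence = retrieve(sub_q)
            draft = llm(f"Answer '{sub_q}' using only this evidence: {evidence}")
            verdict = llm(f"Is this answer supported by the evidence? yes/no: {draft}")
            if verdict.strip().lower().startswith("yes"):
                findings.append(draft)
                break
            # Self-reflection failed: rework the retrieval query and try again.
            sub_q = llm(f"Rewrite the query to find better evidence for: {sub_q}")
    return llm(f"Combine these findings into an answer to '{question}': {findings}")
```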
For completeness, a further advance you may encounter is modular RAG. These systems take a building-block approach, letting you structure pipelines that blend retrieval and refinement stages.
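A sketch of the modular idea, assuming nothing beyond plain Python: each stage (retrieve, rerank, compress, generate) is an interchangeable callable over a shared context, so pipelines can be rearranged without rewriting the rest of the system. The stage names in the comment are illustrative only.

```python
# Hedged sketch of modular RAG: a pipeline assembled from interchangeable
# stages, each a callable that reads and enriches a shared context dict.
from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(stages: list[Stage], question: str) -> dict:
    ctx = {"question": question}
    for stage in stages:
        ctx = stage(ctx)
    return ctx

# Stages can be swapped in or out without touching the rest of the pipeline, e.g.:
# pipeline = [hybrid_retrieve, rerank, compress_context, generate_answer]
# result = run_pipeline(pipeline, "Why did Q3 margins drop in the EMEA region?")
```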
A healthcare perspective from Kent State and Rutgers
Here I draw on a long but very interesting paper. The authors suggest the following as key applications in healthcare: diagnostic assistance by retrieving information on similar cases; summarizing health records and discharge notes; answering complex medical questions; educating patients and tailoring responses to user profiles; matching candidates to clinical trials; and retrieving and summarizing biomedical literature, especially recent literature, in response to a clinical or research query.
The authors note a range of challenges in retrieving information. Obviously such a system must handle a wide range of data types (modalities), from doctor notes to X-rays, EKG traces, lab results, etc. It must also contend with a wide range of potentially incompatible health record sources, some with technically precise notes (myocardial infarction), some less precise (heart attack). Users face challenges in understanding the credibility of sources (media health articles versus Reddit versus respected journals in a field) and how these contribute to ranking conclusions. Familiar challenges even in our field.
There is a longer list, from which I’ll call out one widely relevant item: the need to continuously update as new research, drugs and treatments emerge, and to deprecate outdated sources. In a medical context the authors suggest that manual updates would be too slow and error-prone, and that any useful RAG system for their purposes must build continuous update into the system.
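A minimal sketch of what building continuous update into the index might look like: newly published documents are merged in with a publication date, retracted sources are dropped, and anything past a validity window is pruned before retrieval. The five-year window and the field names are assumptions for illustration, not anything the paper specifies.

```python
# Hedged sketch of continuous index maintenance: merge newly published
# documents, drop retracted ones, and prune anything past a validity window.
# The five-year window and field names are illustrative assumptions.
from datetime import datetime, timedelta

VALIDITY_WINDOW = timedelta(days=5 * 365)

def refresh_index(index: list[dict], new_docs: list[dict], retracted_ids: set[str]) -> list[dict]:
    """Each doc is a dict with at least 'doc_id' and 'published' (a datetime)."""
    now = datetime.now()
    merged = index + new_docs
    return [
        doc for doc in merged
        if doc["doc_id"] not in retracted_ids
        and now - doc["published"] <= VALIDITY_WINDOW
    ]
```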
They look at tradeoffs between the three RAG architectures mentioned earlier (naive, advanced, and modular). They find naive RAG easy to set up and use, though for their purposes too noisy and risky for high-stakes scenarios. Advanced RAG is more promising for diagnostic support and EHR summarization, striking a balance between factual grounding and speed, but it requires significant compute resources (presumably an on-prem datacenter). This method looks most ready today for clinical use, at least in hospitals and large clinics. They see modular RAG as interesting for ongoing research, though training and resource costs make it impractical for near-term deployment.
Relevance to design automation
Accuracy is critical for technical support in our domain, whether internal or external. Our users are very knowledgeable and intolerant of beginner-level suggestions. The experiences above suggest that advanced/agentic RAG may be the most appropriate method to deploy for support here.
That guidance should also aim to avoid the mistakes made in some ambitious all-AI rollouts (Klarna customer support, for example). Safeguards should certainly include an emphasis on “don’t know” responses for suggestions with low support, explainability for the top candidate responses offered, and methods to escalate to a human expert when the bot is uncertain. I am starting to see some of this in general customer support.
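A minimal sketch of those safeguards: answer only when retrieval support is strong, cite the supporting sources, and escalate otherwise. The threshold value and field names are assumptions for illustration, not anything a particular vendor uses.

```python
# Hedged sketch of guardrails: abstain on weak support, cite sources,
# escalate to a human expert when uncertain. Threshold is illustrative.
CONFIDENCE_THRESHOLD = 0.75   # assumption: tuned on a held-out evaluation set

def respond(question: str, retrieved: list[dict]) -> dict:
    """retrieved: [{'chunk': str, 'score': float, 'source': str}, ...] sorted by score."""
    if not retrieved or retrieved[0]["score"] < CONFIDENCE_THRESHOLD:
        return {
            "answer": "I don't know enough to answer this reliably.",
            "action": "escalate_to_human_expert",
        }
    top = retrieved[0]
    return {
        "answer": f"Based on {top['source']}: {top['chunk']}",
        "citations": [doc["source"] for doc in retrieved[:3]],   # explainability
        "action": "answered",
    }
```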
Meantime, agentic RAG can make a big difference in productivity and user satisfaction for in-house and external users. Most of us would prefer to explore on our own supported by effective agentic RAG, only turning to a human expert when we’re not making progress. That’s technology worth supporting.