A Negative Problem for Large Language Models
by Bernard Murphy on 05-23-2023 at 6:00 am

I recently read a thought-provoking article in Quanta titled Chatbots Don’t Know What Stuff Isn’t. The point of the article is that while large language models (LLMs) such as GPT, Bard and their brethren are impressively capable, they stumble on negation. An example offered in the article suggests that while the prompt “Is it true that a bird can fly?” would be answered positively with plentiful examples, the negated prompt “Is it true that a bird cannot fly?” would likely also produce a positive answer supported by the same examples. The word “not” is effectively invisible to LLMs, at least today.

The Quanta article is well worth reading, as are most Quanta articles. What is especially interesting is that fixing LLMs to manage negatives reliably is proving to be more challenging than first thought. I see two interesting ways to frame the problem: first through a computer science analysis, and second by asking what we mean by “not”.

Why do LLMs struggle with negation?

These models learn, from spectacularly large amounts of data, to generate a model of reality. An LLM builds a model of likelihoods of sequences of words associated with corresponding topics. There is no place in such a model to handle negation of a word. How would it be possible for inference to map “not X” as a term when the deep learning model is built on training data in which terms are necessarily positive (“X” rather than “not X”)?
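
As a concrete illustration, the short sketch below probes the next-token distribution of an off-the-shelf model with and without the negation. This is a minimal probe of my own, not an experiment from the article; it assumes the Hugging Face transformers and PyTorch packages are installed and uses GPT-2 purely because it is small.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A small off-the-shelf causal language model; GPT-2 is used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def top_next_tokens(prompt, k=5):
    # Return the k most likely next tokens and their probabilities for a prompt.
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]   # logits for the token that would come next
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode([int(t)]), round(float(p), 4))
            for t, p in zip(top.indices, top.values)]

# How different are the preferred continuations with and without the negation?
print(top_next_tokens("It is true that a bird can"))
print(top_next_tokens("It is true that a bird cannot"))

How much, if at all, “cannot” shifts the preferred continuations is exactly the empirical question the Quanta piece raises; nothing in the model forces the negation to invert its learned associations.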

SQL selections routinely handle negative terms – “select all clients who are not in the US” (I’m being casual with syntax). Why couldn’t LLMs do the same thing? They could use a similar selection mechanism to pre-filter the data used for training. But then the model would be trained explicitly to handle prompts with that specific negation, with no hope of answering prompts about clients who are in the US. What we really want is a single trained model which can answer prompts for both “in the US” and “not in the US”; the filtering approach seems to require two models. And that covers only one negation possibility. As the number of terms which might be negated increases, the number of models (and the time to train and run them) grows exponentially.
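
To make the contrast concrete, here is a minimal Python sketch standing in for the casual SQL above (the client names are made up): an explicit query can apply the negation at lookup time, while an LLM would have to bake any such filter into its training data.

# A database-style query can apply negation at lookup time.
clients = [
    {"name": "Acme",    "country": "US"},
    {"name": "Globex",  "country": "DE"},
    {"name": "Initech", "country": "JP"},
]

not_in_us = [c for c in clients if c["country"] != "US"]   # "not in the US"
in_us     = [c for c in clients if c["country"] == "US"]   # the complementary query

print(not_in_us)   # Globex, Initech
print(in_us)       # Acme

# An LLM has no such predicate at inference time. Pre-filtering the training
# data with one of these conditions bakes that single negation into the model,
# and covering n independently negatable terms this way would need on the
# order of 2**n separately trained models.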

Research suggests that ChatGPT has improved a little in handling negatives and antonyms through human-in-the-loop training. However, experts claim developers are chipping away at the problem rather than finding major breakthroughs. When you consider the significant range of possibilities in expressing a negative (explicit negation or use of an antonym, both allowing for many ways of re-phrasing), this perhaps should not be too surprising.

What do we mean by “not”?

“Not” in natural language carries a wealth of meaning which is not immediately apparent from a CS viewpoint. We want “not” to imply a simple inverse, but consider the earlier example “Is it true that a bird cannot fly?”. Many birds can fly (robins, ducks, eagles), some cannot (penguins, some species of steamer duck, ostriches), and some can manage a little but not sustained flight (chickens). Some mammals can glide (flying squirrels); are they birds? The question doesn’t admit a simple yes/no answer. An LLM would likely present these options, ignoring the “not” and not really answering the question in a way that would demonstrate understanding. That is good enough for a search but is hardly a foundation for putting us all out of work.

“Not” provides a simple demonstration that meaning cannot be extracted from text by statistical analysis alone, no matter how large the training dataset. At some point meaning must tap into “commonsense”, all the implicit understanding we bring to using language. “Not” highlights this dependency because “not X” admits absolutely everything except X as a possibility. We deal with this wide-open space in real life through commonsense, eliminating all but the reasonable options. An LLM can’t do that because (as far as I know) there is no corpus for commonsense. LLMs can be patched through human guidance to do better on specific cases, but I am skeptical that patching can generalize.

LLMs have demonstrated amazing capabilities, but like any technology we build they have limits, which are becoming clearer thanks in part to one seemingly inoffensive word.
