
How To Solve LLM Hallucinations

Daniel Nenni

The Necessary Step to AI Revenue
At this point, I hope most of my audience has had some experience with the publicly available large language models, whether running the software yourself, paying for a subscription to one of the many online services, or trying any of the free and beta offerings currently out there. For the most part, these large language models are, by definition, large: billions of parameters, often trained on vast amounts of unstructured language data. Across much of the industry, parameter count is treated as a proxy for accuracy; the more data you train with, and the more parameters in the design, the wider the scope of information these general models can hold and recall or generate. However, that's not always the case, and there's one big problem with this market right now: hallucinations.

This week, startup Lamini published a paper showcasing a new fundamental methodology that decreases hallucinations in LLMs by a conservative 95%. Lamini is headed up by co-founders CEO Sharon Zhou (PhD and faculty in Gen AI from Andrew Ng's group, MIT award-winning Gen AI research, the largest Gen AI Coursera courses) and CTO Greg Diamos (NVIDIA/CUDA architect, 14,000+ citations, AI scaling laws, MLPerf co-founder), and it broke the mold by being one of the first companies to offer fine-tuning as a service for LLMs. What made the company different was its preference for AMD Instinct MI200/MI300 GPUs, even with one of the NVIDIA Tensor Core architects as a co-founder. Lamini completed its Series A in early 2024 with $25 million in funding, with lead investors Amplify Partners and First Round Capital; other investors include Lip-Bu Tan, Andrej Karpathy, and Andrew Ng. Lamini already counts Fortune 500 companies among its customers, and offers per-GPU licensed middle-layer software as well as cloud inference services.


I visited the Lamini offices yesterday. Here’s me with CTO Greg Diamos.

The Problem of Hallucinations
Large language models currently fall into the category of 'generative AI': you feed the model a prompt of tokens/words, and you get some tokens/words back. What you get back is generated based on the input, and because of the probabilistic functions in the design, the output is 'generated' and can appear to give you detail on topics that were initially part of the dataset but have been abstracted away into an embedding space inside the model. For example, the concept of 'parent' could be embedded as a vector between son and father, and a similar vector could also be used to describe a country that has changed its name.
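The idea of a relationship living as a direction in embedding space can be shown with a toy sketch. The vectors and word pairs below are invented for illustration (real model embeddings have hundreds or thousands of dimensions); the point is that the 'son → father' offset and an 'old name → new name' offset can point the same way, so the model treats them as the same relation.

```python
import math

# Toy 3-dimensional embeddings -- purely illustrative values, not taken from
# any real model, chosen so both relationships share the same offset direction.
emb = {
    "son":     [0.9, 0.1, 0.2],
    "father":  [0.9, 0.8, 0.2],
    "oldname": [0.2, 0.1, 0.7],  # hypothetical country's former name
    "newname": [0.2, 0.8, 0.7],  # its current name
}

def offset(a, b):
    """Vector pointing from embedding a to embedding b."""
    return [y - x for x, y in zip(emb[a], emb[b])]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

parent_vec = offset("son", "father")
rename_vec = offset("oldname", "newname")
print(cosine(parent_vec, rename_vec))  # 1.0: identical direction in this toy space
```

A cosine similarity near 1.0 means the two offsets are essentially the same vector, which is exactly the kind of shared abstraction the text describes.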

However, models hallucinate. This is not necessarily confined to large models; hallucination is baked into how generative AI natively works. It is ultimately when the model gives the wrong information, or creates a relationship in that embedding space that shouldn't exist, resulting in erroneous output.

The problem of hallucinations stems from a number of areas, but I'll pick two here. First is simply facts: large general models are poor at holding facts. They're good at concepts and at explaining concepts, but asking a general model about a person's birthday is often a no-go area. The reason is that even if the right answer is the most likely one in the dataset, there will be lots of similar pieces of information that could be chosen as part of the model's response. A good example: when I asked a general Llama2-7B model for AMD CEO Lisa Su's birthday, it got the year correct, but the date it gave was actually the date attributed to the discovery of the transistor. Lisa Su is closely linked with chips and transistors, so in the embedding space that date was chosen as a likely candidate to fit the answer. The model hallucinated.
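The mechanics of that failure can be sketched as sampling from a next-token distribution. The candidate dates and probabilities below are invented for illustration (no real model assigns exactly these numbers): the correct answer holds the most probability mass, but a closely associated date still gets sampled a large fraction of the time.

```python
import random

# Hypothetical next-token probabilities for completing
# "Lisa Su was born on ___". Values are invented for this sketch.
candidates = {
    "correct date":        0.40,  # the factually right answer
    "transistor date":     0.35,  # strongly associated with 'chips' in training data
    "blend of the two":    0.25,  # plausible-looking mixture
}

def sample(probs, rng):
    """Draw one token from a categorical distribution (inverse-CDF sampling)."""
    r = rng.random()
    acc = 0.0
    for token, p in probs.items():
        acc += p
        if r < acc:
            return token
    return token  # guard against floating-point rounding at the tail

rng = random.Random(0)  # fixed seed for reproducibility
draws = [sample(candidates, rng) for _ in range(1000)]
wrong = sum(1 for d in draws if d != "correct date")
print(f"hallucinated on {wrong} of 1000 draws")  # typically a majority of draws
```

Even though the correct answer is the single most likely token, sampling still returns a wrong-but-plausible answer most of the time here, which is the shape of the birthday failure described above.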

Second is how these general models are trained. The dataset may be public information, correct or incorrect (cough, Reddit, Wikipedia), or even contradictory, but these models are designed to give you an answer, right or wrong. Unless the question is caught by the guard rails of 'don't answer questions about this topic', almost all language models are predisposed to give answers regardless of whether they're actually correct. This applies not only to facts but also to concepts that weren't directly in the dataset but may be derived from it. In a specific model, LiDAR and RADAR might be treated as similar, or the number 10 million might carry the same weight as 3 million, which makes a lot of difference if you're using a model for employment contracts.

Part of the issue with these models is that general training data is just that: general. A well-formed dataset (which most aren't) will produce output of a similar level across many topics. The loss function (a measure of error, where lower numbers are better) across a wide array of tests will typically come out similar regardless of the topic in the test. So hallucinations can occur across many different concepts in the model, regardless of the model's parameter count. Training a large model on a dataset from scratch is typically a one-shot event, simply because the dataset is massive and the cost to train on that data is immense: we're fast approaching billions of dollars for the largest models today, and that's not even counting the cost of the GPUs.
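The "similar loss across topics" point can be made concrete with a toy cross-entropy calculation. The per-token probabilities below are invented to illustrate the pattern, not measured from any model: a general model assigns roughly the same confidence to correct tokens in every subject area, so its per-topic loss is flat and no topic is especially hallucination-free.

```python
import math

# Hypothetical probabilities a general model assigns to the *correct* next
# token on small evaluation sets for three topics. Values are invented.
topic_token_probs = {
    "history":     [0.62, 0.58, 0.55, 0.60],
    "medicine":    [0.57, 0.61, 0.59, 0.56],
    "chip_design": [0.60, 0.55, 0.62, 0.58],
}

def cross_entropy(probs):
    """Average negative log-likelihood of the correct tokens; lower is better."""
    return -sum(math.log(p) for p in probs) / len(probs)

losses = {topic: cross_entropy(p) for topic, p in topic_token_probs.items()}
for topic, loss in losses.items():
    print(f"{topic:11s} loss = {loss:.3f}")

# The losses land within a few hundredths of each other: accuracy (and
# therefore hallucination risk) is spread evenly across topics.
spread = max(losses.values()) - min(losses.values())
print(f"spread = {spread:.3f}")
```

A specialized model fine-tuned on one topic would instead show a much lower loss on that topic than on the others, which is the contrast the rest of the article builds toward.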