When Machines Dream: Hallucination in LLMs

When AI Gets a Little Too Creative (and Confidently Wrong)

Preamble

In the marketplace of ideas, bluster can often be mistaken for brilliance, and empty suits can sometimes outsell overflowing ones. This holds both in our daily interactions and in the world of machines. Large Language Models (LLMs) occasionally produce responses that sound convincing and authoritative when, in fact, they have no factual basis whatsoever.

Moreover, as we draw parallels between the land of the living and the land of zeros and ones, it's worth noting that just like us humans, who experience strange dreams, our mechanical friends can also drift into the realm of hallucinations. And therein lies our problem and the topic of today. These sophisticated AI systems occasionally weave tales that blend fact and fiction, leaving us to unravel the truth.

Intrigued? Terrified? Perhaps a little of both? Buckle up and join us as we dive into the fascinating world of LLMs, exploring how these digital dreamers sometimes lose their way, why it matters, and what we can do about it.

“I do believe that OpenAI…will solve some of the core platform’s tendency to hallucinate …. but it’s a stochastic model. It’s going to do pattern matching and come up with something, and occasionally it will make up stuff. That’s not our challenge. That’s OpenAI’s challenge: How to reduce its hallucination rate from 20% to 10% to 5% to very little over time.” – Peter Relan

Introduction

As artificial intelligence (AI) and large language models (LLMs) like GPT-3 take center stage in Kenya and Africa as a whole, a critical concept emerges: hallucination. This becomes particularly worrisome when we consider a Tidio survey which found that 72% of users trust AI to provide factual and reliable information, even though 75% of those same respondents reported that AI had misled them at least once.

LLMs, despite their impressive capabilities, can sometimes generate factually incorrect or nonsensical outputs, akin to hallucinations in humans: the model produces information that is not grounded in real-world data or factual correctness.

This article delves into the world of LLM hallucinations, exploring:

  • What hallucinations are in the context of LLMs

  • Real-world examples with potential African contexts

  • The implications of hallucinations for AI adoption

  • Techniques for detecting and mitigating hallucinations

  • Benchmarks for testing LLM performance

What are LLM Hallucinations?

Imagine asking an LLM for information on famous Kenyan marathoner Eliud Kipchoge's world record time. Instead of the correct answer (2:01:39, set in Berlin in 2018), the LLM might fabricate a time like "1:55:00." This is an example of an LLM hallucination – a fictional output that lacks grounding in reality but is presented as fact. These hallucinations can range from minor inaccuracies to entirely fabricated information, and despite their sophisticated capabilities, LLMs present such content with full confidence, leading to potential misinformation.

Or picture this: you're a Kenyan youth seeking information on the proposed finance bill. You ask your LLM assistant to summarize the key changes in the bill for small businesses. A typical hallucination would be a response akin to this: "Absolutely! The new bill introduces a fantastic initiative: a 50% tax rebate on all locally sourced materials used by small businesses! This is a game-changer - you'll be saving a fortune!" In actuality, Kenya's Finance Bill 2024 did not include a 50% tax rebate for locally sourced materials. This is a complete fabrication by the LLM.

These examples highlight how LLMs can hallucinate information that sounds plausible but lacks grounding in reality. It is for this reason, among others, that we at SCA_Nairobi advocate awareness of this possibility and the crucial need to double-check information from LLMs against reliable sources.

Forms of Hallucination

Source Conflation

One way these glitches manifest is through a phenomenon called source conflation. Imagine an LLM that has read countless news articles and historical accounts. When prompted, it might weave details from various sources into a seemingly factual narrative that nonetheless contains factual contradictions. You might ask for a biography of a historical figure, say Dedan Kimathi, and get a story that blends details from different periods or even merges accounts of two entirely different people!

Factual Error

Another LLM pitfall is the factual error. Unlike humans, who can often discern truth from fiction, LLMs cannot tell the difference. This can lead to them generating content with no basis in reality. It's like asking your AI assistant for investment advice in the Kenyan tech space and receiving a strategy based on entirely fabricated market trends!

Internet Misinformation

The culprit behind these factual errors? The very data that trains LLMs. The internet, a treasure trove of information, also contains a significant amount of misinformation. I mean, there is a reason lecturers always insist on proven sources, publications, and peer-reviewed journals rather than bare internet citations. LLMs, trained on this vast dataset, can absorb and regurgitate these inaccuracies, presenting them with the same confidence as facts.

Nonsensical Information

At their core, LLMs simply predict the next most probable word in a sentence and string those predictions together into a paragraph. More often than not, the content they generate makes sense. However, they are also prone to producing nonsensical information – grammatically correct sentences that convey no real meaning. Think of it as a perfectly written paragraph in a language you don't understand – impressive on the surface, but ultimately meaningless.
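To make the "next most probable word" idea concrete, here is a minimal sketch using the open GPT-2 model from the Hugging Face transformers library purely as a stand-in; the prompt and model choice are illustrative assumptions. The point is that the model only ranks likely continuations – it has no built-in notion of true versus false.

```python
# A minimal sketch of next-token prediction, assuming GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Eliud Kipchoge's marathon world record time is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    next_token_logits = model(input_ids).logits[0, -1]   # scores for the very next token
probs = torch.softmax(next_token_logits, dim=-1)          # convert scores to probabilities

top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}: {p.item():.3f}")
# Whichever token ranks highest is what gets generated next, factual or not.
```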

Case Studies

Objection! Your Honor

Lawyer Fooled by AI Assistant, Cites Fake Cases in Court

A lawyer attempting to use the AI tool ChatGPT for legal research ended up presenting fake cases to the court. This incident highlights the potential dangers of AI hallucinations, where AI systems generate information that appears real but is entirely fabricated. The lawyer now faces sanctions after the judge discovered the non-existent cases. This case is one of the first instances of AI hallucinations impacting a legal proceeding.

It wasn’t me

AI Chatbot Fabricates Scandal: Professor Falsely Accused

A law professor, Jonathan Turley, received a disturbing email alleging sexual harassment. The accusation originated from an AI chatbot, ChatGPT, which generated a fake news story and named Turley as the perpetrator. This incident highlights the dangers of AI-generated misinformation, especially as these chatbots become more widely used.

That’s a Figment of ChatGPT’s imagination

Inaccurate Summarization of a Court Case

When asked for a summary of a case, ChatGPT responded with factually inaccurate information. It got completely wrong who sued whom. As if that were not enough, it claimed that one of the parties had defrauded and embezzled funds from a foundation. Consequently, OpenAI was sued, with the plaintiff claiming that every detail in the summary was false and had been completely fabricated by the AI.

The Perils of Hallucination: Why it Matters

LLM hallucinations can have significant consequences, particularly in Africa, where access to reliable information is crucial. Here's why:

  • Spread of Misinformation: False information generated by LLMs can quickly spread online and through social media, creating confusion and distrust.

  • Erosion of Trust in AI: For a nation like Kenya, a leader in the pursuit of AI advancements, frequent AI missteps can breed distrust, hindering the technology's potential for positive impact. This, in turn, could set progress back significantly, slowing the crucial uptake of AI in vital sectors.

  • Perpetuating Bias: We celebrate our traditions – the vibrant tapestries woven from ancestral wisdom and cultural threads. They define who we are, shaping our identities and sparking joy in shared heritage. But within these cherished traditions, shadows sometimes linger. Unchecked biases, woven into the fabric of the past, can become potent pitfalls in the digital age. LLMs are trained on vast troves of information, but this very data can be a double-edged sword. If the information harbors societal biases, these AI marvels can morph into unwitting villains, perpetuating harmful stereotypes and amplifying existing inequalities. Imagine LLMs that, fueled by biased data, become champions for outdated notions: AI that normalizes teen pregnancy and FGM, endorses environmental degradation, or even fuels radical societal division.

Detecting LLM Hallucinations

Fortunately, there are strategies to help us identify these hallucinations. One approach involves fact-checking. Just like with any information source, cross-referencing the LLM's output with reliable sources is crucial.

Contextual consistency is another key factor. Does the LLM's response align with the information or prompt provided? Inconsistencies can be a red flag.
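One rough way to automate this kind of consistency check is to ask the model the same question several times and measure how much the answers agree with one another; wildly divergent answers are a warning sign. The sketch below assumes a hypothetical ask_llm helper standing in for whatever API you actually call, and uses a common sentence-embedding model for the comparison.

```python
# A rough sketch of a self-consistency check; `ask_llm` is a hypothetical helper.
from sentence_transformers import SentenceTransformer, util

def consistency_score(question: str, ask_llm, n_samples: int = 5) -> float:
    # Sample several answers to the same question.
    answers = [ask_llm(question) for _ in range(n_samples)]

    # Embed the answers and compute their average pairwise similarity.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = embedder.encode(answers, convert_to_tensor=True)
    sims = util.cos_sim(vectors, vectors)                 # n x n similarity matrix
    n = len(answers)
    mean_pairwise = (sims.sum() - n) / (n * (n - 1))      # drop the self-similarity diagonal
    return float(mean_pairwise)

# Usage idea: treat a score below some threshold you trust (say 0.7) as a signal
# to verify the answer manually before relying on it.
```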

Finally, some techniques analyze the statistical likelihood of an output. If the LLM generates something highly improbable based on the data it was trained on, it might be a hallucination and can be flagged for review.
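A simple illustration of the statistical-likelihood idea is to score a generated answer token by token and flag the spans the model itself found improbable. The sketch below again uses GPT-2 as an illustrative scorer and an arbitrary threshold; in practice you would score with, or alongside, the same model that produced the text.

```python
# A minimal sketch of likelihood-based flagging, with GPT-2 as the scoring model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def flag_unlikely_tokens(text: str, threshold: float = -8.0):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probabilities the model assigns to each token, given the ones before it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    scores = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # The threshold is an arbitrary assumption; tune it against known-good text.
    return [
        (tokenizer.decode([int(tok)]), round(score.item(), 2))
        for tok, score in zip(targets, scores)
        if score.item() < threshold
    ]

print(flag_unlikely_tokens(
    "The new finance bill introduces a 50% tax rebate on all locally sourced materials."
))
```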

Building Better Language Models: Mitigating Hallucinations

Researchers are actively exploring how to build LLMs that are less prone to hallucinations. One key area of focus is training data. By carefully curating high-quality information with diverse perspectives, researchers can minimize bias and improve factual accuracy.

Additionally, explainability techniques are being developed to allow LLMs to explain the reasoning behind their outputs. This transparency can help us identify potential hallucinations.

The final frontier might be real-time fact-checking integration. Imagine LLMs equipped with built-in mechanisms to flag potentially inaccurate outputs – a crucial step towards ensuring reliable AI interactions. OpenAI's adoption of reinforcement learning from human feedback (RLHF) is already a big improvement.

Conclusion

LLM hallucinations, while a challenge, are not insurmountable. By understanding their causes, implementing detection methods, and improving training data, we can build more reliable LLMs that contribute positively to Africa's technological landscape. As AI continues to evolve, vigilance and responsible development are crucial to ensure its benefits reach their full potential. These hallucinations are a stark reminder that the digital world isn't immune to flaws. The onus is on us to ensure the data that shapes these powerful tools reflects the complexities of the real world, not the echo chambers of the past, or the inaccuracies of our present age.

And with that, dearest gentle readers, and with all that's going on right now, I'd like to remind you that you've got this. And as SCA, we are proud of you. So smile, no need to hide that beautiful face. But remember: in God we trust; everything else we verify. I don't see why you would not apply the same logic to your LLM answers. Don't be mesmerized by the fluency of AI-generated text. Just like a dream that seems real until you wake up, the information presented by LLMs deserves a fact-check. After all, even the most elaborate fantasies can crumble under the light of scrutiny. That has been it from me. See you in the next one. Be Safe. Be Kind. Peace.