The One Weird Trick That Guarantees a Hallucination

I saw someone on LinkedIn claiming they had reverse-engineered ChatGPT's latest memory feature through some clever prompting:

  • They asked a question that relied on ChatGPT's memory and got an answer

  • Then they asked ChatGPT to explain how it arrived at the answer

They claimed that this explanation would reveal the secret mechanisms of the feature.

The only problem is, it's all made up.

The only truthful answer ChatGPT could possibly give to such a question would be something to the effect of: "Because the input and my system prompt led these tokens to have the highest probability of being selected as the next tokens."
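To make that concrete, here is a minimal sketch of what "highest probability next token" actually looks like under the hood. It uses the Hugging Face transformers library with GPT-2 purely as an illustrative stand-in for any autoregressive LLM; the model choice and prompt are my own assumptions, not anything specific to ChatGPT's memory feature.

```python
# Illustrative sketch only: GPT-2 stands in for any autoregressive LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # shape: (1, seq_len, vocab_size)

# Scores for the next position only, turned into a probability distribution.
next_token_logits = logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)

# The model's entire basis for "choosing" its answer is this distribution
# over the vocabulary -- there is no separate record of reasoning to report.
top_probs, top_ids = torch.topk(probs, 5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(i):>12}  {p.item():.3f}")
```

Ask the model afterwards "why did you pick that word?" and it can only generate more tokens the same way; it has no access to anything beyond distributions like the one printed above.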

Any interrogation of a large language model is bound to lead to hallucinations. An easy way to experience that firsthand is to watch the model occasionally change its mind when asked to explain its reasoning.

Besides, if an LLM actually knew how it arrived at its conclusions, why would it ever hallucinate in the first place?

There's no reflection, concept of self, or actual thinking under the hood. All there is is the next best token.
