Why a Chatbot Can't Tell You How Certain It Is
An image-classification AI can not only tell you whether a picture shows a dog, a cat, or a boat, it can also tell you how certain it is about that assessment. So why can't ChatGPT tell you how sure it is about an answer? (It will tell you something if you ask it "how sure are you?", but that number is generated the same way as everything else it says, not read off any internal confidence score, so treat it as a hallucination.)
The answer lies in the way a chatbot produces its output. Start with the image classifier. It takes an image and runs it through a series of mathematical operations. The output is a list of numbers, one for each category the AI is trained to recognize, and after a normalization step called the softmax, those numbers sum to one and directly express how convinced the classifier is that the image belongs to each category. If it gives 0.99 to "dog", we say it's 99% sure that the image is of a dog.
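Here's a minimal sketch of that last step in plain Python. The category scores (logits) are invented for illustration, but the softmax normalization is the standard way classifiers turn raw scores into probabilities:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a classifier's final layer.
categories = ["dog", "cat", "boat"]
logits = [9.2, 4.5, 1.3]

for category, p in zip(categories, softmax(logits)):
    print(f"{category}: {p:.4f}")
# dog: 0.9906, cat: 0.0090, boat: 0.0004 (approximately)
```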
But when you put a question to a chatbot powered by a large language model, no such list exists for the facts in the answer. A concrete example: you ask ChatGPT, "What is the capital of France?"
What's not happening is this: the AI computes, for every city in the world, the probability that it is the capital of France, finds a 100% score for Paris, and then builds an answer that communicates that fact.
Instead, recall that the underlying large language model (LLM) is trained to predict the next word (or rather token, i.e., a word or piece of one) that follows a given sequence. The software running the chatbot takes the user's question and presents the following sequence to the LLM:
User: What is the capital of France?
System:
For this input, the LLM now computes a list of probabilities. But not for facts. Just for the next word. So it would come up with something like this (very simplified):
Paris: 0.9
The: 0.1
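In code, you can picture the model as a function from a sequence to a next-token distribution. Here's a toy stand-in (the name `next_token_probs`, the numbers, and the hard-coded lookup are all invented; a real model scores a vocabulary of tens of thousands of tokens with billions of learned parameters):

```python
def next_token_probs(sequence: str) -> dict[str, float]:
    """Toy stand-in for an LLM: maps the text so far to a
    probability for each possible next token. Real models compute
    this with a neural network, not a lookup."""
    if sequence == "User: What is the capital of France?\nSystem:":
        return {"Paris": 0.9, "The": 0.1}
    raise KeyError(sequence)

print(next_token_probs("User: What is the capital of France?\nSystem:"))
# {'Paris': 0.9, 'The': 0.1}
```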
Does that mean it's only 90% sure that Paris is the right answer? Not so. Because after "The", we feed the whole sequence back to the LLM to get the next word, and so on, eventually getting to "The capital of France is Paris."
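That feedback loop is easy to sketch. Below, a hard-coded table stands in for the LLM (all probabilities are invented, and the user's question is dropped from the keys for brevity); the loop samples a token, appends it, and asks again until an end marker appears:

```python
import random

# Toy next-token distributions standing in for the LLM.
TOY_LM = {
    "System:":                                {"Paris": 0.9, "The": 0.1},
    "System: Paris":                          {"<end>": 1.0},
    "System: The":                            {"capital": 1.0},
    "System: The capital":                    {"of": 1.0},
    "System: The capital of":                 {"France": 1.0},
    "System: The capital of France":          {"is": 1.0},
    "System: The capital of France is":       {"Paris": 1.0},
    "System: The capital of France is Paris": {"<end>": 1.0},
}

def generate(prefix: str) -> str:
    """Sample one token at a time, feeding the growing sequence
    back into the model after every step."""
    while True:
        probs = TOY_LM[prefix]
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        if token == "<end>":
            return prefix
        prefix += " " + token

print(generate("System:"))
# ~90% of runs: "System: Paris"
# ~10% of runs: "System: The capital of France is Paris"
```

Note that both branches end up naming Paris; the 90/10 split is about wording, not about the fact.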
To figure out how certain the LLM really is that Paris is the capital of France, we'd have to follow every rabbit hole of every possible next word, check which completions express the idea that Paris is the capital (in however verbose a way), and sum up all the associated probabilities. That's because an LLM's entire "knowledge" is implicitly encoded in how it will choose to complete a sentence. No explicit body of facts exists inside it.
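Spelling that out with the same toy table makes it concrete: enumerate every complete path, multiply the token probabilities along each one, and add up the paths that state the fact. (Everything here is invented for illustration; with a real vocabulary the number of paths explodes exponentially, which is why nobody actually does this.)

```python
TOY_LM = {
    "System:":                                {"Paris": 0.9, "The": 0.1},
    "System: Paris":                          {"<end>": 1.0},
    "System: The":                            {"capital": 1.0},
    "System: The capital":                    {"of": 1.0},
    "System: The capital of":                 {"France": 1.0},
    "System: The capital of France":          {"is": 1.0},
    "System: The capital of France is":       {"Paris": 1.0},
    "System: The capital of France is Paris": {"<end>": 1.0},
}

def path_probs(prefix: str, p: float = 1.0):
    """Yield (complete_text, probability) for every possible
    generation, multiplying token probabilities along each path."""
    for token, q in TOY_LM[prefix].items():
        if token == "<end>":
            yield prefix, p * q
        else:
            yield from path_probs(prefix + " " + token, p * q)

# Sum the probability of every completion that mentions Paris.
certainty = sum(p for text, p in path_probs("System:") if "Paris" in text)
print(certainty)  # 1.0 -- both branches assert that Paris is the capital
```

Even the `"Paris" in text` check is a cheat: for real completions you'd have to decide whether arbitrary prose expresses the fact, which is itself a language-understanding problem.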
This is important because the whole way a chatbot communicates can fool us into thinking that there's a smart, educated entity on the other end. But no matter how impressive and useful these systems get, they're really just predicting one word after another in a way that fits what they've seen.
