Interpolation versus Extrapolation
Here's one reason large language models (LLMs) like ChatGPT are so great at certain tasks and so poor at others: The underlying machine learning model is explicitly trained to predict the next word, or token, so that its output matches the statistics of its training data. Sure, there are tricks and tweaks you can apply, but in the end, that's what you get.
No Cubism For You
Imagine a generative model trained on all art up to the mid-1800s. It would not spontaneously generate an image in the style of Picasso. Cubist paintings are not in the training distribution, and you can't reach them through simple interpolation.
The same holds for text, though here the contrast in style may be less stark. Simply put, you won't get novel styles, novel concepts, or novel ideas if they are not interpolations of existing ones: A model trained exclusively on the works of Shakespeare would have a hard time producing the sparse minimalism of Hemingway. I doubt even heavy-handed prompting would get you from Romeo and Juliet to The Old Man and the Sea.
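The same limitation shows up in the simplest possible setting. As a toy sketch (not an LLM, just a curve fit with NumPy), here a cubic polynomial is fit to sine-wave data from one region: inside the training region it predicts well, but far outside it the prediction falls apart.

```python
import numpy as np

# "Training data": samples of sin(x) observed only on [0, pi].
x_train = np.linspace(0, np.pi, 50)
y_train = np.sin(x_train)

# "Training": fit a cubic polynomial to the observed region.
model = np.poly1d(np.polyfit(x_train, y_train, deg=3))

# Query one point inside the training region and one far outside it.
x_in, x_out = np.pi / 2, 3 * np.pi
err_in = abs(model(x_in) - np.sin(x_in))    # interpolation: small error
err_out = abs(model(x_out) - np.sin(x_out)) # extrapolation: large error
print(f"interpolation error: {err_in:.4f}")
print(f"extrapolation error: {err_out:.4f}")
```

The model is excellent at filling in gaps between its training points and hopeless beyond them; that asymmetry, not a lack of capacity, is the point.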
Generative models are, for better or worse, bound by the constraints of their training data. The next breakthrough, whether in art, science, or language, won’t arise from prediction alone. True creativity still requires stepping beyond what's already known.
PS: Of course, the actual ChatGPT knows both Shakespeare and Hemingway...
He saw the window. There was a light. It was Juliet. Juliet was like the sun in the morning. Bright and strong. Better than the moon. The moon was pale and jealous. Juliet was not.