Engineering with Tolerance
Engineers dealing with the physical world have always worked with tolerances. The diameter of a screw, the length of a steel beam, the conductivity of copper wire—none are ever exact. Instead, manufacturers quote values as "5mm ± 1%." It's then the engineer's job to design systems that function despite inexact inputs.
In traditional software, we don't worry about tolerances. Sure, there are floating-point quirks (0.1 + 0.2 = 0.30000000000000004), but generally you issue commands and the computer executes them to the letter. Even with traditional machine learning, a trained classifier is deterministic: the output arrives in exactly the format you designed. With generative models, that all goes out the window.
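The floating-point quirk is easy to demonstrate, and the standard remedy is itself a tolerance check:

```python
import math

# IEEE 754 doubles cannot represent 0.1 or 0.2 exactly,
# so their sum misses 0.3 by a tiny margin.
total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False

# The fix: compare within a tolerance, not for exact equality.
print(math.isclose(total, 0.3))  # True
```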
An example:
Traditional spam filter: Train a classifier to read an email and output 0 or 1. Zero means "not spam," one means "spam."
LLM spam filter: Send the email to the model with a prompt: "Have a look at the following email and tell me whether it's spam. Answer with a single word—'yes' if spam, 'no' if not."
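The two approaches side by side, as a minimal Python sketch. The `llm_complete` parameter stands in for whatever model API you use; it's an assumption, not a real library, and the keyword check in the traditional filter is a stand-in for a trained model:

```python
def traditional_filter(email: str) -> int:
    """A trained classifier: always returns exactly 0 or 1.

    A real system would call model.predict(email); a keyword
    check stands in for that here.
    """
    return 1 if "winner" in email.lower() else 0


PROMPT = (
    "Have a look at the following email and tell me whether it's spam. "
    "Answer with a single word: 'yes' if spam, 'no' if not.\n\n"
)


def llm_filter(email: str, llm_complete) -> str:
    """An LLM classifier: returns whatever text the model produces.

    llm_complete is any callable that takes a prompt string and
    returns the model's reply (a hypothetical stand-in for a real API).
    """
    # The reply might be 'yes', 'no', or a full paragraph.
    return llm_complete(PROMPT + email)
```

The type signatures tell the story: the traditional filter promises an `int`, while the LLM filter can only promise a `str` of unknown shape.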
Here's the tradeoff between determinism and expressive power:
The traditional filter always produces correctly formatted output. 0 or 1, nothing else. Whether those labels are accurate is a separate question, but plugging them into the larger email service is trivial.
The LLM might be far more sensible in its classification, with stronger understanding of semantics. But there's a non-trivial chance the output is neither "yes" nor "no" but something like "Yes, that definitely looks like spam," or worse: "No. Would you like me to summarize the email?"
LLMs have gotten better at following instructions, but there's currently no way to hard-code formal output requirements into a prompt. Instructions are guidelines, not constraints. So what do we do?
Post-process: If an answer doesn't fit your schema, add a cheap post-processing step. In the example above, look for "yes" or "no" and ignore the rest.
Use tool calls for the final answer: Instead of having the spam-filter LLM spit out the answer directly, tell it to call one of two tools, `markSpam` or `markNotSpam`. It's a bit of a hack, but current LLMs are strongly optimized for tool calling, more so than for following vague free-form instructions.
Accept wide, output narrow: Design systems that accept a range of input formats but emit a narrowly defined output schema.
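A minimal sketch of the tool-call approach, assuming an OpenAI-style chat API where you declare tools as JSON schemas and the response names the tool the model chose. The schema layout and dispatch helper are illustrative, not tied to a specific SDK:

```python
# Tool declarations in the JSON-schema style most chat APIs expect.
# The API constrains the model to pick one of these names.
TOOLS = [
    {"type": "function", "function": {
        "name": "markSpam",
        "description": "Flag the email as spam.",
    }},
    {"type": "function", "function": {
        "name": "markNotSpam",
        "description": "Flag the email as legitimate.",
    }},
]


def handle_tool_call(tool_name: str) -> bool:
    """Dispatch on the tool the model chose: True means spam.

    Because the API validates tool names against TOOLS, the unknown-name
    branch should be rare, but defending against it costs one line.
    """
    dispatch = {"markSpam": True, "markNotSpam": False}
    if tool_name not in dispatch:
        raise ValueError(f"model called unknown tool: {tool_name}")
    return dispatch[tool_name]
```

The narrowing happens at the API layer rather than in your prompt: the model's expressive free-form reasoning feeds into a choice between exactly two declared names.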
Physical engineers learned this a century ago: design for the tolerance, not the spec.
