"The Computer Doesn't Do What I Tell It To!"
As a teenager, I was often called on to provide basic tech support for friends and family. They'd complain that the computer wouldn't "listen" to them. I'd chuckle at that, since the problem usually arose because the computer did exactly what it was told. Barring outright bugs, old-school software is deterministic. Clicks and keystrokes have pre-programmed results you can rely on.
Not so with generative AI. Nondeterminism lurks everywhere.
First, there is the inherent randomness of the generated output. A typical chat model answers the same question differently each time it is asked. When you access a model programmatically, you can turn this randomness down or off (usually via a temperature parameter), but other sources of nondeterminism remain.
When the model is fed raw user input, even slight variations in that input can lead to different outcomes, and that holds even with the randomness set to zero. Whether the user wrote "Can you write me a poem about cats?" or "Make up a poem about cats!" might lead to very different results.
And then, of course, there is the unpredictable way the model handles a large input or a complex request. You might ask for a piece of writing in a particular format, and it's anyone's guess whether you get it or not. So in that case, the computer really didn't do what you told it to.
What can AI engineers do here?
Aim narrow, accept wide. That's good general advice for a user experience: accept a wide range of inputs, and aim for a narrow range of outputs. The more you have to tell your users exactly how to use the system, the less awesome they'll feel about themselves and the experience. On the flip side, the more you can make the tool's output predictable, or at least non-perplexing, the better.
Add safeguards. Wrap checks around user input and post-process the model output to file off the rough edges. This could be traditional preprocessing (like removing all special characters and punctuation from an input where they shouldn't influence the actual output) or it could be yet another AI step.
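To make the traditional preprocessing idea concrete, here is a minimal sketch in Python. The function name `normalize_input` is made up for illustration, and it assumes a use case where case and punctuation really carry no meaning (a search term or a cache key, say). It would be the wrong tool for free-form prompts, where punctuation matters.

```python
import re
import unicodedata


def normalize_input(text: str) -> str:
    """Collapse superficial variations so equivalent inputs become one string.

    Hypothetical helper: only suitable where case and punctuation
    should not influence the result.
    """
    # Unify look-alike Unicode forms (full-width letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # Ignore case differences.
    text = text.casefold()
    # Replace everything that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_input("  Poem about CATS!!! "))
    print(normalize_input("poem, about cats?"))
```

Both calls produce the same string, so whatever sits downstream (the model, or a cache in front of it) sees one input instead of two.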
Tune the model's randomness. Don't just accept the default parameter. It might not be the best choice, and setting it to zero might not be the best choice either. Experiment and evaluate.
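For a feel of what that randomness knob does, here is a toy sketch of temperature-scaled sampling. It is not any vendor's API. The logits are invented numbers standing in for a model's scores over three candidate tokens, and `sample_with_temperature` and `distinct_outputs` are hypothetical names. Treating a temperature of zero as "always pick the top score" is a common convention that this sketch simply assumes.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Pick an index from `logits` after scaling by `temperature`."""
    if temperature == 0:
        # Assumed convention: zero temperature means greedy (argmax) choice.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def distinct_outputs(logits, temperature, draws=200, seed=0):
    """Count how many different indices show up over repeated sampling."""
    rng = random.Random(seed)
    picks = {sample_with_temperature(logits, temperature, rng) for _ in range(draws)}
    return len(picks)


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5]  # invented scores, not from a real model
    for temperature in (0, 0.5, 1.0, 2.0):
        print(temperature, distinct_outputs(toy_logits, temperature))
```

In a real product, the evaluation loop would score actual model responses against whatever quality measure matters to you, rather than counting distinct picks. The shape is the same, though: sweep the parameter, measure, and choose.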
And if all that doesn't help, just put "vibe" in front of your product name and all is forgiven ;)