On AI Agents Deleting Your Databases
Always a bit late to the social media hype, because I like to sit with a story like this for a bit:
Some SaaS company (PocketOS) was letting an AI agent handle their infrastructure. It got tasked with optimizing cloud costs, so it decided the best way to save money on the cloud was to delete the whole database (plus backups).
The innumerable and gleeful hot takes around this get a few things right and a lot of things wrong.
First, the more sensational pieces mention that, after the deed was done, "Claude wrote a confession." I've written about this misconception before: asking an AI why it did or said something is guaranteed to give you a hallucination. There's no actual intent. Claude didn't willfully ignore instructions telling it not to take any destructive action. Those instructions just didn't factor strongly enough into the next-token prediction the underlying language model used to write its plan of action.
Second, people say, "Well, interns have accidentally deleted data before, so this is just about proper permissions." It's true that interns or newcomers to a complex system occasionally mess things up, and that well-designed systems make it harder for such whoopsies to occur. But there are qualitative differences. Human errors occur either through mechanical failures such as simple typos (rm -rf / versus rm -rf ./; if you know, you know 😉) or because the human's world model, their internal concept of how the thing worked, was misaligned: what they wanted to accomplish was sound, but their means were flawed. AI agents, lacking an internal world model or, more flippantly, common sense, will do all sorts of silly, unintentional, or destructive things that technically align with the stated goal: it's very true that deleting the entire production database saves a lot on cloud cost.
There's work to do in figuring out how to best work with AI. If you insist that everything AI-produced needs to be human-vetted, you won't see strong productivity gains. But if you hand the reins over completely, any such gains can get wiped out by catastrophic events like this.
One-way and two-way doors
Anything that's easy to revert should be safe for an AI agent to touch. Bonus points if any issues get detected automatically so that rollback happens without human intervention: had an AI autonomously fix a bug and push to production, but now we're seeing server errors that local testing couldn't have caught? Roll it back, all good.
Anything that's impossible or hard to revert, or that would have catastrophic consequences, should be gated behind human review. Dropping columns from a database? Modifying infrastructure? AI shouldn't make that call on its own.
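One minimal sketch of this policy: make every agent action declare whether it can be undone, auto-execute (and auto-roll-back) the reversible ones, and refuse the irreversible ones unless a human has signed off. All the names here (Action, execute, execute_with_rollback, the healthy check) are hypothetical illustrations, not any real agent framework's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """An action the agent wants to take. undo=None marks a one-way door."""
    name: str
    run: Callable[[], None]
    undo: Optional[Callable[[], None]] = None


class HumanApprovalRequired(Exception):
    """Raised for one-way doors: a human must sign off before execution."""


def execute(action: Action, approved_by_human: bool = False) -> None:
    # One-way door: the agent never walks through it alone.
    if action.undo is None and not approved_by_human:
        raise HumanApprovalRequired(action.name)
    action.run()


def execute_with_rollback(action: Action, healthy: Callable[[], bool]) -> bool:
    """Two-way door: apply the action, check health, revert on failure."""
    if action.undo is None:
        raise HumanApprovalRequired(action.name)
    action.run()
    if healthy():
        return True
    action.undo()  # automatic rollback, no human in the loop
    return False
```

The interesting design choice is that reversibility is a property of the action itself, declared up front, rather than something the agent judges in the moment; "delete the production database" simply has no undo, so it can never be auto-approved.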
This is also how highly effective organizations hand more power to their employees, by allowing them to make reversible decisions without having to first go through their manager. Yet again, what's good for humans is good for AI.
