Speedwalking
Imagine you're walking along at a leisurely pace. Then, glancing at your watch, you realize you're late for something. You pick up the pace, walking briskly now.
After a while, you realize that you still won't make it on time. You accelerate. Your brisk walk now qualifies for a proper Olympic speed walk.
Still, you see that time's tight. You're already walking as fast as you can. What else can you do?
You can run.
When we find ourselves in a tight spot, we often continue in our current mode with heightened intensity.
The neural network isn't learning well. Let's add more layers.
The project is late. Let's add more people to it.
The team is misaligned with business goals. Let's have more meetings about it.
Nobody is using our app. Let's add more features.
The app is slow under load. Let's split out a few more microservices.
Or you can pull back and try an entirely different approach.
Looking For Keys Under a Street Lamp
There's that old joke about a man looking for his keys under a street lamp. A passerby asks:
"Did you actually lose them under that lamp?"
"No, I lost them somewhere over there in the dark."
"So why are you looking under the lamp?"
"Because the light's better here!"
One of the subtler forces that push us into unnecessary complexity follows a similar logic: we build solutions with the wrong tools because we know them better. This is the dark flip side of the otherwise sound admonition to stick with "boring" technology over the shiny new thing.
For example, when I jumped back into full-stack development after a long time mainly doing scientific programming in Python, part of me wanted to go out and look for ways to handle everything, including interactive web frontends, in Python. Such libraries exist, but, to be honest, they're a good example of looking for keys under a street lamp not because they're there, but because the light's better. I had to get over it and (re-)learn JavaScript.
The Wisdom to Know the Difference
Sometimes we should stick with what we know and not use the shiny new things. And sometimes we should ditch what we know and learn a new thing. But how do we know which one's right? By not going with our gut:
If you're excited about using the shiny new thing, check yourself; it might be a trap, and you'd be better off sticking with what you know.
If you dread learning that unfamiliar technology, that might tell you it's time to face your fears and go for it.
Fear of the unknown can save us from doing something dangerous, but it can also keep us stuck in a "meh, good enough" mindset. Venture out of the light and look in the dark places where you'll actually find what you're looking for. And before you know it, your eyes will adjust.
Have You Met This Parrot?
We found this parrot that had learned to say, "How's the project coming along?" so we hired him as a manager. 😬
Sorry for the bad joke.
But we can easily translate that joke to other professions. Maybe the parrot learned to say...
Get some rest and come back when it gets worse.
And how does that make you feel?
I will neither confirm nor deny. Next question.
And we hire it as a doctor, psychiatrist, or government spokesperson. Is your profession safe?
If you can come up with good reasons why a parrot can't do your job (and I sure hope you can), then you can also come up with good reasons why generative AI can't do your job. Remember, GenAI is a (statistical) parrot that learned to (statistically) say the right words at the right time. It's amazing how far AI engineers can push such a system to perform useful tasks, but it doesn't change the underlying fundamentals.
So, on that note, happy Friday and squaaaawk.
Skills You Hope You Never Need
Before venturing into the beautiful backcountry of British Columbia's snowy mountains, it's wise to take Avalanche Skills Training (AST). The most dramatic part of this course is the beacon search practice: How fast can you locate and dig out a buried object using your avalanche transceiver, a probe, and a shovel?
If you're out in the wilderness, your friend's life might depend on this. The probability of surviving being buried by an avalanche drops off dramatically after 10 minutes. The situation is chaotic and stressful, the snow will be hard like concrete, and you really don't have much time. So, it's better to get really good at it.
A successful rescue after a full burial makes for an exciting, maybe even heroic, story.
Yet if we're honest, if you have to use these skills, something already went horribly wrong. Did you ignore an avalanche forecast that called for high or extreme danger? Did you venture onto a slope you should have recognized (also from your AST course) to be unstable and likely to produce an avalanche? Did you ignore signs of instability all around you?
Heroics and Upstream Thinking
Dan Heath's book "Upstream: The Quest to Solve Problems Before They Happen" describes many such cases where a heroic action overshadows a quieter but ultimately more impactful action.
Catching a robber after a high-stakes car chase is cool. Doing community engagement to keep kids out of trouble in the first place is boring.
A marathon debugging session to hunt down a bug is impressive. Proceeding in small, well-tested steps appears to just slow you down.
Working weekends to make sure a software deployment goes smoothly earns you a pat on the back. Setting up CI/CD automation, monitoring, and automated rollbacks feels like a chore.
If we're being honest, we love the idea of saving the day, and because nothing is ever perfect and foolproof, there sure is a time for heroics. But maybe we can also find satisfaction in preventing untold issues from ever requiring heroics in the first place.
Fun vs Ugh
This tweet from early on in the AI hype cycle touches on an important point when it comes to using or building AI products.
Yup.
Take AI coding assistants. They do the fun part (writing code) and leave the chores (reviewing and debugging) to you.
Why Delegate to Humans
For managers or senior individual contributors, delegation is an important skill. Doing it well has many benefits beyond saving you some time: it also contributes to your reports' learning and growth. Inexperienced (or plain bad) managers are reluctant to delegate because, as they'll tell you, it's faster if they do it themselves.
That's, of course, beside the point. Yes, you can ship that feature in half the time it would take the new hire. But if you never let them have a go at it, they'll never get up to speed. So you have to learn to delegate even the fun bits.
AI for the Boring Stuff
Not so, of course, with machines. My dishwasher is never going to improve at doing the dishes. It is, in fact, quite slow at it (90 minutes for a load; I bet I could do it faster than that!).
And that, too, is beside the point, because here it's purely about saving me time on something I don't enjoy doing.
When thinking about how to incorporate AI into your work or what sort of AI product to offer, don't look for the fun stuff. Yes, it's exciting. But as the tweet's author states, people want to do the fun stuff themselves. Paul Graham writes about the schlep: the tedious, unpleasant task nobody wants to do. An alternative name could be the Ugh factor. "Ugh, do I have to?" If there's an app for that, I'm sure people will pay attention.
Sometimes You Just Have to Risk Looking Silly
Imagine you're a soccer player deciding on where to aim your penalty kick. What's the best approach? Upper left? Lower right? Nope.
It's a statistical fact (which I believe I learned from one of Malcolm Gladwell's essays, but I'm not sure which one) that your best bet is to aim straight for the centre!
That's because the goalie will almost certainly jump for one of the corners, leaving them unable to get to the ball.
But it's also a statistical fact that most players do not aim straight for the centre. Why is that? Because if you kick straight at the centre and the goalie saves it (simply by standing still), you look mighty stupid.
The mere fear of looking stupid is a powerful driver that makes us pick overly complicated solutions that don't even offer a higher chance of success.
Next time you're...
reaching for that GPT-4.1 integration, ask if a fine-tuned smaller model (BERT?) wouldn't do the trick
setting up a neural network dozens of layers deep, see if a random forest, a logistic regression, or a gradient-boosted model wouldn't actually work better (a quick baseline check is sketched below)
spreading your app over a hundred microservices, ask if a single instance and a monolithic architecture wouldn't serve you better
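To make the second point concrete, here's what such a baseline check can look like (a minimal sketch with scikit-learn; the synthetic dataset merely stands in for your own data):

```python
# Sketch: try the simple models first, before committing to a deep network.
# The synthetic dataset is a placeholder; swap in your own features and labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

baselines = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200),
    "gradient boosting": GradientBoostingClassifier(),
}

for name, model in baselines.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
# Only if none of these clears the bar is the extra complexity of a deep net justified.
```

If the simple model wins, or even comes close, you've saved yourself a mountain of complexity.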
An old quip, often attributed to Lincoln or Twain, advises that it is "better to remain silent and be thought a fool than to speak and remove all doubt." Maybe it's time for an update:
"Better be thought a fool for keeping it simple than be a fool for making it complex."
Pair-Programming with Robots
Across the software industry, the business practices and rituals of Agile have seen much heavier adoption than the technical practices. You've likely experienced firsthand the likes of
Standups
Sprints
Story-point estimation
If you're lucky, you've worked in teams that practiced Continuous Integration and Continuous Deployment (CI/CD) and even done some Test-Driven Development (TDD).
Yet, the rarest breed of all the Agile technical practices has to be Pair Programming. Newsletter reader (and excellent newsletter writer over at Engineering Harmony) Alex Jukes recently wrote a series about the practice, with the first post here.
Quick recap
Pair programming is having two programmers work next to each other (or via screen sharing) on the same task, in the same editor. One person does the typing, one person does higher-level thinking, a bit like in rally car racing with a driver and a navigator.
Wherever I had the chance to practice this, I've found it the fastest way to:
effectively onboard new team members and have them become proficient in an unfamiliar codebase
work through complicated refactorings or other code improvements
coach junior team members on better techniques and principles
Despite heaps of benefits, it never became as widespread as standups, story points, or CI/CD, which is a shame.
The Trojan Robo-Horse
Here's a hope, naive though it may be: These days, every programmer can have their own little AI buddy as a pair programmer. That gives you some of the benefits of pair programming (having someone think alongside you), but by no means all of them. Yet, as people realize how great it is to have a companion while they code, more might be willing to try pair programming with a real human. After all, much more knowledge transfer and learning happens in that scenario!
Over to you: Have you tried pair programming, and how did that experience go?
Frontend, Backend, and the Reverse Conway Maneuver
Conway's Law states that,
Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
You have a single giant team, all working on the same system with no internal groupings or hierarchy? You're bound to get a monolith (or, if you try to force microservices into the mix, a distributed monolith).
You have one small satellite office, in a different time zone, not included in regular meetings, and they're working on feature X? Guess which feature will be poorly integrated into the overall system.
By far, the biggest example of this law in action is the split of developers into front-end and back-end: those who work on the user interface and those who work on databases and APIs. By segregating them into entirely different career paths, we (the software industry) have inadvertently created a situation where the front- and back-end exist in this weird state of "technically separated but, by necessity, strongly coupled".
My own experience of writing full stack web apps tells me that this is an anti-pattern. It's a delight being able to deliver a single feature, end-to-end, and not have to worry about how it splits across front- and back-end.
Modularization is a great way to tame complexity. Divide and conquer! But that only works if the divided pieces aren't, in practice, still tightly coupled.
That's how we end up with ad-hoc API endpoints that serve no general purpose other than "well this dashboard needs those three fields from the database", or with duplication of client- and server-side logic (because god forbid the two talk to each other).
And when developing with AI, the traditional approach to splitting responsibilities horizontally (frontend/backend/AI) instead of vertically (i.e. split along actual features) will make everything even messier, because now you have three layers that need to tightly coordinate.
Reverse!
Here's a cool trick, humorously dubbed the Reverse Conway Maneuver: Once you determine the correct structure for your system, you can set up your organization's structure to support that system! And if we agree that we want to have well-built cohesive end-to-end features rather than a disjointed mess, we'll arrive naturally at small, focused, product-oriented teams.
Just like Agile (way back in the day) was trying to tell us all along.
Interpolation versus Extrapolation
Here's one reason large language models (LLMs) like ChatGPT are so great at certain tasks and so poor at others: The underlying machine learning model is explicitly trained to predict the next words, or tokens, that match the training set. Sure, there are some tricks and tweaks you can apply, but in the end, that's what you get.
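To make "predict the next token" concrete, here's a toy sketch: a counting-based bigram model that can only ever reproduce continuations it has seen. Real LLMs are vastly more sophisticated, of course, but the training objective has the same shape: given the context, predict the most likely next token.

```python
# Toy next-token predictor: count which word follows which in a tiny corpus.
# Purely illustrative; real LLMs use neural networks over subword tokens.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def next_token(word):
    # Pick the continuation seen most often during "training".
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(next_token("the"))   # "cat" (seen twice, beats "mat")
print(next_token("sofa"))  # None (never seen, nothing to predict)
```

Everything the model produces is, in this sense, a recombination of what it was trained on.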
No Cubism For You
Imagine a generative model trained on all art up to the mid-1800s. It would not spontaneously generate an image in the style of Picasso. Cubist paintings are not in the training distribution, and you can't get there by simple interpolation.
The same holds for text, though the contrast in style is perhaps less stark there. But simply put, you won't get novel styles, novel concepts, or novel ideas if they are not interpolations of existing ones: if you trained a model exclusively on the works of Shakespeare, you'd have a hard time getting it to produce the sparse minimalism of Hemingway. I doubt even heavy-handed prompting would get you from Romeo and Juliet to The Old Man and the Sea.
Generative models are, for better or worse, bound by the constraints of their training data. The next breakthrough, whether in art, science, or language, won’t arise from prediction alone. True creativity still requires stepping beyond what's already known.
PS: Of course the actual ChatGPT knows both Shakespeare and Hemingway...
He saw the window. There was a light. It was Juliet. Juliet was like the sun in the morning. Bright and strong. Better than the moon. The moon was pale and jealous. Juliet was not.
The One Weird Trick That Guarantees a Hallucination
I saw someone on LinkedIn claiming they had reverse-engineered ChatGPT's latest memory feature through some clever prompting:
They asked a question that relied on ChatGPT's memory and got an answer
Then they asked ChatGPT to explain how it arrived at the answer
They claimed that this explanation would reveal the secret mechanisms of the feature.
The only problem is, it's all made up.
The only truthful answer ChatGPT could possibly give to such a question would be something to the effect of: "Because the input and my system prompt led these tokens to have the highest probability of being selected as the next tokens."
Any interrogation of a large language model is bound to lead to hallucinations. An easy way to experience that first hand is to see the model occasionally change its mind when asked to explain its reasoning.
Besides, if an LLM really knew how it arrived at its conclusions, why would it ever hallucinate in the first place?
There's no reflection, concept of self, or actual thinking under the hood. All there is is the next best token.
AI-Coding: The Ultimate Leaky Abstraction
I am skeptical of exaggerated hype around vibe coding. One question bothered me, though: Am I just turning into the old curmudgeon yelling at the young kids to get off his lawn? After all, similar debates existed before:
There was a time when programming meant punching holes into cards. Then along came these fancy folks with their keyboards.
Then we had people who complained that, with a compiler, programmers these days weren't reading and writing pure machine instructions (assembly language) anymore.
These compiler users then complained when some people moved to interpreted languages that don't even need to be compiled.
The list goes on and on, and of course, the old guard proclaimed that the new way was wrong each time. When confronted with this litany of outdated complaints, the curmudgeons quickly insist that "it's different this time!"
It Is Different This Time
So what's the difference between someone using a compiler, blissfully unaware of the underlying machine instructions, and someone vibe coding their app, blissfully unaware of the underlying Python or JavaScript?
It's that the abstraction is incredibly leaky.
In more than 99% of situations, using a compiled language like C, C++, or Rust, you can happily ignore that it ends up as machine language. You can spend your entire programming career writing highly effective code in these languages and not once have to drop down to the lower level of CPU instructions.
We are still far from achieving a similar scenario with AI-generated software! As your project grows beyond the very first prototype, you'll find that the AI runs into dead ends, breaks one thing while fixing another, and hits all sorts of hiccups that you can only resolve by looking at the code and devising a solution yourself. You have to drop down to the level of the source code, and there's currently no way around it.
It might change in the future, but I wouldn't just yet fire my entire software engineering department to have my project managers vibe-code everything themselves 🤷♂️.
Leaky Abstractions: AI Edition
Last time, I brought up the concept of leaky abstractions.
And the relevant example, these days, is AI systems, particularly language models.
The message is: When building with AI, you cannot neatly encapsulate the lower-level details and hide them behind a nice interface.
Letter Counting
A little while ago, this observation made the rounds on the internet: if you asked ChatGPT how many R's there were in the word "strawberry", it would confidently tell you there were two. ChatGPT has gotten a bit better at counting letters, but even as of this writing, it will make mistakes:
I'm sorry, ChatGPT, but I cannot accept this.
The reason for this struggle with letter-level word processing is well understood and not the topic of this post (the keyword for the inquisitive reader is tokenization).
What matters is that you cannot just treat an LLM as a black box that "understands" language. If you build an LLM-powered tool where letter-level accuracy matters, maybe an assistant to help you with your daily WORDLE puzzle, you'll run into inscrutable problems.
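A quick way to see the tokenization issue for yourself (a small sketch using the tiktoken package; exact splits depend on which encoding you pick):

```python
# Sketch: how a tokenizer actually sees the word "strawberry".
# Requires the tiktoken package; token splits vary by encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print(tokens)                             # a few integer IDs, not letters
print([enc.decode([t]) for t in tokens])  # chunks like "str", "aw", "berry"

# The model never sees individual characters, which is why
# letter-counting questions trip it up.
```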
Substitutability
One sign of a good abstraction is that it lets you seamlessly swap out the inner implementation without noticeable changes to the interface. Language models fare quite poorly here: Some are great at coding, some are better at generating prose or marketing copy. Some will readily return well-structured data according to a requested schema, others will insist on injecting their own quirks. If you build an AI-powered product, you cannot, on a whim, throw out ChatGPT and plug in Claude without re-engineering all your prompts.
Hallucinations and Reasoning
When building with AI, it matters a great deal how language models generate their outputs. Understanding this lets you appreciate under what conditions the model is likely to hallucinate more. Then you can devise some mitigation plans. This understanding also helps with appreciating the limitations of "reasoning" models.
I hope to touch on a number of these subtopics in future posts. For now, I'll be enjoying a long Easter weekend.
Leaky Abstractions
A popular way of dealing with complexity in software is to encapsulate lower-level functionality behind layers of abstraction:
A file is a useful abstraction over the bits stored on a hard drive.
An HTTP request is a great abstraction over bits zooming along the internet.
In the physical world, your TV's remote is a great abstraction over sending channel or volume commands as infrared signals.
A perfect abstraction hides all its internal details from the outside. It only exposes an interface at a higher level of abstraction: Operating a remote control involves pressing buttons, not concerning oneself with the operational characteristics of an infrared LED.
In practice, there's always some leakage. The TV remote is all about buttons, but the physical limitations of infrared beams become apparent as soon as one of your kids is blocking the path between the remote and the TV.
In cases of mild leakage, we can often lift the leaked detail from the lower to the higher level of abstraction. Without dwelling on the details of electromagnetic radiation, the interface for the TV remote is:
Press buttons to tell the TV what you want
Make sure there's a clear path from remote to TV
Unfortunately, there are situations where the lower-level details leak out in a way that can't be incorporated into the higher level without deeply understanding the lower level.
Which brings us, of course, to AI systems in general and large language models in particular. More on that tomorrow!
An AI Feature Done Right
Given my rants about AI done wrong, unnecessary complexity, or folks being snobbish about "simple" features, I thought I'd give a shout-out to Todoist (still my favourite to-do app).
They've slowly but steadily shipped some nice AI features. The most recent one (currently in beta, collecting feedback) is as genius as it is simple:
As a user, your task inbox has a unique email address (some-unique-code@todoist.com).
If you send an email to that address, it gets added to your task list.
The new feature: AI reads the email, extracts the actual tasks and their due dates, and adds those to your task list.
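Here's a minimal sketch of how such a feature might be wired up, assuming the OpenAI Python client; the model name, prompt, and email are illustrative, certainly not Todoist's actual implementation:

```python
# Sketch: extract tasks and due dates from an email as structured JSON.
# Assumes the openai package and an OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

email_body = """Hi team, please send the Q3 report by Friday.
Also, book a venue for the offsite before the end of the month."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "Extract every action item from the email. Reply with JSON "
                    'of the form {"tasks": [{"title": "...", "due_date": "..."}]}.'},
        {"role": "user", "content": email_body},
    ],
)

for task in json.loads(response.choices[0].message.content)["tasks"]:
    print(task["title"], "-", task.get("due_date"))  # then add each to the task list
```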
The Prompt is Not Your Moat
If your to-do app doesn't have this feature, you can copy and paste an email straight into ChatGPT's interface, ask it to extract all action items from it, and then copy those into your app. I suspect that landing on the right prompt to achieve this wouldn't be too hard.
Or, if you're feeling fancy, you can set that up as an automation in Zapier and co.
Or, we realize that AI represents raw functionality that provides the most value when integrated into a holistic product experience. The question is not, "Can straight-up ChatGPT already do it?" The question is, "How can we improve our users' experience?"
The Stakes Are High
Taking our last few posts (on AI, Iterations, Product/Market Risk, Evals) together, there's an important observation:
The stakes for doing things right from the beginning are higher:
Evaluating an AI system is harder than testing a traditional software system.
With AI, it's less certain that what you want to build is even feasible than with traditional software.
Especially when you're training your own models, feedback loops are longer. You're more run out (in the rock-climbing sense).
It's well known from traditional software that when testing is slow and painful, it's not done as much as it should be. It's also known that when feedback loops are long, there's a higher risk of building something that doesn't meet the user's needs.
So, going back to our principles, successful AI Engineering requires strong discipline and a will to uphold these principles. The temptation will be strong to forge ahead without evals, without trying hard to shorten feedback loops and without keeping the design simple. But resist we must so that we don't design ourselves into dead ends and so we can deliver real value for the long term.
Eval’s where it’s at
Somewhat drowned out by all the chatter and hype about vibe coding, I'm glad to see that a much more important conversation is picking up steam.
Five months ago already, Eugene Yan, a Principal Applied Scientist at Amazon, shared this insight on LinkedIn:
Evaluating LLM output is hard. For many teams, it's the bottleneck to scaling AI-powered product.
That topic didn't get much attention in the following months, but now, at least, I've seen engineers and product managers discuss it much more.
Evals tie in nicely to yesterday's post about tight feedback loops. If you cannot quickly and automatically evaluate your AI models, you're in long-runout territory where feedback comes too late for comfort.
In a way, evals are to AI what unit tests are to traditional software. They're the written-down assumptions and constraints we want to place on our system, and they allow us to check whether a change we're contemplating moves us toward or away from our goals.
Of course, they are very different in another way. Unit tests are deterministic and check a deterministic path through the code: "Given this input, verify that the output is such and such." The evaluation of an LLM cannot be expressed like that, and it gets even more complex when we want to evaluate the effectiveness of an AI agent.
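For contrast, here's what such a deterministic check looks like (the slugify function is just a made-up example):

```python
# A traditional unit test: fixed input, one exact expected output, every run.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

def test_slugify():
    assert slugify("Hello World") == "hello-world"
```

An LLM's output can't be pinned down like that, which is where evals come in.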
LLMs to the rescue
So what are we to do? Imagine building a special AI tool to turn the abstract of a scientific paper into an eye-catching blurb for LinkedIn to help your research department's social media team. The output will be different each time, and you can't hard-code an evaluation that replies with "good" or "bad."
But you can use another LLM as the judge, or possibly two individual judges:
One judge will check that the social media blurb does not hallucinate.
Another judge will check that the blurb is catchy.
It's easiest to issue a Pass/Fail here, but of course, you can ask the LLM for more nuance.
For more complex tasks, imagine having several judges, each focusing on one aspect. The advantage of this is that each one will have a less complex prompt.
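Here's a minimal sketch of what one such judge could look like in code, assuming the OpenAI Python client; the model name, prompt, and pass/fail convention are all illustrative:

```python
# Sketch: one LLM judge that checks a blurb against the source abstract.
# Assumes the openai package; model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

def judge_no_hallucination(abstract: str, blurb: str) -> bool:
    """Return True if the judge finds no claims beyond the abstract."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are a strict reviewer. Answer PASS if every claim in "
                        "the blurb is supported by the abstract, otherwise FAIL."},
            {"role": "user", "content": f"Abstract:\n{abstract}\n\nBlurb:\n{blurb}"},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")

# A catchiness judge would look the same with a different prompt; run both
# over a fixed set of test abstracts whenever you change a model or a prompt.
```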
Once in place, you can start iterating on the AI system you're building by grabbing a set of test abstracts and experimenting.
Does the eval improve if you change the underlying model from GPT-4.5 to Claude 3.7?
What about tweaking the prompt?
Have we hit diminishing returns and it's time to fine-tune the base model?
Tying these decisions and the experimentation with models and prompts to concrete evals is the essential piece that turns the black art of AI whispering back into an engineering discipline.
Short Iterations and Rock Climbing
When programming, I get a strong feeling of unease when the spacing between iterations, that is, the time between receiving feedback, grows too large.
That can be on a small scale, like when there is too much time between writing small unit tests and making them pass or too many lines of code changed in a single refactoring step. Or it can be on a larger scale, when it's been too long since I bundled up my code changes and submitted them for code review, and on the even longer scale, where it's too long between taking stock of where we are, where we want to go, and what the gap is between the two.
It's very similar to a feeling I'd get rock climbing.
As you embark on a roped-up climb, you rely either on pre-placed bolts or self-placed pieces of protection to keep you safe. If you have just clipped into such a piece, you'll feel quite comfortable and relieved: Should you slip right now, you won't fall far. But then you have to climb on. The further you go, the longer a potential fall becomes and the more nervous you get, longing desperately for the next bolt or next good protective placement.
In climbing, it's not practical to place too much protection. You'd either ruin the rock face by sewing it up with pre-drilled bolts, or you'd weigh yourself down with an unreasonable amount of protective gear and waste time and energy placing it. Instead, your good judgment is required: Match the distance between placements (the so-called runout) to the section's difficulty while keeping consequences in mind:
Easy climbing ahead and, should you fall now, you'd fall far but only hit air? Forge on.
Climbing near your limit, or potentially hitting a ledge, corner, or protrusion should you fall? Place more conservatively.
Same in software development. It doesn't hurt to know how to go super slow and write extremely tight micro-tests, but a simple function of a type you've written hundreds of times already can probably do without them. On the other hand, unfamiliar terrain or changes that touch critical parts of the system benefit from slowing down and keeping the runout short.
What Does Good Design Look Like?
Several previous articles have discussed design, whether for products, machine-learning models, code, or architecture.
What we haven't done yet is define what Good Design even means.
Simply put, your system's design is the current arrangement of its components. What, then, makes such an arrangement good?
Properties of a Good Design
The overall system must serve its purpose. If it doesn't, it's designed poorly.
It must be comprehensible. The reason things are arranged the way they are should be evident in light of the system's purpose.
It must be easy to change.
The first two points are about the present. The last point is about the future. You spend much more time changing any relevant piece of software than you do on its initial creation, maybe because requirements change or because enthusiastic customers demand new features.
It's important to stress that this doesn't mean that your system, in its current state, must anticipate all possible future requirements. That leads to bloated, overdesigned systems. Remember YAGNI: You Aren't Gonna Need It.
This requirement means that once a particular future requirement becomes a present one, the effort to change the system is reasonable: you don't have to rewrite the whole app or touch dozens of files across the codebase just because of a small tweak.
Several good practices fall immediately out of this requirement:
Your system must be easy to test. That way, you can make changes without fretting that you might accidentally break an unrelated part.
The system should be decomposed into units with clear and single responsibilities so that it's immediately clear where a change must be made.
Dependencies between these units should be managed and minimized so that changes to one part of the system don't ripple through the entire system.
Similarly, every "fact" about the system should have one unique source of truth, so you can't accidentally change only one of two sources for the same fact.
While many of these ideas come directly from software design, you can also lift them to higher levels. The same is true for architecture (e.g., make sure your microservices architecture isn't a distributed ball of spaghetti) and even for your company's communication structures (small functional units with minimal dependencies between them make for high organizational velocity).
Happy designing!
SaaS Creep - Why Does Zoom Think I Want Their Task Feature?
Evernote was a simple yet beloved note-taking tool. Then, they added tasks, calendars, and other features that made the tool so bloated that users got fed up and left.
Dropbox was a simple, fantastically ergonomic cloud storage tool. Then they added a Password Manager and a Google Docs clone, and users were angry that they had to pay for stuff they didn't want.
Zoom recently added Docs, Tasks, and Workflow Automation. We've yet to see if there'll be a fallout, but I'm sure people already had plenty of good ways to manage their docs, tasks, and automations.
Every successful SaaS company seems to try to become the One Stop Shop, much to the detriment of the user experience. It would be as if a successful steak house added pizza and sushi to its menu. For some strange reason, these companies see it as their mission to prevent users from ever switching tabs. Oh no, don't check your Google Calendar. We have a calendar right here in Zoom! Don't check your tasks in Todoist, we have tasks in Zoom now!
Is it pressure from investors and shareholders to grow? Maybe. It just doesn't work that way. Here's the rub.
Let's stick with Zoom. Their current customers like its video call capabilities more than they like Google Meet's video call capabilities.
But there's no reason to assume that all of these customers would like Zoom's take on task management, collaborative documents, or workflow automation more than they'd like whatever they're currently using. Unless the new feature has incredible synergy with your core product, people will ignore it. It won't be the reason people pick Zoom over Meet, it won't be the reason people switch to Zoom from Teams, and all the while, current users will suffer the bloat and the diversion of resources away from the core experience they care about.
And now that software developers are allegedly 10x more productive with AI, we can expect 10x more features that nobody will care about...
It may be time to recall the old UNIX philosophy: Do one thing well!
Prompt Engineering - Not here to last
In the early days of ChatGPT, social media was full of people sharing their tips and tricks for prompting it just right to get the desired outcome. Some of those proved genuinely useful, like Chain-of-Thought prompting (now baked directly into various "reasoning" models). Others bordered on the superstitious, like an alchemist's magical incantations: offering the LLM a tip for a job well done, or threatening it with punishment.
These prompt techniques appeared so esoteric and beyond mortal comprehension that soon there was news of Prompt Engineers earning USD 400,000 a year. Now, I wonder if much actual engineering was involved, or if it was more a case of splatterprompting: throwing prompts at the model to see what sticks.
There are two issues with prompt engineering:
It is inherently brittle. The longer and more convoluted the prompt, the less likely it is to survive a model upgrade and the more likely it is to sidetrack the model.
Consumers, in particular, don't want to bother with it. They want to just ask the AI their question, not obsess over the intricacies and inner workings of the underlying model. As long as we need to prompt the AI just right, we can't call it natural language understanding.
As it turns out, newer and better models do much better with actual natural language. Now, for internal use of LLMs via an API, there'll always be a need to engineer the prompts to get the desired output with near-100% reliability. The emphasis here is on engineering, with all the important principles we talk about in this newsletter (a code sketch follows the list):
Start simple. The shortest prompt that expresses the problem.
Add complexity one small step at a time.
Test and evaluate after each step.
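Here's a minimal sketch of what that loop can look like, again assuming the OpenAI Python client; the prompts, model name, and test cases are illustrative:

```python
# Sketch: iterate on a prompt one small change at a time, evaluating after each step.
# Assumes the openai package; prompts, model, and test cases are illustrative.
from openai import OpenAI

client = OpenAI()

TEST_CASES = [  # a tiny labeled set; grow it as you discover failure modes
    ("Invoice overdue, please pay by June 3.", "payment"),
    ("Loved the demo, let's schedule a follow-up call.", "meeting"),
]

PROMPT_V1 = "Classify the email. Reply with one word: payment, meeting, or other."
PROMPT_V2 = PROMPT_V1 + " Emails mentioning amounts or due dates are usually payment."

def evaluate(prompt: str) -> float:
    hits = 0
    for text, label in TEST_CASES:
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": prompt},
                      {"role": "user", "content": text}],
        ).choices[0].message.content
        hits += reply.strip().lower() == label
    return hits / len(TEST_CASES)

# Keep the extra sentence only if evaluate(PROMPT_V2) beats evaluate(PROMPT_V1).
```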
Happy (non-)prompting!