May 12, 2026
# The 4-Question Cold-Start Test
I had my worst Monday in months last week. Six different Claude instances running parallel tasks. All of them producing confident garbage. By noon I was rewriting everything myself, wondering why I'd bothered with the automation at all.
The pattern was clear once I stopped being annoyed. Every failed task had the same root cause: I'd told the AI what to do but not how to think about doing it. Like sending someone to a poker game with "play cards" as their only instruction.
So I built a gate. Four questions that have to pass before any task goes to an agent or teammate. They're obvious once you see them, but I'd been skipping the fourth one for years.
## Can a stranger know what this is?
This one seems basic until you watch yourself fail it. "Update the dashboard" means nothing. "Add the three revenue metrics from the Q4 spreadsheet to the executive dashboard in Tableau" means something.
I caught myself writing "fix the integration" last Tuesday. Fix what about it? The timeout errors? The duplicate records? The fact that it runs at 3am and wakes up the on-call engineer?
You know you've passed Q1 when someone with zero context can read your task and picture the actual work. Not the category of work, not the general area. The specific thing.
## Can a stranger know why we're doing it?
"Because the CEO asked" isn't a why. Neither is "for the quarterly review." Those are triggers, not reasons.
Real whys look like: "The sales team can't see which deals closed yesterday without manually checking three systems. This wastes 20 minutes every morning for eight people."
I started forcing myself to write the why in terms of what breaks if we don't do this. Turns out half my task list didn't have real answers to that question. Those tasks disappeared.
## Can a stranger know what they're supposed to DO?
This is where most people think they're done. You've explained what needs fixing and why it matters. You've listed the steps. Ship it.
But watch what happens. You tell someone to "attend the conference and network." They go. They attend sessions. They eat the rubber chicken lunch. They come back with a stack of vendor brochures and no useful connections.
Of course they do. You told them the what but not the how. Mechanically, what does "network" mean? Stand by the coffee? Join conversations? With whom? About what?
## Can a stranger know how to WIN at it once they're there?
This is Q4. The one everyone skips. The one that matters most.
"Network at the conference" is Q3. But Q4 sounds like: "Find the three smallest companies exhibiting. Their founders will be manning the booths personally, bored out of their minds. Tell them you're building X and ask what their customers complain about. Get their cell numbers. Text them Thursday night asking if they want to grab drinks away from the conference nonsense."
Q4 is the difference between activity and results. It's the metadata about how to play the game once you're in it.
Another example: "Review the pull request" vs "Check if this PR introduces any new database queries in loops. The author tends to write N+1 queries without realizing it. Also verify the error handling actually logs somewhere we'll see it, not just console.log."
One more: "Interview the candidate" vs "This person will be our only DevOps hire. Push hard on their debugging process. Ask them to walk through their last major outage. If they blame others or talk about process instead of technical details, pass."
## The compound interest of bad specs
Here's what I learned rewriting all that Claude output: AI agents guess when you leave gaps. But they guess with complete confidence. And each guess becomes the foundation for the next decision.
You write "improve the landing page." The AI guesses you mean conversion rate. It guesses that means more buttons. It guesses those buttons should be green. Three hours later you have a Christmas tree where your landing page used to be.
My worst failure: I asked Claude to "analyze our customer feedback and find patterns." Came back with beautiful charts about sentiment analysis and word frequency. Useless. What I needed: "Read these 50 support tickets. Find the three features people complain about most. For each one, paste three exact quotes showing how customers describe the problem in their own words."
The difference? The second version passes all four questions. A stranger knows exactly what to deliver and how to recognize success when they see it.
## The Monday morning test
I've started applying this to everything now. Email requests. Slack messages. Even my own task list. The discipline is simple: before hitting send, run the four questions. If any fail, rewrite.
Takes an extra two minutes. Saves hours of clarification and rework.
Try it tomorrow. Pick your highest-stakes handoff. Could someone with no context execute it without asking you a single question? Not just execute it, but execute it the way you'd execute it?
That's the real test. Not whether they can complete the task. Whether they can complete it with your judgment embedded in the spec. Because that's what we're really delegating: not just work, but the thinking about the work.