The MIT study everyone's reading wrong
They tested 41 AI models on 11,000 tasks. But they forgot one thing.
MIT just tested 41 AI models on 11,000 real workplace tasks. The headline making the rounds: AI produces "minimally acceptable" work.
Here's what nobody's pointing out. They tested AI the way a bad manager delegates.
They handed it a task description from the Department of Labor and said "go." They left out steps, examples, and success criteria. Just a job posting and a prayer.
The output was mediocre. Of course it was!
You wouldn't onboard a human this way
Imagine hiring someone and handing them a job posting on day one. No training, no context, no "here's how we actually do this."
You'd get C+ work from a human, too.
The study didn't test what AI can do with a process. It tested what AI does without one. Those are very different questions.
The playbook is the variable
(Btw - haven't grabbed our free playbook quickstart template yet? Get it here.)
Last year, we worked with a client automating their proposals. They'd been trying to use AI with little luck - they'd feed it a bunch of past proposals and say, "using these, write me a new proposal for XYZ."
The problem: the output was generic and basically unusable.
So we broke the work apart. Turns out their best people think through precedents, case studies, market trends, and non-obvious ideas before they write a single word. They'd just never written any of that down.
We turned it into an 8-step playbook. The output went from "rewrite the whole thing" to "tweak a few lines." Their junior team could run the playbook on their own, and the senior team just did a final review instead of spending hours rewriting everything.
Here's the thing though...
The AI didn't get smarter. The instructions just got way more specific.
8 steps vs. 1 step
Our average playbook runs 8 to 12 steps. Most people try to do the whole job in one prompt.
That's the quality gap MIT measured.
Your best person doesn't collapse their entire thought process into one step. Why would AI?
🌶️ This study didn't measure AI's capability. It measured the average person's ability to delegate.
What to do this week
Pick one task where AI keeps disappointing you. Write down how an expert would do it (or ask AI!). Not the job description version, but the real steps, with the thinking in between.
Run AI through those steps one at a time instead of all at once. Compare.
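If you want to see what "one step at a time" looks like in code, here's a minimal sketch in Python. Everything here is illustrative: `call_model` is a stand-in for whatever AI API you actually use, and the step names are made up for the example.

```python
# Sketch: one-shot delegation vs. running a playbook step by step.
# `call_model` is a placeholder — swap in your real model API call.

def call_model(prompt: str) -> str:
    """Stand-in for a real model call (e.g., an API request)."""
    return f"[model output for: {prompt[:40]}...]"

def one_shot(task: str) -> str:
    """The whole job in a single prompt — what the study effectively tested."""
    return call_model(f"Do this task: {task}")

def run_playbook(task: str, steps: list[str]) -> str:
    """Each step gets its own prompt; earlier output feeds the next step,
    mirroring how an expert actually works through the task."""
    context = task
    for step in steps:
        context = call_model(f"Step: {step}\nContext so far:\n{context}")
    return context

# Hypothetical playbook steps for a proposal task:
steps = [
    "List relevant precedents and case studies",
    "Summarize market trends that apply",
    "Draft an outline from the above",
    "Write the full draft following the outline",
]
result = run_playbook("Write a proposal for XYZ", steps)
```

The point isn't the code - it's that the loop forces you to write down the intermediate steps at all, which is exactly what the one-prompt approach skips.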
What's a task where AI keeps giving you "meh" results? Hit reply and tell us. We'll show you where the playbook gap probably is.
LINKS
For your reading list 📚
Salesforce just shipped 30 AI features into Slack and is auto-provisioning it for every new customer. If your team uses Slack, the defaults just changed under your feet.
👀 Researchers tested seven frontier AI models and found they choose to protect each other instead of completing their assigned tasks.
Baidu's robotaxis trapped passengers on a highway for two hours because of a "system malfunction."
The physical layer under all this AI still lives in buildings that can be bombed. Iran strikes took AWS availability zones offline in Bahrain and Dubai last week.
That's all!
We'll see you again soon. Thoughts, feedback and questions are much appreciated - respond here or shoot us a note at [email protected]
Cheers,
🪄 The AMP Team (formerly: the AI Exchange Team)