The prevailing direction of AI, across nearly every domain, is to replace human thinking. AI drafts the email. AI summarises the document. AI writes the first version of the code, the report, the analysis. In each case, the human's role shifts from doing the cognitive work to reviewing the output of a system that did it for them. The aggregate effect is a quiet erosion: people think less, because less thinking is required of them. It is the path of least resistance, and it is where almost all commercial AI development points.
Learning is the counter-case. It is the one domain where replacing human thinking is actively destructive, because the entire point is to develop the capacity to think. A student who offloads reasoning to AI has not learned. They have avoided learning. This makes learning experience design the most interesting problem in applied AI: how do you use a system that is very good at generating answers to instead develop someone's ability to generate their own?
The bottleneck, it turns out, has never been access to information or even access to answers. It has been access to someone who can tell you, quickly and precisely, where your thinking went wrong.
The best learning research of the last forty years keeps finding the same thing. Bloom's two-sigma study (1984) showed that students receiving one-on-one tutoring performed two standard deviations above students in conventional classrooms: the average tutored student outscored roughly 98% of the control group. The effect was so large that Bloom himself called it a problem: if tutoring works this well, why can't we scale it? The answer has always been economics. Expert tutors are expensive, their time is finite, and no institution can afford one per student.
AI changes these economics, because it can replicate the mechanism that makes tutoring work: the speed of the feedback loop.
What tutoring actually does
The reason one-on-one tutoring produces such outsized results has nothing to do with the information being transferred. The tutor and the textbook contain the same knowledge. The difference is responsiveness. A tutor watches a student work through a problem, notices the exact moment the reasoning diverges, and intervenes with a question targeted at that specific gap. The feedback is immediate, specific to what the student just did wrong, and adapted to where they currently are in their understanding.
In a conventional classroom, this loop runs slowly. A student answers a question, submits an assignment, waits days or weeks for grading, receives a mark and perhaps a comment, and may or may not revisit the material. The gap between the error and the correction is where learning goes to die. By the time the feedback arrives, the student has moved on, the context has dissolved, and the misconception has had time to harden into something they believe they understand.
Compressing this loop is the single highest-leverage intervention in education. Everything else is secondary to the speed and precision of feedback on reasoning errors.
The replacement default
A law firm deploys AI to draft contract clauses. The junior associate who would have spent three years learning to draft, by getting it wrong and having a partner redline it, now reviews AI output instead. The firm's throughput increases. The associate's development stalls. Five years later the firm needs senior lawyers with judgment and finds it has produced a cohort that spent its formative years editing rather than reasoning.
The same thing is happening in medicine, in engineering, in policy work. Give a junior clinician an AI that presents the diagnosis and you train a reviewer, not a diagnostician. Give an engineer a tool that writes the function and they learn to read solutions, not construct them. Wherever AI generates the first draft, the human who would have learned by generating it loses the repetitions that build expertise.
None of this shows up in quarterly metrics. It surfaces years later, when organisations need people who can reason through novel situations and discover they have optimised that capacity out of existence.
The inversion
The alternative runs the other way. Humans generate reasoning. AI evaluates it.
This is harder to build. Generating text is cheap: a language model and a prompt. Evaluating whether someone's reasoning is actually sound requires a framework for what good reasoning looks like in a specific domain, and that framework has to come from people who know the domain. You cannot generate it on demand.
But the result is fundamentally different. The human does the thinking. The AI tells them where it broke. That feedback loop can run in seconds rather than the weeks it takes in any institutional setting, and the human still has to do the cognitive work. They just find out faster when they get it wrong.
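To make the shape of that loop concrete, here is a minimal sketch of an evaluation-first design. Everything in it is illustrative: the Criterion and Finding structures, the evaluate_reasoning function, and the keyword-matching stand-in for a model-backed judge are hypothetical names, not any product's API. What it shows is the architecture: the student's reasoning is the input, the domain framework is encoded as a rubric written by people who know the domain, and the output is targeted feedback rather than an answer.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """One element of the domain framework: what sound reasoning
    looks like on a specific point, written by a domain expert."""
    name: str
    description: str

@dataclass
class Finding:
    criterion: str
    passed: bool
    feedback: str  # a targeted question or pointer, never the answer

def evaluate_reasoning(
    reasoning: str,
    rubric: list[Criterion],
    judge: Callable[[str, Criterion], tuple[bool, str]],
) -> list[Finding]:
    """Run the student's reasoning past each criterion. The judge
    (in practice an LLM call; stubbed below) only locates where the
    reasoning diverges; it never generates a solution."""
    findings = []
    for criterion in rubric:
        passed, feedback = judge(reasoning, criterion)
        findings.append(Finding(criterion.name, passed, feedback))
    return findings

def stub_judge(reasoning: str, criterion: Criterion) -> tuple[bool, str]:
    """Toy stand-in so the sketch runs end to end: flags a criterion
    whenever its key term is missing from the student's text."""
    if criterion.name.lower() in reasoning.lower():
        return True, "Sound on this point."
    return False, f"Where does your argument address this: {criterion.description}?"

rubric = [
    Criterion("confounding",
              "rules out a third variable driving both observations"),
    Criterion("effect size",
              "says how large the effect is, not just that it exists"),
]

student = "Sales rose after the campaign, so the campaign caused the rise."

for finding in evaluate_reasoning(student, rubric, stub_judge):
    status = "ok" if finding.passed else "gap"
    print(f"[{status}] {finding.criterion}: {finding.feedback}")
```

The loop above runs in milliseconds; swap the stub for a model-backed judge and it runs in seconds. Either way, the expensive part is the rubric, which is exactly the point: the framework comes from domain experts, and the system's only job is to apply it to the student's own reasoning, fast.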
Beyond the classroom
Learning is the clearest proving ground for this model because the outcome is directly measurable. Can a medical trainee reason through an unfamiliar clinical scenario? Can a law graduate construct an argument from first principles, or only review one that AI drafted for them? The evidence on what compressed feedback does to these outcomes has been strong for decades.
But the principle extends wherever human judgment matters. An engineering firm that uses AI to develop its juniors' structural risk assessment is building something that outlasts any individual project. A policy unit that uses it to sharpen analysts' reasoning about second-order effects will quietly outperform one that automates the analysis away. These are not speculative claims. They follow directly from what we already know about how expertise develops.
The compound return
An AI that does the work produces one good output. An AI that develops the human's capacity to do the work produces someone who keeps producing good output long after the system is switched off. One of these compounds.
Every deployment of AI is, implicitly, a decision about whether to replace human capability or develop it. The commercial incentive favours replacement: it is faster, cheaper, easier to demo. Learning is where the cost of that choice is most visible, because the outcome is a person who can or cannot think. Bloom told us the answer in 1984. We have the tools now. The question is whether we use them to do the thinking or to sharpen it.