Who AI actually helps

A physicist used AI to extend the frontier of theoretical research. Experienced developers used the same generation of tools and got slower. AI does not replace expertise. It amplifies it. The danger is what happens when people skip the struggle that builds it in the first place.

Matthew Schwartz recently posted a paper on arXiv titled "Resummation of the C-Parameter Sudakov Shoulder Using Effective Field Theory". Dense quantum field theory work involving soft-collinear effective theory, Sudakov logarithms, and factorisation theorems. The final sentence of the abstract: "All calculations, numerical analysis, and manuscript preparation were performed by Claude, an AI assistant developed by Anthropic, working under physicist supervision."

Schwartz is not idly experimenting with chatbots. He has spent decades developing expertise in exactly the kind of precision calculations this paper tackles. The Aspen Center for Physics describes him as someone who "anticipates a future where machines and humans will work in harmony towards the next breakthroughs in our fundamental understanding of the universe." Apparently that future arrived faster than expected.

The amplifier

Consider a thought experiment. You and Lando Norris are each given an identical Formula 1 car at Monaco. Norris gets through the swimming pool chicane at over 100 mph, finding racing lines invisible to most drivers, extracting performance approaching the theoretical limits. You would probably crash at the first corner, assuming you could get the car moving at all.

The car does not make the driver. The car amplifies the driver.

This is the part of the AI productivity conversation that keeps getting glossed over. When Schwartz used Claude to produce a contribution to theoretical physics, he was not outsourcing his expertise. He was wielding it. Every prompt he wrote, every output he evaluated, every correction he made drew on decades of understanding how QFT actually works. He could spot when the AI went off the rails because he knew what "on the rails" looks like for Sudakov shoulder resummation.

The research

Most studies find that AI helps lower-skilled workers more than higher-skilled ones. A landmark Stanford/MIT study of customer support agents found that those using AI saw 14% productivity gains on average, but the biggest beneficiaries were novice and low-skilled workers, who improved by up to 35%. Productivity was essentially flat for the most skilled workers. Research from Harvard and BCG on consultants found similar patterns: AI compressed the performance gap between top and bottom performers. This has led to reasonable optimism that AI might be an equalising force.

But that framing misses something. Helping call centre agents resolve tickets faster is genuinely valuable. It is also a different phenomenon from what Schwartz demonstrated: using AI to extend the frontier of what experts can accomplish.

Consider the difference between two scenarios. In the first, AI helps a junior consultant produce work that looks like what a senior consultant would produce. The gap closes. In the second, AI helps that senior consultant produce work that previously would have required a team of five, or would not have been feasible at all. The ceiling rises.

The equalising narrative focuses on the first scenario. Schwartz's paper points toward the second.

The danger zone

If AI amplifies expertise, what happens to those who lack it?

An IE Business School analysis identified what it called the "danger zone": situations where novices use AI for non-codifiable, judgment-based tasks they cannot verify. The output looks sophisticated, sounds convincing, and may be complete nonsense. The professor behind the analysis describes encountering student submissions that "read like graduate-level work, but when I asked basic questions about methodology or reasoning, many students struggled to respond satisfactorily. They had produced sophisticated analysis without developing the thinking skills to support their own assignments".

This is the inverse of the Schwartz scenario. Instead of expertise being amplified, it is being bypassed. The student does not know what they do not know, and the AI's fluent prose convinces them they have understood something they have not.

A Cornell study of scientific publishing found exactly this pattern in academia. Researchers using AI tools published dramatically more papers, with productivity jumping 36 to 60% depending on the field. But papers whose prose bore AI-typical writing patterns were less likely to pass peer review than comparably sophisticated human-written papers. The AI made the prose shinier without making the science better. Good writing was masking weak ideas.

The experts who got slower

A METR study recruited experienced open-source developers, people averaging five years of work on their repositories, and randomly assigned their tasks to either allow or disallow AI tools.

When developers used AI, they took 19% longer to complete their tasks.

Before starting, developers predicted AI would make them 24% faster. After completing the study, they still believed AI had helped them, estimating it had reduced their time by 20%. The subjective experience of AI assistance did not match the objective reality.

The researchers identified several contributing factors. The AI struggled with project-specific context that experienced developers already had. Time was lost evaluating and correcting AI suggestions. The overhead of AI interaction did not pay off for people who already knew what they were doing.

AI does not automatically amplify expertise. It amplifies expertise when the expert knows how to direct it toward problems AI can actually help with. Schwartz understood how to leverage AI for theoretical physics calculations. These developers, despite being experts in their codebases, may not have found the right integration pattern. Or the tools were simply not good enough for that specific use case. Expertise in a domain does not automatically translate to expertise in human-AI collaboration.

The jagged frontier

The Harvard/BCG research introduced a useful concept: the "jagged technological frontier." AI capabilities are not a clean line where everything on one side works and everything on the other does not. The boundary is irregular, with some seemingly difficult tasks inside AI's competence and some seemingly simple tasks outside it.

When consultants used AI for tasks inside this frontier, they completed 12% more tasks, 25% faster, with 40% higher quality. When they used AI for tasks outside the frontier, they were 19 percentage points less likely to reach correct answers than those working without AI.

The frontier is invisible. It is not obvious which tasks fall where. And AI's confident, polished outputs do not signal when you have crossed the line into territory where it cannot actually help.

This is where expertise becomes crucial in a different way. The expert does not just know their domain. They develop an intuition for where AI can and cannot contribute. Schwartz probably did not hand Claude an arbitrary physics problem and trust the output. He likely understood which aspects were tractable for an AI assistant and which required his own judgment.

The novice has no such map. They wander across the frontier without knowing it, producing outputs that feel like progress but may be leading nowhere.

The foundation problem

Schwartz spent decades mastering quantum field theory through the slow, difficult process of actually learning it. He worked through calculations by hand, made mistakes, developed intuitions, and built the deep understanding that now allows him to effectively supervise an AI doing similar work.

If the next generation of physics students can get plausible-looking calculations from AI without building that foundation, will they develop the judgment needed to know when those calculations are wrong? Can you learn to supervise work you have never really learned to do yourself?

The OECD warns that "relying too heavily on generative AI can undermine critical thinking and hinder long-term skill retention, especially when learners accept information without questioning it". Studies show that people using AI "appear to perform better in their tasks, but they also show signs of reduced independent thinking".

This is not a new shape of concern. People worried that calculators would undermine mathematical understanding. They worried that GPS would destroy our sense of direction. Some of those worries proved overblown, others less so. But AI represents a qualitative shift in what can be outsourced. You cannot outsource the ability to recognise when you are lost.

What this means

None of this argues against AI assistance. The potential for experts to accomplish dramatically more is real. A physicist producing novel research with AI collaboration points toward a future where human expertise gets leveraged in powerful new ways.

But we should be precise about the situation. AI is not a universal productivity enhancer. It is closer to a variable-ratio amplifier: massive gains in some configurations, modest help in others, actual harm in still others.

The key variables seem to be: whether you have the expertise to evaluate AI outputs in your domain; whether you have developed an intuition for where the jagged frontier lies in your particular field; and whether you are building expertise that will remain relevant, or bypassing foundations you will eventually need.

Schwartz's paper is remarkable precisely because it required someone who had already spent decades becoming an expert. The AI has not replaced that journey. However, the companies rolling out these tools are not waiting for the next generation to complete their own journeys. They are shipping now, optimising for adoption now, measuring "productivity gains" now.

The question is how we persuade a generation to still become experts when the tools make it so easy to skip the struggle. Because the struggle is not just inefficiency to be optimised away. It is where the required judgment comes from. Without people who have done that work, we end up with a world of fluent outputs and no one left who can tell if any of it is true.
