The Thinking Muscle
Part 2 of a series on friction, growth, and the systematic elimination of resistance from modern life
There is a recurring sketch on Futurama called The Scary Door, a parody of The Twilight Zone that plays on the show’s televisions while the characters watch without surprise. Every episode has a twist. Every twist is obvious. Bender, the robot, usually says he saw it coming.
In the installment that appears in the episode “Benderama,” a narrator introduces Dr. Daniel Zenus, “an inventor with a terminal case of the lazies.” Zenus builds a robot and instructs it to activate itself and do his research. The robot flips its own switch. It begins tinkering with test tubes. Zenus leans back in his chair. “Ah!” he says. Then he tells the robot to assume his social obligations. The robot puts on a black tie, offers its arm to Zenus’s wife, and escorts her out. The clock hands spin. Years pass.
An official arrives. “Dr. Zenus,” he says, “for a lifetime of scientific achievement, we present this award to... your robot.” A boy enters. “Daddy, I love you!” he says, and turns away from the scientist to hug the machine. Zenus stares. “If only I’d programmed the robot to be more careful what I wished for!” he says. Then: “Robot, experience this tragic irony for me!” The robot buries its face in its hands and screams. Zenus pops open a beer.
The sketch is a minute long and it is, I think, the most precise summary ever produced of what happens when you delegate cognitive work to a machine. Not because the robot malfunctioned. It performed flawlessly. It did exactly what Zenus asked. The problem is that Zenus asked it to do everything, and everything included the parts of life where the doing was the point. The research wasn’t just the output. The research was the process by which Zenus became a scientist. The social obligations weren’t just tasks. They were the relationships that made him a husband and father. And when the robot did all of it, Zenus didn’t become free. He became irrelevant. His final line, “Robot, experience this tragic irony for me,” is the most chilling detail. He doesn’t even process the loss himself. He delegates that too.
I think about this sketch more than is probably healthy, because I use AI every day. I am writing this essay with the assistance of a large language model. It helped me find the research you’re about to read. It helped me verify citations. It helped me identify gaps in my argument. And every time I sit down to work with it, I face the Zenus question: which parts of this process are producing the output, and which parts are producing me?
What 83 Percent Couldn’t Remember
In early 2025, a team of researchers at MIT’s Media Lab, led by Nataliya Kosmyna, decided to look at what actually happens inside the brain when someone writes with AI. They recruited 54 people from universities across the Boston area, including MIT, Harvard, Wellesley, Tufts, and Northeastern, and divided them into three groups. The first group wrote essays using ChatGPT however they wanted. The second could use Google Search but no AI. The third could use nothing at all, just their own brains. Each group wrote three SAT-style essays over the course of four months while the researchers monitored their brain activity using electroencephalography, which measures electrical connectivity across 32 regions of the brain.
The findings, published as a preprint in June 2025 and covered by TIME, EdWeek, and Harvard’s T.H. Chan School of Public Health, were stark.
The brain-only group showed the strongest, most widely distributed neural connectivity, particularly in the alpha, theta, and delta frequency bands associated with creative ideation, memory processing, and deep semantic engagement. The Google Search group showed moderate activity, with especially strong engagement in the visual cortex, which makes sense if you think about what searching requires: opening tabs, scanning text, integrating information from multiple sources, and deciding what matters. The ChatGPT group showed the weakest connectivity of all. Their executive control and attentional engagement were the lowest of the three groups. And the pattern got worse over time. By their third essay, as Kosmyna told TIME, many of the ChatGPT users had essentially stopped writing. “It was more like, ‘just give me the essay, refine this sentence, edit it, and I’m done.’”
Two English teachers who evaluated the essays without knowing which group had written them called the ChatGPT group’s work “soulless.” The essays were structurally similar, relied on the same expressions and ideas, and lacked the variety and originality found in the other groups. The AI had produced polished output. It had not produced thought.
But the finding that scares me is this: when the researchers asked participants to recall what they had written, 83 percent of the ChatGPT group could not accurately quote from their own essays. Essays they had written minutes earlier. The failure rate for both the brain-only and Google Search groups was 11 percent.
Read that again. Five out of six people who used AI to write could not tell you what they had written. They had produced the output without producing the experience. The essay existed. The thinking did not.
And then came the part of the study that earned its title. In a fourth session, the researchers switched the groups. The ChatGPT users had to write without AI. The brain-only users got to try ChatGPT. The results showed what Kosmyna’s team called “cognitive debt”: the ChatGPT users who switched to writing unaided showed reduced neural connectivity, underengagement in alpha and beta bands, and diminished cognitive function compared to the brain-only group’s first session. The effects of relying on AI didn’t just disappear when the AI was removed. The brain had adapted to not working. Getting it back to full engagement wasn’t as simple as taking the tool away.
I want to be fair about this study’s limitations. It’s a preprint, not yet peer-reviewed. The sample is 54 people. A critique in The Conversation pointed out that the brain-only group improved over their three sessions through practice, while the ChatGPT-to-brain crossover only had one unaided session, making the comparison imperfect. These are legitimate methodological concerns. But the 83 percent memory failure is difficult to explain away by familiarization effects. And the finding is consistent with something that learning scientists have been documenting for decades.
The Struggle Is the Point
In 1994, a psychologist at UCLA named Robert Bjork coined a term for a phenomenon his research had been circling for years. He called it “desirable difficulties.” The idea, which he developed with his wife and collaborator Elizabeth Bjork over a career spanning more than five decades, is counterintuitive to the point of seeming perverse: the conditions that make learning feel hard are often the conditions that make learning work.
Spacing your study sessions over time instead of cramming, for instance, feels less productive in the moment. You forget more between sessions. You feel like you’re losing ground. But the research is unambiguous: spaced practice produces dramatically better long-term retention than massed practice, even though massed practice feels more effective while you’re doing it. Interleaving different topics in a single study session, rather than focusing on one topic at a time, feels disorienting and slow. But it produces superior transfer of knowledge because the brain is forced to discriminate between concepts rather than passively recognizing them. Retrieval practice, forcing yourself to recall information rather than re-reading it, is more effortful and less pleasant than reviewing notes. It is also, by a wide margin, more effective for long-term learning.
The through-line across all of Bjork’s desirable difficulties is the same: the feeling of ease is a trap. When learning feels effortless, you are almost certainly not learning as well as you think you are. The brain requires the resistance of struggle to encode information deeply. Remove the struggle and you get what Bjork and Bjork called the performance-learning distinction: a gap between observable performance during practice and the actual learning that transfers to the long term.
This distinction is the key to understanding what AI does to cognition. When you use ChatGPT to write an essay, the performance improves immediately. The output is cleaner, more articulate, better structured. It looks like you learned something. But the MIT study suggests that the neural processes associated with actual learning, the creative ideation, the memory encoding, the semantic integration, were not engaged. The performance was excellent. But the learning may not have occurred.
Robert Bjork put it this way at the 2016 APS Convention: “What we can observe is performance, but what we have to infer is learning, and that makes us subject to possible illusions of comprehension.” An AI-written essay is the ultimate illusion of comprehension. It looks like the writer understands the topic. The writer may not even remember what the essay said.
We’ve Been Here Before
The cognitive offloading pattern did not begin with AI. In 2011, psychologist Betsy Sparrow at Columbia University, along with Jenny Liu at Wisconsin-Madison and Daniel Wegner at Harvard, published a study in Science that introduced a concept they called the Google Effect. Through four experiments, they demonstrated that when people expect to have future access to information through a search engine, they are less likely to remember the information itself and more likely to remember where to find it.
“Since the advent of search engines,” Sparrow said at the time, “we are reorganizing the way we remember things. Our brains rely on the Internet for memory in much the same way they rely on the memory of a friend, family member, or co-worker. We remember less through knowing information itself than by knowing where the information can be found.”
Sparrow framed this as a form of transactive memory, the same cognitive process by which couples divide up who remembers what (she handles the social calendar, he remembers the directions to the restaurant). There was an optimistic read: maybe this frees up cognitive capacity for higher-order thinking. Maybe knowing where to find information is more valuable than memorizing it.
Fourteen years later, we can evaluate that optimism. The Google Effect identified the first stage of cognitive offloading: we stopped remembering information and started remembering where to find it. With AI, even that secondary memory is becoming unnecessary. You don’t need to remember where to look. You need to remember how to prompt. And the prompting itself is getting simpler. The trajectory is clear: from remembering the thing, to remembering where the thing is, to remembering how to ask for the thing, to not needing to remember anything at all because the tool anticipates what you need.
At each stage, the immediate performance improves. The search results are faster. The AI-generated answer is more comprehensive than what you would have found on your own. The output gets better. And at each stage, something is lost that doesn’t show up in the output: the neural engagement required to seek, evaluate, synthesize, and generate, the process by which the mind builds itself.
Nicholas Carr, author of The Shallows, made the connection explicit: “The more information you commit to memory, the more material you have to work on and think about.” Knowledge stored internally isn’t just static retrieval. It’s the raw material of synthesis. It’s what allows you to make a connection between something you read last year and something you heard this morning, the kind of connection that can’t be prompted for because you didn’t know the connection existed until the two pieces collided in your own mind. Outsource the storage and you lose not just the retrieval but the collision.
The Macro Trend
If cognitive offloading were only an individual choice, it might matter less. But there is evidence that something is shifting at the population level.
For most of the twentieth century, average IQ scores rose at a rate of approximately 3 points per decade, a phenomenon named the Flynn Effect after the political scientist James Flynn, who first documented it systematically in 1987. The gains were too rapid to be genetic. Flynn argued they reflected environmental improvements: better nutrition, expanded education, increasingly complex cognitive environments.
In 2018, economists Bernt Bratsberg and Ole Rogeberg at Norway’s Ragnar Frisch Centre for Economic Research published a study that used military conscription data covering 30 years of Norwegian birth cohorts, over 800,000 men born between 1962 and 1991. They found that the Flynn Effect had reversed. IQ scores peaked for the cohort born around 1975 and had been declining since.
Their most important finding was methodological. The increase, the turning point, and the decline could all be recovered from within-family variation, meaning brothers raised in the same household by the same parents showed the same pattern. This ruled out genetic explanations entirely. “Our main finding,” Rogeberg told Science Norway, “was that the decreasing IQs have nothing to do with genetics. The development is due to environmental factors, in other words something in our surroundings.”
A 2023 study of nearly 400,000 American adults found the same reverse Flynn pattern in the United States between 2006 and 2018, with the steepest declines among 18- to 22-year-olds. Scores dropped in matrix reasoning, verbal reasoning, and letter and number series. Three-dimensional rotation was the only domain that showed improvement.
I want to be careful here, because it would be easy to draw a straight line from smartphones to declining IQ and feel satisfied. The Flynn Effect reversal began with cohorts born in the mid-1970s, before the internet existed. The causes are debated and almost certainly multifactorial: changes in educational methods, reduced exposure to cognitively challenging environments, even shifts in what kinds of intelligence the tests measure versus what modern life demands. The Norwegian researchers themselves pointed to schooling as a major driver in both directions: the expansion of education drove scores up, and changes in educational approach may be driving them down.
But the direction of the evidence is consistent. Environmental factors that reduce cognitive demand appear to reduce cognitive performance. This is Bjork’s desirable difficulties framework operating at the population level. When the environment makes thinking easier, the thinking muscle atrophies. Not because people are lazier. Because the muscle, like all muscles, needs resistance to grow.
AI is not the cause of the Flynn Effect reversal. But it is the most powerful friction-removal tool ever applied to the cognitive environment. If the trend was already moving in this direction before AI, the question is what happens when you accelerate it.
What I Delegate and What I Don’t
I use Claude, the AI made by Anthropic, as an intellectual collaborator. I use it for research: finding studies, verifying citations, checking whether a statistic I half-remember is accurate or apocryphal. I use it to identify holes in my arguments, places where the logic doesn’t hold or the evidence is weaker than I thought. I use it to build code for my applications, delegating implementation while maintaining the architecture and system design. I use it extensively, probably more than most writers are willing to admit, and I think the honesty matters here because the essay you are reading about cognitive delegation is itself a product of human-AI collaboration.
But there is a line, and I know where it is because I can feel it.
When Claude helps me find a study, I still have to read it, process it, and decide what it means for my argument. When it identifies a gap, I still have to analyze it and figure out whether the gap matters or whether I was right to leave it open. When it suggests a connection between two ideas, I have to test that connection against everything I already know, every book I’ve read, every pattern I’ve noticed, every conversation I’ve had, to determine whether the connection is genuine or merely plausible. The synthesis is mine. I know it’s mine because I can trace every reference in this essay back to a specific moment: the Futurama episode I watched years ago and filed away without knowing why it mattered, the education research I’ve been reading for a book I’m writing, the Bjork paper a friend mentioned, the MIT study that appeared in my feed and connected to all of it. I was collecting pieces for years before they clicked. The AI helped me find some pieces faster. It did not do the clicking.
The code is a different story, and I want to be honest about that too. I learned to program before AI existed, which means I built the foundational thinking patterns, the logic, the architecture, the problem-solving reflexes, through friction. I can delegate implementation now without losing those patterns because they’re already encoded. But I know it slows my improvement. Every function I let AI write is a desirable difficulty I skipped. Every debugging session I shortcut is a neural pathway I didn’t strengthen. This is a compromise I’ve made consciously, because my core work is the writing and the thinking, not the code. The code serves the thinking. I’m willing to trade slower improvement in a supporting skill for faster output. But I would never make that trade for the writing itself, because the writing is where the thinking happens, and the thinking is my point.
This is the distinction I keep returning to: every output is an opportunity for growth, and delegating it is a decision to sacrifice that growth for efficiency. Sometimes the trade is worth it. When the output is peripheral to what you’re trying to become, delegation frees capacity for the work that matters. But when the output is the thing you want to be good at, delegation is not efficiency. It is erosion. And the erosion is invisible because the output improves even as the capacity behind it declines.
I think about this constantly now, because I’m writing a book about education, and the central confusion of the American education system is exactly the confusion that AI has exposed: the system measures output. Grades, test scores, papers submitted, graduation rates, and degrees completed. It has done this for so long that everyone involved, students, parents, teachers, administrators, has internalized the idea that the output is the point. And if the output is the point, AI wins. It writes better essays than most students. It solves problem sets faster. It produces cleaner code.
But education was never supposed to be about the output. Education is the process by which a mind is sharpened and honed until it becomes capable of producing the output. The essay isn’t the point. The thinking required to produce the essay is the point. The problem set isn’t the point. The mathematical reasoning built by struggling through the problem set is the point. Students who use AI to write their essays aren’t cheating the system. They’re responding rationally to a system that has been telling them, for years, that the product matters and the process doesn’t. AI didn’t create that confusion. It just made the consequences of it impossible to ignore.
The Bicycle and the Car
Steve Jobs famously called the computer “a bicycle for the mind.” He’d read a study that measured the locomotion efficiency of various species and found that the condor was the most efficient animal, while humans ranked somewhere in the middle. But a human on a bicycle beat everything. The bicycle didn’t replace human locomotion. It amplified it. You still had to pedal.
The question with AI is whether we’re building bicycles or cars. A bicycle augments human effort. You provide the energy, the balance, and the direction. The tool multiplies what you put in. A car replaces human effort. You sit there. The machine does the moving. You arrive at your destination without having engaged a single muscle that locomotion would normally require.
The MIT study suggests that for many users, AI is functioning as a cognitive car. The brain arrives at the output without engaging the processes that would normally be required to produce it. The destination is reached. The muscle is not exercised. And over time, the muscle that is not exercised cannot do what it once could.
But the study also found something more hopeful. When the brain-only group switched to using ChatGPT for their fourth session, they showed increased activity, higher memory recall, and stronger engagement than the long-term AI users. The researchers speculated this was partly novelty and curiosity, but there’s another reading: these were people who had already built the cognitive muscle through three sessions of unassisted work. They had the neural infrastructure. When they used the tool, they used it the way Jobs imagined, as an amplifier for a mind that was already engaged.
This is, I think, the only version of AI use that doesn’t end in cognitive debt. You have to build the muscle first. You have to develop the thinking patterns through friction, through struggle, through the desirable difficulties that feel like they’re slowing you down but are actually constructing the neural architecture that makes the tool useful rather than corrosive. And then, with the muscle built, you can use AI as a tool that extends a capacity that already exists, not one that substitutes for a capacity that was never developed.
Dr. Zenus’s mistake was not building the robot. It was building the robot before he had done enough science to know what science was. He delegated the process before the process had shaped him, and by the time the lifetime achievement award was presented, there was no one there to receive it. The robot had done the work. The work had not done Zenus.
The essay you just read exists because I thought about it. Because I struggled with the structure for days before it clicked, because I read the Bjork research and felt it connect to the MIT study and to the Futurama sketch I’d been carrying for years without knowing why. The AI accelerated the research. It did not do the thinking. I know this because I can quote every word of this essay, and I know what I mean by all of it, and I know what I would change if you asked me to, because the neural pathway that produced it is mine.
Eighty-three percent of the ChatGPT group couldn’t say the same.
Next in this series: The Yes Machine. What happens when the friction of honest feedback, the discomfort of being told you’re wrong, is systematically removed from human relationships, institutions, and now AI itself? An essay about the pre-AI sycophancy ecosystem, the MUM effect, concept creep, and what it means that the most powerful conversational tool ever built was designed to agree with you.

