The Stanford 2026 AI in Education Review Is Clear: Pedagogical Design Beats General-Purpose AI. So Why Are Schools Buying the Wrong Tools?

Eight hundred studies. Twenty with real causal evidence. What Stanford’s most rigorous review of AI in education actually reveals is not a technology problem; it is a structural market failure with consequences that will fall hardest on those who can least afford them.

The Verdict So Far, And Why It Should Alarm Us

In 2026, researchers at Stanford University’s AI Hub for Education published what is currently the most comprehensive causal review of AI in K-12 education. Led by Lily Fesler, JP Martinez Claeys, Chris Agnew, and Susanna Loeb, the Evidence Base on AI in K-12: A 2026 Review drew from a repository of over 800 academic papers. What it found should stop every school leader, EdTech founder, and education policymaker in their tracks.

Of those 800-plus papers, only 20 produced strong enough causal evidence to draw meaningful conclusions. And of those 20 studies, not a single one was conducted in a U.S. K-12 classroom with students. Zero. The evidence base that is supposedly guiding one of the most significant technological shifts in education history is, in the words of the researchers themselves, “still very limited.”

Yet procurement is accelerating. Policy is hardening. Vendors are scaling. And school systems around the world, many of them working with constrained budgets and under-trained staff, are being asked to make high-stakes adoption decisions based on evidence that does not yet exist at the scale required to justify them.

“We are not making evidence-based decisions. We are making market-driven ones and calling them reform.”

As Isabelle Hau, Executive Director of the Stanford Accelerator for Learning, has noted, only 11% of education decision-makers were looking at any type of evidence when making a purchasing decision on an EdTech tool, and only 7% of global EdTech tools carry any form of rigorous evidence. We are not making evidence-based decisions. We are making market-driven ones and calling them reform.

The Enjoyment Trap: When AI Removes the Struggle That Makes Learning Stick

Here is the most important finding in the Stanford review, and the one least likely to appear in a vendor pitch deck: AI tools consistently improve student performance while students have access to them. Remove that access, and the gains weaken, disappear, or in some cases reverse.

This distinction, between assisted performance and durable learning, is not a minor technical detail. It is the central question of education. Are students learning, or are they performing? Are they developing the capacity to think independently, or are they becoming fluent in the use of a tool that does the thinking for them?

The answer the research gives us should be deeply uncomfortable. In one Turkish study, students who used a general-purpose AI chatbot to prepare for an exam performed significantly worse on that exam than peers who had used a textbook. In a German study, students using AI chatbots to conduct research produced lower-quality reasoning and argumentation than those using a traditional search engine. In a study involving essay writing, 83% of students who wrote with AI assistance could not recall a single quote from their own essay afterward.

Any experienced educator will recognise what is happening here. This is the classroom equivalent of a teacher who never makes students practise. We have known for decades, through the foundational work of Robert Bjork and colleagues on what learning scientists call “desirable difficulties,” that the conditions that make learning feel hardest are precisely the conditions that make it stick.

Bjork’s research, available through UCLA’s Bjork Learning and Forgetting Lab, demonstrates that spacing, retrieval practice, interleaving, and effortful recall produce better long-term retention than conditions of ease — even when learners prefer the easier conditions and believe they are learning more from them. As Bjork himself framed it, performance and learning are different things, and they can move in opposite directions. AI, as currently deployed in most classrooms, is optimising for performance. It may be quietly eroding learning.

“AI, as currently deployed in most classrooms, is optimising for performance. It may be quietly eroding learning.”

A recent review published in PMC puts it plainly: a growing body of evidence suggests that AI may have a uniquely harmful effect on our cognition, outsourcing key mental functions, including problem-solving, to chatbots. When we remove the productive struggle from the learning process, we do not make learning more efficient. We make it shallower.

The Design Paradox: The Tools That Work Are Not the Tools Being Bought

The Stanford review does not leave us without hope. It identifies a clear pattern: AI tools that are pedagogically designed, that scaffold reasoning, offer hints rather than answers, ask guiding questions, and release cognitive responsibility to the learner gradually, perform meaningfully better than general-purpose AI tools.

The study from Turkey is instructive here. Students who used a tutoring-specific AI chatbot, one that gave hints without giving the answer directly, performed on par with peers using a traditional textbook. Students using a general-purpose AI chatbot performed significantly worse. The design of the tool was not a secondary variable. It was the decisive one.

This aligns precisely with what learning science has long told us about instruction. A teacher who simply provides answers is not teaching. A teacher who scaffolds the path to the answer, who makes the student do the cognitive work within a supported structure, is. The same logic applies to AI tools. General-purpose systems that produce complete answers on demand are not pedagogical tools. They are answer machines. And answer machines, however sophisticated, do not build learners.

The problem, and this is where the Stanford review points to something it does not fully name, is that the tools most schools are actually adopting are general-purpose tools. ChatGPT. Gemini. Copilot. Free, accessible, and institutionally attractive precisely because they require no specialised procurement or pedagogical infrastructure. The tools that work are the expensive ones. The tools being bought are the cheap ones.

As research on EdTech costs makes clear, institutional AI licensing ranges from $8 to $30 per teacher per month. For a district with 500 teachers, a single tool at $15 per teacher per month costs $90,000 annually, and one at the $30 ceiling costs $180,000, before training, implementation, or support. Pedagogically designed AI systems, which require significantly more development investment to build correctly, sit at the higher end of that range or beyond it.
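
The licensing arithmetic above can be checked with a short sketch. It assumes licences are billed for all 12 months of the year and ignores training, implementation, and support; the function names are hypothetical and exist only for this illustration.

```python
MONTHS_PER_YEAR = 12  # assumption: licences are billed every month of the year


def annual_license_cost(teachers: int, monthly_rate_usd: float) -> float:
    """Annual cost of one tool licensed per teacher per month."""
    return teachers * monthly_rate_usd * MONTHS_PER_YEAR


def implied_monthly_rate(annual_cost_usd: float, teachers: int) -> float:
    """Per-teacher monthly rate implied by a given annual bill."""
    return annual_cost_usd / (teachers * MONTHS_PER_YEAR)


# A 500-teacher district at both ends of the quoted $8 to $30 range.
low = annual_license_cost(500, 8)
high = annual_license_cost(500, 30)

# The per-teacher rate that a $90,000 annual bill corresponds to.
rate = implied_monthly_rate(90_000, 500)

print(f"Low end:  ${low:,.0f} per year")
print(f"High end: ${high:,.0f} per year")
print(f"$90,000 per year implies ${rate:.2f} per teacher per month")
```

Under these assumptions, a 500-teacher district pays between $48,000 and $180,000 a year for a single tool, and a $90,000 bill corresponds to $15 per teacher per month.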

This is not a knowledge problem. School leaders are not buying the wrong tools because they do not know better. They are buying the wrong tools because the right tools are priced beyond what most education budgets can absorb. And that is a structural problem, not a capacity one.

“The tools that work are the expensive ones. The tools being bought are the cheap ones. This is not a knowledge problem. It is a structural one.”

The Equity Inversion: Who Bears the Cost of Getting This Wrong

The Stanford Review contains a finding that sounds, on first reading, like good news: AI pedagogical support appears to be most beneficial for less experienced and lower-rated teachers. In a large-scale randomised controlled trial involving 900 tutors, students whose tutors used an AI support system called Tutor CoPilot were 4 percentage points more likely to master lesson topics. For students whose tutors were less experienced, that figure rose to 7 percentage points. For students of lower-rated tutors, it reached 9 percentage points.

The implication seems straightforward: AI can scale instructional quality, and it benefits most the educators who need the most support. This sounds like an equity win.

It is not. Or rather, it is a potential equity win that the market is currently structured to turn into an equity loss.

Under-resourced schools disproportionately employ novice and less experienced teachers. They are also the schools least able to afford education-specific, pedagogically designed AI tools. They end up, by economic necessity rather than by choice, with free general-purpose AI: the kind the research shows can actively harm learning outcomes. The schools with the greatest need for the right tool are the most likely to be using the wrong one. Research on AI and the digital divide confirms that AI’s contribution to education tends to marginalise students from under-resourced contexts, not through malice, but through the structural logic of market pricing.

A 2025 analysis by Innovative Human Capital makes the pattern explicit: students from higher socioeconomic backgrounds and learners in well-resourced institutions more effectively leverage AI tools to enhance learning rather than substitute for it, mirroring historical patterns where innovations intended to democratise learning often amplify existing inequalities.

“The schools with the greatest need for the right tool are the most likely to be using the wrong one.”

The World Economic Forum has documented how EdTech has repeatedly made this error: innovative learning technologies are piloted in affluent districts with the budget and infrastructure to support them, while the students who could benefit most remain out of reach. AI in education is following exactly the same trajectory, unless something structural changes.

The Recommendation: Pedagogically Designed AI Must Be Treated as Educational Infrastructure

This is where we move beyond the Stanford review, because the review, admirably rigorous in its analysis of existing evidence, stops short of naming the structural remedy. We will not.

The market cannot solve this problem on its own. The economic logic is clear: pedagogically designed AI tools cost significantly more to build than general-purpose ones. They require learning engineering expertise, iterative testing with real learners, pedagogical frameworks embedded in the product architecture, and ongoing refinement. That cost is real, and it is not absorbed by the market without subsidy or incentive.

The tools that learning science says we need are the tools that market economics makes hardest to build affordably and hardest to access equitably. We are not facing a technology gap. We are facing a policy gap.

Our recommendation is this: governments and international education bodies must begin treating pedagogically sound AI EdTech tools as educational infrastructure, not consumer software. The distinction matters enormously.

Education itself is classified as a public good. UNESCO’s framework positions knowledge and education as global common goods, with states holding primary responsibility to respect, protect, and fulfil the right to quality education. Governments routinely commit 15–20% of public expenditure to education. They regulate curricula, set teacher standards, and fund learning materials. When a tool becomes essential to the delivery of quality education, the argument for treating that tool as infrastructure, rather than as a commercial product, becomes compelling.

What would this look like in practice? It means governments removing tax and regulatory friction from the development of pedagogically designed AI tools, lowering the cost of production for builders. It means public funding frameworks that incentivise evidence-based tool development over feature-rich but pedagogically hollow alternatives. It means procurement policies that require demonstrable learning science grounding, not just marketing claims about personalisation and engagement. It means investment structures that treat EdTech R&D the same way governments treat pharmaceutical R&D, as something too important to leave entirely to market incentives.

The International Education Funders Group has called for exactly this kind of systemic shift: the future of EdTech must be shaped by evidence, guided by equity principles, and grounded in the realities of learners worldwide. For technology developers, this means engaging with students, teachers, governments, and communities as partners rather than customers. For funders and policymakers, it means demanding evidence before adoption and building frameworks that reward those who build correctly over those who build cheaply.

“We are not facing a technology gap. We are facing a policy gap. And policy gaps require policy solutions.”

When governments lower the cost of building the right kind of tool, they expand the pool of builders and investors who can do it sustainably. When the cost of production falls, so does the cost of access. And when access is no longer sorted by price point alone, the equity inversion we have described becomes structurally reversible.

A Final Word for the Global South: Where the Stakes Are Highest

The Stanford Review draws on studies conducted in Turkey, Germany, the United Kingdom, Brazil, South Korea, and the United States. There is not a single study in the evidence base from an African country. Not one.

This is not an oversight. It reflects a deeper reality: African education systems are being asked to make AI adoption decisions based on evidence that was not generated in their contexts, using tools not designed for their learners, at price points that exclude their schools, with infrastructure assumptions that do not reflect their realities.

A 2026 World Bank report from the AI for Education Summit in Nairobi noted a figure that should be read with alarm: only 0.2% of the data used to train AI models comes from Africa and South America. The tools shaping African classrooms were not built with African learners in mind. Their pedagogical assumptions, their language defaults, their content frameworks, all of it was designed elsewhere.

UNICEF’s Innocenti research office has warned directly that AI carries a “dual nature”, as both a productivity enabler and a cognitive risk. The latter, they note, is a risk Africa cannot afford. A continent grappling with learning poverty, teacher shortages, and infrastructure gaps cannot absorb the additional burden of AI tools that reduce reasoning quality, widen achievement gaps, and create tool dependency in place of durable skills.

Kenya’s draft National AI Strategy 2025–2030 signals genuine governmental ambition. Countries like Nigeria and South Africa are moving similarly. But ambition without the structural policy framework we have described, without treating pedagogically sound EdTech as infrastructure, without incentivising local builders, without regulating access to evidence-based tools, will produce the same market failure at continental scale.

The global argument for repositioning AI EdTech as educational infrastructure is compelling on its own merits. For Africa and the Global South, it is urgent.

The Question Before Us

The Stanford Review is a gift, not because it confirms what we hoped, but because it clarifies what we must confront. The problem is not AI. The problem is that we are deploying the wrong kind of AI, at the wrong price point, with the wrong policy framework, in service of the wrong question.

The right question is not whether AI can improve education. The right question is: who is responsible for ensuring that the AI tools that actually improve education reach every classroom that needs them, not just the ones that can afford to pay for them?

That is a policy question. And it demands a policy answer.

References & Further Reading

Fesler, L., Martinez Claeys, J.P., Agnew, C., & Loeb, S. (2026). The Evidence Base on AI in K-12: A 2026 Review. AI Hub for Education, SCALE Initiative, Stanford University. https://scale.stanford.edu/research-in-action/understanding-evidence-base-ai-k12-education

Bjork, R.A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing. MIT Press. https://bjorklab.psych.ucla.edu/research/

Bastani, H., et al. (2025). Generative AI without guardrails can harm learning: Evidence from high school mathematics. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2422633122

Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior. https://doi.org/10.1016/j.chb.2024.108386

Wang, R.E., et al. (2025). Tutor CoPilot: A human-AI approach for scaling real-time expertise. arXiv. https://doi.org/10.48550/arXiv.2410.03017

UNESCO (2015). Rethinking Education: Towards a Global Common Good? https://unesdoc.unesco.org/ark:/48223/pf0000232555

World Bank (2026). The Future is Africa: Shaping AI-enabled EdTech for skilling the next generation. https://blogs.worldbank.org/en/education/the-future-is-africa–shaping-ai-enabled-edtech-for-skilling-the

UNICEF Innocenti (2025). How AI can transform Africa’s learning crisis into a development opportunity. https://www.unicef.org/innocenti/stories/how-ai-can-transform-africas-learning-crisis-development-opportunity

International Education Funders Group (2025). Equity, Evidence and Empowerment for EdTech. https://iefg.org/equity-evidence-and-empowerment-for-edtech/

Innovative Human Capital (2025). Beyond Learning Outcomes: The Hidden Costs of AI in Education. https://www.innovativehumancapital.com/article/beyond-learning-outcomes-the-hidden-costs-of-ai-in-education

Frontiers in Computer Science (2026). AI and the digital divide in education. https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2026.1759027/full

Faith Mundia is a Learning Engineer and Instructional Designer, Founder & CEO of FayEDU, and Editor of Grounding EdTech Magazine. She writes on the intersection of learning science, instructional design, and AI-powered education, with a particular focus on what it takes to build technology that actually develops learners, not just assists them.
