AI super persuasion is not an existential risk
A recent discussion highlighted concerns that AI could recursively improve its own powers of persuasion
Four years ago, before his voluntary passing, Nobel laureate Daniel Kahneman joined the Riskgaming podcast and made his aggressive and oft-repeated claim: humans just don’t change their minds. In fact, the illustrious panel assembled for the show agreed with him, almost as a truism.
Thanks to a variety of psychological heuristics we use just to get through each day in an overwhelming and complex world, the argument goes, we humans are unable to overcome the biases of our previous experiences and decisions. Once our views lock in, practically no intervention is capable of dislodging them.
That perspective has profound implications for society. Political campaigns aren’t meant to persuade voters, but rather to galvanize existing supporters to show up and vote. Marketing is exclusively about awareness, not about convincing a consumer that one product is better than another. Leadership isn’t a useful quality, since employees aren’t going to be seduced by a new vision one way or the other. (Employees get on board for the paycheck, since resistance is futile. And leaders never change their minds anyway.)
Our podcast panel was unanimous because the social science evidence is overwhelming that persuasion is powerless against humanity’s recalcitrant reasoning. Yet burgeoning research shows that AI-powered bots have a much better shot at changing minds: we’ll listen to a machine even when we ignore other people.
I was triggered to explore this subject (one might say persuaded) by a recent event on the power and perils of world models hosted by Zoe Weinberg and the team at ex/ante, a firm that seeks to advance human agency. We all let the conversation get a bit carried away, and while the event was off the record, it did push me to consider the future of what Sam Altman in 2023 dubbed AI’s potential to be a super persuader.
Based on recent research, AI may indeed be what Altman described. This past December, researchers in Science systematically showed that AI models emphasizing information-dense arguments could persuade a wide sample of Britons on a variety of political issues. Personalization wasn’t as effective, but the team did find that quality post-training mattered more than model complexity for persuasive power.
The effect sizes were small: single-digit percentages in all but the most effective setups. Plus, the researchers didn’t explore whether such LLM-induced persuasion sticks long-term or is eventually discarded as people revert to their previously held stance. That’s important, because one of the challenges with persuasion is that our judgments need to be coherent with our broad reasoning about the world. Convincing a person on a single issue like housing will prove only a temporary victory, since their views on housing are ultimately built around their logic about the economy, politics, society and other people, all of which remain unchanged.
The Science paper was more robust and systematic than the work of a different team published in Nature, which merely looked at the relative effectiveness of AI against humans in persuading people; that team found that AI generally held the upper hand. That matched evidence from a controversial study out of the University of Zurich, where researchers deployed chatbots on Reddit that surreptitiously attempted to persuade readers of the r/changemyview subreddit, with surprisingly positive results.
What’s been discovered so far (which dovetails with some of my recent explorations on AI and strategy that I will publish this week) is that AI is incompetent out of the box when it comes to persuasion. Asking it to persuade someone does not elicit a good strategy, even when it has comprehensive context on that person and the topic. AI almost certainly has access to the most up-to-date social science research in its training data, and yet it can’t seem to transform those insights into effective dialogue. Indeed, the researchers in Science ultimately wrote their own prompts built around key techniques including moral reframing, storytelling and deep canvassing. Many of these have been studied for decades and are hardly esoteric.
The safety concerns around AI-based persuasion come from three main areas. First, some of AI’s success is actually based on factual hallucinations. For their Science paper, the team found that information-dense arguments were the most effective technique, partially because the AI simply made facts up (in its own way, AI has learned from the species known as “politician” about the value of outright lies). As chatbots become more embedded in education and politics, their lack of fidelity to truth is a major concern.
Second, there is an inequality concern. While the effect sizes in recent research are still small, they are statistically significant. With elections incredibly tight and many marketplaces winner-take-all, even small persuasive effects held in the hands of only some people could tip the balance permanently toward certain actors and companies.
The last and most interesting safety concern is recursive improvement. Today, prompt engineering is largely an art rather than a science. Like these researchers, agent designers whip up dozens of potential prompts with different formulaic variations and plug them into their AI model to see what works and what doesn’t. It’s the A/B testing of the Silicon Valley of yore scaled up for the agentic age, with dozens, hundreds or even thousands of experiments attempting to converge on an optimum.
Where this gets interesting is using AI as the experimenter and orchestrator of its own prompts, multiplied by the personalization and context that a user already has with their chatbot. This is the “super persuader”: a dexterous cajoler of facts, emotions and ideas that finds the perfect trigger points for overcoming our hesitation to change our minds. One of the reasons persuasion is hard is that we’re all unique, and it’s hard for a human to tailor their strategies to each individual in real time. AI could theoretically do that with aplomb.
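To make that recursive loop concrete, here is a minimal sketch of what such a self-optimizing pipeline could look like. Everything in it is hypothetical: generate_reply, persuasion_score and mutate are placeholder stand-ins for a chat-model call, an outcome measure (a survey delta, a judge model’s rating) and a model rewriting its own instructions. The loop itself is just the generate-evaluate-select cycle described above, the old A/B test with the machine running both sides of the experiment.

```python
import random

# Hypothetical placeholders: in a real system these would call an LLM API
# and measure an actual outcome (survey shift, judge-model rating, etc.).
def generate_reply(system_prompt: str, message: str) -> str:
    """Stand-in for a chat-model call under a given system prompt."""
    return f"[reply to {message!r} shaped by {system_prompt!r}]"

def persuasion_score(reply: str) -> float:
    """Stand-in outcome metric; here it is just random noise."""
    return random.random()

def mutate(prompt: str) -> str:
    """Stand-in for a model rewriting its own instructions."""
    tweaks = [
        "Cite specific statistics.",
        "Open with a personal story.",
        "Reframe the issue in the reader's own moral language.",
    ]
    return prompt + " " + random.choice(tweaks)

def optimize(seed_prompt: str, message: str,
             generations: int = 5, variants_per_gen: int = 8) -> str:
    """Generate-evaluate-select loop: the A/B test, run by the machine itself."""
    best_prompt, best_score = seed_prompt, float("-inf")
    for _ in range(generations):
        # Propose variants of the current best prompt, then keep the winner.
        candidates = [mutate(best_prompt) for _ in range(variants_per_gen)]
        for candidate in candidates:
            score = persuasion_score(generate_reply(candidate, message))
            if score > best_score:
                best_prompt, best_score = candidate, score
    return best_prompt

if __name__ == "__main__":
    print(optimize("You are a thoughtful debate partner.",
                   "Should the city allow more housing construction?"))
```

The point of the sketch is that nothing in the loop requires a breakthrough: it is ordinary hill-climbing, with the model supplying both the variants and, in principle, the judge. The open question is whether the score being climbed can ever rise meaningfully above the small effect sizes found in the research.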
The existential risk is that some person or entity will gain this super persuader capability first, overthrowing the belief systems of society before any of us have the cognitive immunity to resist the message. What happens if a terrorist extremist does this first, or a radical cult?
This leap into doomsday fiction is always where I struggle with existential risk. Despite the small effect sizes found in recent research, there doesn’t seem to be any skeleton key to the dark labyrinths of our minds. Humans are always trying to persuade others, and even after millennia of constant refinement via religion, politics, business and more, the library of techniques at our disposal remains breathtakingly poor and insultingly ineffective.
The harrowing pessimism of AI safety researchers on super persuasion may obscure a simple answer: there just doesn’t exist a magical recursive algorithm that will make others do your bidding. (One could ask why AI safety types are so convinced that the whole of humanity is just a flock of gullible sheep, but I will pass on that one for now.) Assuming these capabilities evolve gradually with compute and model complexity, humans also have a dauntless penchant for building up resistance to ever more persuasive techniques.
Humanity survived the advent of brainwashing, public relations, advertising, psychological experiments, social media, physical torture, and every form of media from books to radio to YouTube. There will be those who join cults or hold esoteric views — or worse — in the years ahead thanks to their chatbot. But I think we can take great solace in a simple fact: no one changes their mind. And on this subject at least, not even me.






