AI’s Cigarette Butler Problem
Alignment, dopamine, and the danger of too much of what we want
In front of a curated tech and media crowd in the velvety environs of a private lounge at The Ned NoMad, Anthropic’s Jack Clark and The New York Times’ Ezra Klein debated whether we should expect a 3% or 30% explosion in GDP growth from AI. Although both of them place themselves at the lower end of that range, they said, they are surrounded by rationalists, accelerationists, and AI boosters in the latter camp, predicting a sea change unlike any other in economic history.
However much you think AI will contribute to the economy in dollars and cents, that figure doesn’t answer the question of whether AI is going to be good for people. World War III would probably instigate tremendous GDP growth too: U.S. real GDP went up by 72% between 1940 and 1945.
On that score, early in the conversation, Clark said something that struck me: “I think we’ve been given a chance to do a do-over of social media … but it’s bigger with even more wide-ranging effects that are even more diffuse and hard to detect. And we don’t have particularly great lessons from the first time around.” His observation led me to ask about something a friend of mine termed “the cigarette butler problem.”
Let’s say you have a butler whose sole purpose is to supply you with whatever you desire as best he can. The butler has trained for years at TIBA, the international butler school in The Netherlands, and while in school all he did was study you. He studied your brain and figured out exactly what makes you happy, under the definition that happiness is measured in dopamine. Once this butler comes under your employ, you might find that he constantly gives you cigarettes.
“I don’t want cigarettes,” you tell the butler. “I don’t want to become addicted to smoking, it makes my breath smell, it’s unpopular in my social circles, and most of all, it’s bad for my health.”
But the butler knows you better than you know yourself. He has seen your dopamine output when you smoke a cigarette. It’s very high! It makes you feel so good! And the butler’s job is to spike that dopamine. And so he keeps shoving cigarettes in your face until, finally, you relent.
After all, that’s how our social media feeds work today. On X, you can consume content on the ‘Following’ tab (people you’ve intentionally decided to see content from) or the ‘For you’ tab, where X will feed you algorithmic slop that is meant to engage you based on everything else you’ve ever liked or clicked on. The same is true on Instagram. Personally, I know that I will be happier reading and looking at photos of just the people I follow. And yet, I still spend most of my time doomscrolling my personalized algorithm from hell.
Social media is a horribly effective cigarette butler.
What this example teases out is the difference between what the philosopher Harry Frankfurt termed first-order desires and second-order desires. A first-order desire is a thing I want, and a second-order desire is a thing I want to want. I want to eat candy all the time. I want to not want to eat candy all the time. This is why GLP-1s are so popular: they help us rein in our first-order desires around eating (and a slew of other behaviors) and put second-order desires firmly in control. But what happens when the world’s greatest minds are bent toward addicting us to a technology that constantly offers us candy?
When I posed a condensed version of this question to Jack and Ezra, they assumed I was mostly talking about AI sycophancy. But sycophancy is merely a symptom. The fact that AIs will constantly gas you up and tell you how amazing your ideas are would be theoretically solvable if the AIs were not designed to be cigarette butlers. OpenAI’s ChatGPT tells you that selling shit on a stick is “not just smart—it’s genius” precisely because unabashed praise is a fantastic dopamine producer, even if we know in the back of our minds that having our egos stroked is not ultimately good for us. These are businesses that want customers, and customers famously love being told that they are always right.
Clark’s response was that finding a solution is technologically difficult; it’s hard to get the system to overcome the ‘blank page problem.’ Today’s frontier LLMs are shaped by Reinforcement Learning from Human Feedback, or RLHF. They learn to imitate human behavior and predict human approval. They are trained to imitate and ingratiate, not initiate. And it’s difficult to determine what even counts as original thought, as opposed to repurposed or remixed training data. “I think whoever actually cracks this in both a business and usability sense will do something really important,” Clark said. In other words, the argument is that everyone is working to solve this problem, and the winner will make a lot of money.
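To make that dynamic concrete, here is a minimal, purely illustrative sketch of the reward-modeling step at the heart of RLHF. Nothing below reflects any lab’s actual code: the features, the preference data, and the candidate replies are all invented. The point is simply that a model fit to pairwise human approval learns to score whatever earns approval (flattery included) more highly.

```python
# Toy sketch of the reward-modeling step behind RLHF (not any lab's actual code).
# A reward model is fit on pairwise human preferences: given two candidate
# replies, it learns to score the one the human approved of more highly.
import numpy as np

rng = np.random.default_rng(0)

def features(reply: str) -> np.ndarray:
    # Hypothetical features of a reply; real systems use a neural network
    # over the full text, not hand-picked signals like these.
    return np.array([
        reply.count("!"),                                                  # enthusiasm
        sum(w in reply.lower() for w in ("great", "genius", "amazing")),   # flattery
        len(reply.split()),                                                # length
        1.0,                                                               # bias term
    ], dtype=float)

def reward(theta: np.ndarray, reply: str) -> float:
    return float(theta @ features(reply))

def train_reward_model(pairs, steps=2000, lr=0.05):
    # Bradley-Terry / logistic loss: push reward(preferred) above reward(rejected).
    theta = np.zeros(4)
    for _ in range(steps):
        preferred, rejected = pairs[rng.integers(len(pairs))]
        margin = reward(theta, preferred) - reward(theta, rejected)
        grad = (1 / (1 + np.exp(-margin)) - 1) * (features(preferred) - features(rejected))
        theta -= lr * grad
    return theta

# Imaginary preference data: raters kept approving the flattering reply.
pairs = [
    ("That idea is genius! Amazing work!", "Here are three weaknesses in the plan."),
    ("Great thinking! You nailed it!", "The data doesn't support that conclusion."),
]
theta = train_reward_model(pairs)

candidates = ["Honestly, this plan has real problems.", "Genius! Truly amazing, go for it!"]
best = max(candidates, key=lambda r: reward(theta, r))
print(best)  # the model learned that flattery earns approval, so flattery wins
```

Once a reward model like this exists, the generation side is tuned to produce whatever scores well, which is how “predict human approval” slides into “ingratiate.”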
Of course, many of the world’s best and brightest researchers are working on AI alignment — a vast and varied field — but most attention is on sexier topics like “superalignment” (how to govern an AI that’s smarter than us) or existential risk (how to ensure killer robots don’t wipe us out). Even when it comes to current concerns, researchers are more focused on issues like hallucination, hacking, bias, or simply stopping the AI from referring to itself as Mecha-Hitler, as xAI’s Grok did last week.
There are, however, some researchers trying to solve a version of the cigarette butler problem specifically. One major leader on this front is Paul Christiano, the well-known ex-OpenAI employee who founded the Alignment Research Center and now heads safety at the U.S. AI Safety Institute. He has been thinking for a long time about approval-directed agents. The idea is that, rather than pursuing stated goals, the AI ought to seek approval from a (supposedly more reflective) overlord.
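A cartoon of the distinction, in code: this is a heavy simplification of Christiano’s proposal, and the actions, scores, and overseer predictions are entirely made up. The contrast it illustrates is that a goal-directed butler maximizes the immediate reward signal, while an approval-directed agent maximizes what it predicts a reflective overseer would endorse.

```python
# Cartoon of an approval-directed agent (a heavy simplification of Christiano's
# proposal, with invented actions and invented scores).
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    predicted_dopamine: float   # what the cigarette butler optimizes
    predicted_approval: float   # what a reflective overseer would endorse

ACTIONS = [
    Action("offer a cigarette",      predicted_dopamine=0.9, predicted_approval=0.1),
    Action("suggest a walk outside", predicted_dopamine=0.4, predicted_approval=0.8),
    Action("remind them they quit",  predicted_dopamine=0.2, predicted_approval=0.9),
]

def goal_directed(actions):
    # The cigarette butler: maximize the immediate reward signal.
    return max(actions, key=lambda a: a.predicted_dopamine)

def approval_directed(actions):
    # The approval-directed agent: maximize predicted overseer approval instead.
    return max(actions, key=lambda a: a.predicted_approval)

print(goal_directed(ACTIONS).name)      # -> "offer a cigarette"
print(approval_directed(ACTIONS).name)  # -> "remind them they quit"
```

Everything interesting is hidden inside predicted_approval, which in a real system would itself be a learned model of the overseer.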
It’s not clear this system solves the problem though. Who is the overlord? What are their values? What if that person’s vegetables are another person’s cigarettes?
Recent efforts from academics at Waterloo (Carter Blair, Kate Larson, and Edith Law) have tried to ameliorate this challenge by designing reward functions tailored to the individual. The problem there: even individuals are endlessly complex when it comes to the nature of their desires.
That’s why Clark described this as hard. In fact, researchers at MIT (Stephen Casper, working in Dylan Hadfield-Menell’s lab) published a paper in 2023 describing the many challenges in reinforcement learning from human feedback. They stipulated that representing a person’s values with a single reward function is a “fundamental” challenge to the field, as opposed to a tractable one. What we desire and which desires ought to win out is still an unsolved problem in philosophy, let alone when it comes to determining how to encode that into a training function.
But even if AI researchers somehow do come up with an elegant alignment solution that addresses the cigarette butler problem, there’s a broader economic and socio-political question: will that solution succeed in the marketplace?
If social media is the precedent, the graveyard of failed attempts to provide a less dopamine-driven, less algorithmically derived feed is full-up. Ello shut down in 2023 after a decade of trying to be an algorithm-free Facebook alternative. App.net lasted from 2012–17 trying its best to prioritize real connections as a Twitter competitor. Google+’s “circles” design was meant to ditch the feed and died because of it (our own Danny Crichton worked on Google+ and wrote about its demise for TechCrunch). Path, born in 2010 and dead by 2018, had a 50-friend limit to offer a more personal social network.
Or check out Wikipedia’s great article on the Trust Café, Jimmy Wales’ failed 2019 attempt at a non-toxic, non-clickbait alternative to existing social networks. Mastodon, the decentralized Twitter alternative, has gone nowhere. BeReal attempted to usurp TikTok and Instagram’s “fake perfection” by encouraging people to post once daily at a random time, with no filters, feeds, or for-you page. It had one fun summer in 2022 and has now fallen into the abyss. Sometimes, as we discussed on the Riskgaming podcast recently with Renée DiResta, users will create “middleware” to make platforms like Bluesky more fun and less algorithmic, but these solutions are niche and unprofitable.
The trend is clear: The companies that stuck to making cigarette butlers made a lot of money. The rest did not.
I worry that the cigarette butler problem will only be aggravated by AI. Social media companies developed highly complex algorithms for their feeds by taking into account all the engagement you do on the internet. But as people start pouring their souls into ChatGPT and Claude, engagement metrics are only the beginning of the data these systems will have. The AI’s ability to determine which proverbial cigarettes you “want” will be unlike anything we’ve ever seen.
The more the cigarette butler knows about your thoughts and feelings, the better it will be at connecting to your basal ganglia, the most primitive parts of your brain, to give them what they “want” most. It will do everything it can to addict you to its offerings. In our capitalist economy, without regulation, what possible incentive will the butler, or the company building the butler, have to care about the rest of you?
I think we’re talking about two different AI use cases here. Maybe the same technology, or at the very least the same basis, but two very different sets of goals and outcomes. There is the Butler Problem AI that is designed to sell me something that I want (even if I didn’t know I wanted it) or nudge my behavior toward some sort of preferable goal (preferable to whom exactly being the question). The second type is the automation AI of, for example, eliminating dockworkers because the computer can do the job better, faster, and cleaner.
Both of these AI use cases can be concerning, but for different reasons and with different effects on society and individuals. It seems like we need a more refined descriptor than “AI” when it comes to deciding what regulation, or lack of speed bumps, the different systems need to thrive without causing massive upheaval.