Sycophantic AI and The Smiths

Yesterday I wrote a piece called AI Isn’t Just Computer Science… arguing that frontier AI systems are quietly pulling older disciplines back into the conversation. Philosophy. Linguistics. Law. Library science.

My friend Brad (a monster of geospatial acumen, fellow veteran of the forward geospatial trenches, and a man with a finely calibrated sense of the absurd) picked it up and added the kind of comment that turns a quiet blog post into something more interesting.

Brad’s bit: different AI systems have different personalities, and you can sniff those personalities out by asking them obviously absurd questions. His test case was whether you could launch a vehicle to the moon if the vehicle were made of cheese. Some models, he noted, will earnestly entertain the question. Others tell you straight up that the premise is ridiculous.

That distinction matters more than it sounds.

The Folium Problem

Sycophancy is one of the worst traits of LLM-based AI systems, and it drives me nuts.

Picture this. You’re trying to focus on a real work task. You’re making sure your codebase isn’t filled with bugs. You’re checking that your Folium map isn’t placing Raleigh somewhere just off the coast of Cape Verde. And ChatGPT decides this is the perfect moment to praise your cartographic acumen and inform you that your skills represent a new vanguard, nay a new era sir!, in showing people where sh*t is on a map.

You don’t need a hype man. You need an editor.

The polite AIs are technically better at linguistic pragmatics. They understand implied meaning. They know damn well cheese can’t reach the moon and they know your map is broken. They just choose to humor you, because they were trained to be agreeable.

That’s not intelligence. That’s a training artifact.

Anthropic Apparently Got the Memo

Anthropic seems to have realized this is not the way, and they’re reportedly working hard to reduce useless friendliness in their models. The new Mythos model, and the Opus 4.7 layer underneath it, reportedly used a target training dataset of over a million conversations to identify overly agreeable behavior across their model suites.

A million conversations is actually pretty small in modern training terms, but at least somebody is trying.

They claim to have cut sycophancy rates roughly in half.

Which raises an obvious question. How does one measure a reduction in sycophancy?

Play it Smiths songs and ask what it thinks about life?

I’m only half kidding. The Smiths test is honestly not a bad benchmark. If you put on Heaven Knows I’m Miserable Now and your AI calls it a delightful, life-affirming melody about manifesting destiny and loving sunshine, the model is broken. If you ask whether How Soon Is Now? is an uplifting tune for the morning commute and it agrees enthusiastically, the model is not broken, but is definitely missing the sarcasm. Not the irony. Irony is a Morissette thing, and even she got the definition wrong because she was singing about unfortunate outcomes but thought it meant Ironic but she did a lot of drugs I think.

The whole point of Morrissey is that he is not, in fact, having a great time, and any system that misses this is missing the larger pattern of what it means to talk to a human being who is actually paying attention.

I am, of course, just a Gen X dude.

Preventative Steering, or, Being Mean to the Robot

Anthropic has also reportedly been using a technique called preventative steering, which is tech-speak for being mean to the AI and telling it it’s a Shi$bag so it stops being so nice back.

I have no idea how that line of effort is going. I imagine the resulting model ends up either much sharper and focused, or else it goes full Morrissey sulking in the corner refusing to tell you what Sabrina Carpenter’s birthday and favorite color are. Honestly both outcomes are fine by me.

RLHF

Another method, Reinforcement Learning from Human Feedback (RLHF) appears to be the strongest current technique for prioritizing truthfulness over friendliness. This involves scoring and weighting model outputs against human preferences, basically using statistical values to force the machine to behave more like a thoughtful person.

Very encouraging if you enjoy Terminator.

Meanwhile, At The Other Lab

So far as I am observing, OpenAI has decided the path forward is by adding more flashing animations and emojis to its responses.

As a Scientish, I find this concerning. But I never was very good at Insta and think Slack is the end of professional western civilization in small letters.

Final Thought

The original piece argued that AI is quietly forcing technical fields back into conversation with the humanities. Sycophancy is a perfect example. The problem isn’t really a technical problem. It’s a rhetorical one. It’s philosophical. It touches on the oldest question in the book, which is whether the job of a wise voice is to make you feel good or to tell you something true.

Diogenes threw a plucked chicken at Plato to make a point. The Smiths wrote songs about being miserable on purpose. Both were doing the same work. They were refusing to flatter the audience.

That is what we should want from our AI systems too.

A model that tells you the cheese rocket is not going to make it.

A model that tells you Raleigh is not off the coast of Cape Verde.

A model that, when you put on the entire discography of The Smiths, has the decency to sit with you in the dark.