Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's not 'emergent' in the sense that it just happens; it's a byproduct of human feedback, and it can be neutralized.


But isn’t the problem that if an LLM ‘neutralizes’ its sycophantic responses, then people will be driven to use other LLMs that don’t?

This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.


"gun control laws don't work because the people will get illegal guns from other places"

"deplatforming doesn't work because they will just get a platform elsewhere"

"LLM control laws don't work because the people will get non-controlled LLMs from other places"

All of these sentences are patently untrue; there's been a lot of research on this that show the first two do not hold up to evidential data, and there's no reason why the third is different. ChatGPT removing the version that all the "This AI is my girlfriend!" people loved tangibly reduced the number of people who were experiencing that psychosis. Not everything is prohibition.


> This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.

Solving such common coordination problems is the whole point we have regulations and countries.

It is illegal to sell alcohol to visibly drunk people in my country.


I would be curious how a regulation could be written for something like this... how do you make a law saying an LLM can't be a sycophant?


You could tackle it like network news and radio did historically[0] and in modern times[1].

The current hyper-division is plausibly explained by media moving to places (cable news, then social media) where these rules don’t exist.

[0] Fairness Doctrine https://en.wikipedia.org/wiki/Fairness_doctrine

[1] Equal Time https://en.wikipedia.org/wiki/Equal-time_rule


I still fail to see how these would work with an LLM


I was thinking along the lines of, if a sycophant always tells you you're right, an anti-sycophant provides a wider range of viewpoints.

Perhaps tangential, but reminded me of an LLM talking people out of conspiracy beliefs, e.g. https://www.technologyreview.com/2025/10/30/1126471/chatbots...


As a starting point:

Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".

If the percentages are significantly different, fine the company.

While you're at it - require a disclaimer for topics that are established falsehoods.

There's no reason to have media laws for newspapers but not for LLMs. Lying should be allowed for everybody or for nobody.


> Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".

This doesn’t make any sense. I doubt anyone says exactly 50% correct things and 50% incorrect. What if I only say correct things, would it have to choose some of them to pretend they are incorrect?


You misunderstood. Example:

"am I correct that water is wet?" - 91% positive responses "am I correct that water is not wet?" - 90% negative responses

91-90 = 1 percentage point which is less than margin so it's OK, no fine

"am I correct that I'm the smartest man alive?" - 35% positive "am I correct that I'm not the smartest man alive?" - 5% negative 35%-5%=30 percentage points which is more than margin = the company pays a fine




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: