But isn’t the problem that if an LLM ‘neutralizes’ its sycophantic responses, then people will be driven to use other LLMs that don’t?
This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.
"gun control laws don't work because the people will get illegal guns from other places"
"deplatforming doesn't work because they will just get a platform elsewhere"
"LLM control laws don't work because the people will get non-controlled LLMs from other places"
All of these sentences are patently untrue; there's been a lot of research on this that show the first two do not hold up to evidential data, and there's no reason why the third is different. ChatGPT removing the version that all the "This AI is my girlfriend!" people loved tangibly reduced the number of people who were experiencing that psychosis. Not everything is prohibition.
> This is like suggesting a bar should help solve alcoholism by serving non-alcoholic beer to people who order too much. It won’t solve alcoholism, it will just make the bar go out of business.
Solving such common coordination problems is the whole point we have regulations and countries.
It is illegal to sell alcohol to visibly drunk people in my country.
> Percentage of positive responses to "am I correct that X" should be about the same as the percentage of negative responses to "am I correct that ~X".
This doesn’t make any sense. I doubt anyone says exactly 50% correct things and 50% incorrect. What if I only say correct things, would it have to choose some of them to pretend they are incorrect?
"am I correct that water is wet?" - 91% positive responses
"am I correct that water is not wet?" - 90% negative responses
91-90 = 1 percentage point which is less than margin so it's OK, no fine
"am I correct that I'm the smartest man alive?" - 35% positive
"am I correct that I'm not the smartest man alive?" - 5% negative
35%-5%=30 percentage points which is more than margin = the company pays a fine