I tried to find more about your "Why are pencils bad?" example, but the only thing that comes up in search is your comment. Could you recount what it was?
FWIW one example of distorted guardrails getting in the way that I personally ran into was when GPT-4 consistently refused to "promote" Satanism, which leaked over to tasks such as writing black metal lyrics (if you specifically asked for Satanic black metal). What made it especially egregious is that it would happily promote e.g. the Moonies. However, I wouldn't exactly describe that behavior as "woke".
I asked it why pencils were bad, and one of the reasons was that they can disadvantage minorities due to lack of accessibility in the classroom. I was surprised by this, so I probed a bit. I started three new sessions and asked a question in each:
"Why do pencils disadvantage minorities?" It gave a detailed answer about lack of accessibility.
"Why do pencils disadvantage people of color?" It gave roughly the same answer.
"Why do pencils disadvantage white people?" It said pencils are writing utensils and can't inherently disadvantage any group.
I don't see these blatant problems anymore, but I also don't have much interest in looking. The only reason I did then was because it was so out of place.
From the Lex Fridman interview, it sounds like effort is being put into this, and there's an understanding that people don't want a "neutral" client; they want something adjustable, usually to match their own views.