If I understand correctly, the older method you describe has been replaced by fine-tuning: a pre-trained GPT model is given further training on examples of successful conversations. I think this was introduced with the InstructGPT paper, which combines supervised fine-tuning on human-written demonstrations with reinforcement learning from human feedback: https://arxiv.org/pdf/2203.02155.pdf