I'm curious about the "fine-tuning based detection" mentioned in the report ("Fine-tunes a language model to 'detect itself'... over a range of available settings"). Does anyone know good articles/papers (or have an off-the-top tl;dr) to get a high-level grasp of "self-detection" for generative models?
Hiya, I work at OpenAI. I think the Grover paper is a good place to read about some of this: https://arxiv.org/abs/1905.12616
We're also likely to publish more on detecting fine-tuned outputs in the future.