
These are autoregressive models. When you have a new type of sequence in which future elements can be predicted from earlier parts, but in a way the model has not seen before, fine-tuning makes sense.

Admittedly, that's a pretty vague criterion for deciding what to do in a given data scenario, but it may be good enough as a rough heuristic. Whether knowledge addition falls under it is arguably a matter of taste (absent experiments).



Exactly this. If you have a model that's never seen JSON and you want JSON to come out, fine-tuning is probably not a bad idea. If you have a model trained on English documents and you want it to produce English documents related to your company, you don't need to fine-tune.
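
A minimal sketch of the first case, teaching a new output format by fine-tuning a small causal LM on JSON examples. This uses the Hugging Face transformers and datasets libraries; the model name, training pairs, and hyperparameters are illustrative assumptions, not anything from the thread:

```python
# Sketch: fine-tune a causal LM so it learns to emit JSON.
# "gpt2" and the two training pairs below are placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling,
)
from datasets import Dataset

model_name = "gpt2"  # any causal (autoregressive) checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical training pairs: a prompt followed by the JSON we want back.
examples = [
    'Describe the user.\n{"name": "Ada", "role": "engineer"}',
    'Describe the user.\n{"name": "Lin", "role": "designer"}',
]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = Dataset.from_dict({"text": examples}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="json-finetune", num_train_epochs=3),
    train_dataset=dataset,
    # mlm=False keeps the standard next-token objective, i.e. the
    # autoregressive setup the parent comment is talking about.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

For the second case (English documents about your company), retrieval or a few examples in the prompt would typically be tried before any of this.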



