
How could GPT-3.5 possibly have been a finetune of the 175B model? It doesn't even use the same tokens.


Finetuning might not be the best word; sometimes it's a grey area.

Token embeddings can be trained without changing the other parameters, and a number of models add new tokens as a finetuning step. A recent example is StarCoder adding ChatML-equivalent tokens: https://huggingface.co/blog/starchat-alpha#a-standard-format...
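
For what it's worth, with the Hugging Face transformers API that roughly looks like the sketch below; the model name and the exact chat-control tokens are just illustrative (borrowed from the StarChat recipe), and freezing everything but the embeddings is optional:

    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Load a base model and its tokenizer (bigcode/starcoderbase is only an example;
    # any causal LM works the same way).
    tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoderbase")
    model = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase")

    # Add new chat-control tokens, then grow the embedding matrix to match.
    new_tokens = ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]
    tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
    model.resize_token_embeddings(len(tokenizer))

    # Optionally freeze everything except the (now larger) embedding layers,
    # so only the token embeddings receive gradients during the finetune.
    for param in model.parameters():
        param.requires_grad = False
    for param in model.get_input_embeddings().parameters():
        param.requires_grad = True
    if model.get_output_embeddings() is not None:
        for param in model.get_output_embeddings().parameters():
            param.requires_grad = True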


Sure, you can add a few tokens, but in this case they changed almost every token.
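
You can see how little the vocabularies overlap with tiktoken, assuming the comparison is r50k_base (the encoding of the original GPT-3 175B models) versus cl100k_base (the encoding of gpt-3.5-turbo):

    import tiktoken

    # The two encodings have different vocabulary sizes and assign different
    # ids to the same text, so the token embeddings can't map one-to-one.
    old = tiktoken.get_encoding("r50k_base")
    new = tiktoken.get_encoding("cl100k_base")

    print(old.n_vocab, new.n_vocab)   # vocabulary sizes differ by roughly 2x
    text = "Token embeddings can be retrained."
    print(old.encode(text))           # different ids for the same text
    print(new.encode(text))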



