I'm not sufficiently familiar with the details of ML to assess the proposition made in the article.
From my understanding, RL is a tuning approach for LLMs, so the outcome is still the same kind of beast, albeit with a different parameter set.
Empirically, I'd have thought that the leading companies would already be strongly focused on improving coding capabilities, since this is where LLMs are most effective and where they have huge cash flows from token consumption.
So either the motivation isn't there, or they're already doing something like that, or they know it's not as effective as the approaches they already have. I wonder which one it is.
Actually, I didn't. Correct me if I am wrong, but my understanding is that RL is still an LLM tuning approach, i.e. an optimization of its parameter set, whether it's done at scale or via human feedback (RLHF).
RL is a lot more general than that: it is basically a framework in which an agent learns to make good decisions from experience, by acting in an environment so as to maximize cumulative reward. So you can do all kinds of things with it other than finetuning LLMs, like controlling a robotic arm or mastering video games. AlphaGo, for example, was also trained with RL.
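To make that concrete, here's a minimal sketch of tabular Q-learning on a made-up toy problem (the corridor environment, the constants, and all names here are purely illustrative, not from the article). The point is that the RL loop is just state, action, reward, update, with no language model anywhere:

```python
import random

# Hypothetical toy environment: a corridor of 6 cells. The agent starts
# in cell 0 and receives reward 1 only upon reaching cell 5.
N_STATES = 6
ACTIONS = (-1, +1)                 # step left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table: estimated discounted return for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move, clip to the corridor, reward at the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

def greedy(state):
    """Pick the best-valued action, breaking ties randomly."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(500):               # episodes
    state, done = 0, False
    for _ in range(100):           # step cap so an episode always terminates
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Core Q-learning update: nudge Q toward reward + discounted future value.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if done:
            break

# The learned greedy policy should move right (+1) from every non-goal cell.
print([greedy(s) for s in range(N_STATES - 1)])
```

RLHF applies the same loop to an LLM (the policy is the model, the reward comes from a learned preference model), which is why it ends up as a parameter update in that special case, but the framework itself isn't tied to LLMs at all.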