
The idea of reinforcement learning is that for some tasks it's hard to give an explicit plan for how to do them. Many games are like this. Recently, DeepSeek showed it works for certain reasoning problems too, like LeetCode-style coding problems.

Instead, RL just rewards the model when it accomplishes some measurable goal (like winning the game). This works for certain types of problems, but it's pretty sample-inefficient because the model wastes a lot of attempts on things that don't work.
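A toy sketch of that reward-only loop in Python (the hidden target, the bitwise policy, and the update rule are made up for illustration; DeepSeek's actual recipe, GRPO over an LLM policy, is much more involved):

  import random

  # The "measurable goal" is matching a hidden 4-bit target. The only feedback
  # is reward 1 when the whole guess matches, 0 otherwise -- no hint about
  # which bits were right. This is the sparse-reward setting described above.
  TARGET = [1, 0, 1, 1]          # hypothetical goal
  probs = [0.5] * len(TARGET)    # policy: P(bit i == 1)
  lr = 0.1

  for step in range(5000):
      guess = [1 if random.random() < p else 0 for p in probs]
      reward = 1.0 if guess == TARGET else 0.0

      # REINFORCE-style update: nudge each bit's probability toward the action
      # taken, scaled by the reward. Zero-reward attempts teach nothing, which
      # is why so many samples are "wasted".
      for i, bit in enumerate(guess):
          grad = bit - probs[i]   # proportional to d/dp of log-likelihood
          probs[i] += lr * reward * grad
          probs[i] = min(max(probs[i], 0.01), 0.99)

  print([round(p, 2) for p in probs])   # drifts toward [~1, ~0, ~1, ~1]

Note how most of the 5000 rollouts return zero reward and contribute nothing; the model only learns from the rare attempts that happen to hit the goal.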


