> For example, imagine a television displaying nothing but static on its screen. Such a thing would quickly engage the curiosity of a purely novelty-seeking agent, because a square of randomly flickering visual noise is, by definition, totally unpredictable from one moment to the next. Since every pattern of static appears entirely novel to the agent, its intrinsic reward function will ensure that it can never cease paying attention to this single, useless feature of the environment — and it becomes trapped.
> To discourage the controller from focusing on truly unpredictable, random inputs (such as uninteresting details of white noise), later approaches model the expected progress of the predictor: parts of the world where the predictor fails to learn (no data compression progress!) become less interesting than those where its predictions improve.
This seems like a better and more general solution than the one presented in the article, since the agent could otherwise get stuck on noise over which it does have causal influence. The agent could even use its actuators to generate noise that is unpredictable and chaotic, but not interesting.
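To make the difference concrete, here is a toy sketch (not the article's method; the predictor, the two observation sources, and all numbers are invented for illustration). An agent rewarded by raw prediction error keeps preferring "TV static", while one rewarded by prediction progress quickly loses interest in it:

```python
import numpy as np

rng = np.random.default_rng(0)

class MeanPredictor:
    """Toy predictor: incrementally estimates the mean of its input stream."""
    def __init__(self):
        self.mean, self.n = 0.0, 0
    def error(self, x):
        return abs(x - self.mean)
    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n

def curiosity_rewards(source, steps=500, window=50):
    """Compare two intrinsic rewards on the same observation stream:
    'novelty'  = total prediction error (the noisy-TV trap),
    'progress' = total drop in average error between successive windows."""
    pred, errs = MeanPredictor(), []
    for _ in range(steps):
        x = source()
        errs.append(pred.error(x))
        pred.update(x)
    per_window = np.array(errs).reshape(-1, window).mean(axis=1)
    novelty = per_window.sum()
    progress = np.clip(per_window[:-1] - per_window[1:], 0, None).sum()
    return novelty, progress

static    = lambda: rng.uniform(-1, 1)         # "TV static": never compressible
learnable = lambda: 0.8 + rng.normal(0, 0.05)  # regular signal the predictor can model

for name, src in [("static", static), ("learnable", learnable)]:
    print(name, curiosity_rewards(src))
# A pure novelty maximizer prefers the static; a progress maximizer prefers the learnable signal.
```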
There is definitely other research in this field that deserves a mention.
Most prominently, Karl Friston's work on free energy. It is a properly Bayesian formulation of optimizing for surprise, in this case minimizing it.
The challenge is randomness, whether in the agent's actions or in the environment.
In a deterministic environment it might make sense to maximize surprise; in a stochastic environment it makes sense to minimize it. You don't want a robot that navigates to the parts of state space that are totally chaotic.
Surprisingly, by minimizing surprise an agent can develop knowledge of the world in the long run, as long as that world is dynamic enough: there are no dark rooms to retreat to.
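As a crude sketch of that claim (a toy two-region world, Gaussian beliefs, and all constants invented for illustration): an agent that steers toward whichever region has given it the lowest average surprise, measured as negative log-likelihood under its current belief, ends up living in the regular but still dynamic region rather than the chaotic one, and because that region keeps changing there is no dark room for it to retreat to.

```python
import numpy as np

rng = np.random.default_rng(1)

def observe(region, t):
    """Two regions: 'chaotic' is high-variance noise; 'regular' follows a
    slow, learnable dynamic plus a little noise (so it is never a dark room)."""
    if region == "chaotic":
        return rng.normal(0.0, 2.0)
    return np.sin(0.05 * t) + rng.normal(0.0, 0.1)

def nll(x, mu, var):
    """Surprise as negative log-likelihood under a Gaussian belief."""
    return 0.5 * np.log(2 * np.pi * var) + (x - mu) ** 2 / (2 * var)

belief = {r: {"mu": 0.0, "var": 1.0} for r in ("chaotic", "regular")}
avg_surprise = {"chaotic": 0.0, "regular": 0.0}
visits = {"chaotic": 0, "regular": 0}

for t in range(5000):
    # epsilon-greedy: mostly go where average surprise has been lowest
    if rng.random() < 0.1:
        region = str(rng.choice(["chaotic", "regular"]))
    else:
        region = min(avg_surprise, key=avg_surprise.get)
    x = observe(region, t)
    b = belief[region]
    s = nll(x, b["mu"], b["var"])
    visits[region] += 1
    avg_surprise[region] += (s - avg_surprise[region]) / visits[region]
    # crude online update of the Gaussian belief for the visited region
    b["mu"] += 0.05 * (x - b["mu"])
    b["var"] = max(b["var"] + 0.05 * ((x - b["mu"]) ** 2 - b["var"]), 1e-3)

print(visits)  # the surprise minimizer spends most of its time in 'regular'
```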
Ralf Der argues for something similar from a dynamical-systems perspective: https://www.informatik.uni-leipzig.de/~der/. The idea is minimization towards (a set of) equilibrium states.
It seems to me that if they can crack that, it's basically game over. They'll just implement it in a robot (either real or simulated) with cameras and arms, throw toys at it, and let it learn by itself the way babies do.
We humans learn how to walk and keep our balance by falling. It might even be fun to fall constantly; however, there is pain involved, so we learn to do whatever it takes NOT to fall.
I think the first person to design a simple two-legged machine, lock it in a room for a year, and let it learn whatever it wants with the simple objective "stay tall, don't fall" will be able to walk Boston Dynamics machines on a leash to the local park.
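Just to make that objective concrete, here is a minimal sketch of what a "stay tall, don't fall" shaping reward could look like (the state variables, target height, and penalty are made-up illustrative values, not from any real controller):

```python
def stay_tall_reward(torso_height, torso_pitch, has_fallen,
                     target_height=1.2, fall_penalty=10.0):
    """Reward standing near a nominal height with an upright torso;
    falling ends the attempt with a penalty. All constants are illustrative."""
    if has_fallen:
        return -fall_penalty
    alive_bonus = 1.0                                  # reward for simply still standing
    height_term = -abs(torso_height - target_height)   # "stay tall"
    upright_term = -abs(torso_pitch)                   # keep the torso vertical
    return alive_bonus + height_term + upright_term

# e.g. an upright robot: stay_tall_reward(1.18, 0.05, False) == 0.93
```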
> This abstraction incorporates only features of the environment that have the potential to affect the agent (or that the agent can influence)
I like the parallel between this and various courses I've taken on working efficiently and not getting too stressed by focusing only on what you can influence.
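A toy sketch of the quoted idea, with three invented features and a made-up threshold: probe the world with different actions and keep only the features whose statistics actually respond to what the agent does.

```python
import numpy as np

rng = np.random.default_rng(2)

def step(action):
    """Toy observation with three features: f0 responds to the agent's action,
    f1 is fixed environment state, f2 is pure noise."""
    return np.array([action + rng.normal(0, 0.1),
                     0.3 + rng.normal(0, 0.1),
                     rng.normal(0, 1.0)])

def controllable_features(n_probes=200, threshold=0.5):
    """Keep only features whose mean shifts noticeably when the action changes;
    a crude proxy for 'features the agent can influence'."""
    obs_plus = np.array([step(+1.0) for _ in range(n_probes)])
    obs_minus = np.array([step(-1.0) for _ in range(n_probes)])
    shift = np.abs(obs_plus.mean(axis=0) - obs_minus.mean(axis=0))
    return np.where(shift > threshold)[0]

print(controllable_features())  # -> [0]: only the action-driven feature survives
```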
Reminds me of http://people.idsia.ch/~juergen/interest.html