Hacker News

This feels like the typical HN "less storage than a Nomad" comment. To me this system reads as very important and necessary for the development of AI in general.


Meh, I'll take the bait.

There are no fundamentally new ideas in this paper compared to the preceding papers. What they do is tune hyperparameters (BPTT length, exploration/exploitation tradeoff, and policy parameterization) in a smart fashion so as to fit the bottom 5% of Atari games. Obviously the parameters, or equivalently, the architecture choices, are tuned to achieve exactly that: good performance on the bottom 5% of Atari games. None of these choices will generalize outside of this specific set of Atari games.

The reasons we are doing badly at these games are well-understood. They typically require "world knowledge" (what is a key? what is a door?) and reasoning (I found a key, that can be used to open a door). That is, the visual representations need to encode such knowledge. Algorithms don't possess this world knowledge as they are not embodied in our world, so they need to learn it from scratch, i.e. brute-force it. That's exactly what this paper is doing - brute-forcing the solutions by finding just the right hyperparameters with millions of hours worth of compute.
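That "brute-forcing" of exploration without world knowledge is often implemented as an intrinsic novelty bonus. A minimal count-based sketch (illustrative only; NGU's actual novelty signal uses learned embeddings, not raw state counts):

```python
from collections import Counter
import math

# An agent with no world knowledge can still reward itself for visiting
# unfamiliar states, grinding through the state space by sheer repetition.
visit_counts: Counter = Counter()

def intrinsic_reward(state) -> float:
    """Novelty bonus 1/sqrt(n): high for new states, decaying with visits."""
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])

first = intrinsic_reward("room_with_key")   # 1.0 (never seen before)
again = intrinsic_reward("room_with_key")   # ~0.707 (seen once already)
```

Nothing here knows what a key or a door is; the bonus just pushes the agent toward whatever it hasn't seen yet.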

A good analogy is what would happen if you took the game but flipped the pixels in some deterministic way so that the screen would look like noise to a human. A key would no longer be a key, but the structure is still the same. If someone asked you to solve Montezuma's Revenge with that representation, you would not be able to. Does that make you stupid or non-human? So, because these games require human world knowledge, solving them in the same way as simpler games is kind of beside the point.
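The thought experiment is easy to make concrete: a fixed permutation of pixel values preserves the game's dynamics exactly while destroying the human-readable semantics. A hypothetical sketch (the function names are mine, not from any paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# A fixed, deterministic permutation of the 256 possible pixel values.
value_map = rng.permutation(256).astype(np.uint8)

def scramble(frame: np.ndarray) -> np.ndarray:
    """Remap every pixel value; structure is preserved, semantics are not."""
    return value_map[frame]

# A stand-in Atari-sized frame (210x160 grayscale).
frame = rng.integers(0, 256, size=(210, 160), dtype=np.uint8)
scrambled = scramble(frame)
```

Because the mapping is a bijection, equal pixels stay equal: object boundaries and motion survive, so an RL agent is no worse off, but a key no longer looks like a key to a human.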


Thanks for explaining your take, but this sounds very reductive. When it comes down to it, every problem is solved by tuning for that problem specifically.

While Never Give Up (NGU) is not fundamentally new, it is an important step in computers learning. You need to be able to generalize solutions to problems where you don't have contextual information. Imagine you were a caveman asked to operate an iPhone. You're not stupid if you don't know how, but if I tell you "never give up" and put you in a room for 5 years, I'd expect some results from a sentient being. This is an important process too.

We would be much closer to good AI if it can figure things out by itself instead of being constantly fed "clean" data.


There's a difference between research and engineering. Is this system impressive? Definitely! It's a complex engineering effort, highly tuned to solve a specific problem: beating the Atari benchmark.

Does it teach us anything fundamentally new? No, it still has horrible sample complexity and does not generalize to anything outside of Atari unless you completely re-tune it. And I don't mean re-train; I mean changing the architecture and assumptions. That's different from projects such as AlphaZero or MuZero.

IMO this would have been more appropriately published as an open-source system that others could apply and tune to new problems, rather than as a research paper. As research, nobody outside of DeepMind can ever reproduce this.

You are completely changing the topic with this:

> While Never Give Up (NGU) is not fundamentally new, it is an important step in computers learning. You need to be able to generalize solutions to problems where you don't have contextual information

We're not even talking about NGU, we're talking about the paper linked in this post. This specific paper proposes little new in that regard. It just engineers a system to do this specifically for Atari games by taking a previous paper and changing some parameters. Neat, but it's not some kind of breakthrough.


> Does it teach us anything fundamentally new? No, it still has horrible sample complexity and does not generalize to anything outside of Atari unless you completely re-tune it.

I thought it was interesting that they let the agent learn the exploration/exploitation tradeoff itself, and that they combined memory with intrinsic motivation.
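The "learn the tradeoff" part can be sketched as a bandit over exploration rates: a meta-controller treats each candidate exploration weight as an arm and learns which one pays off. This is a toy illustration under my own assumptions (Agent57's actual controller is a sliding-window bandit over many more parameter pairs):

```python
import math
import random

# Candidate exploration-bonus weights: each is one "arm" of a UCB bandit.
betas = [0.0, 0.1, 0.3]
counts = [0] * len(betas)
totals = [0.0] * len(betas)

def pick_arm(t: int) -> int:
    """Standard UCB1: try each arm once, then mean reward + confidence bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(betas)),
               key=lambda i: totals[i] / counts[i]
                             + math.sqrt(2 * math.log(t) / counts[i]))

def episode_return(beta: float) -> float:
    # Stand-in for running a real episode; here, moderate exploration
    # (beta near 0.1) is assumed to pay off best, plus some noise.
    return 1.0 - abs(beta - 0.1) + random.gauss(0, 0.05)

random.seed(0)
for t in range(1, 201):
    arm = pick_arm(t)
    counts[arm] += 1
    totals[arm] += episode_return(betas[arm])

best = max(range(len(betas)), key=lambda i: totals[i] / counts[i])
```

The point of the design is that the exploration rate is no longer a hand-set hyperparameter: the bandit shifts pulls toward whichever rate is actually producing returns.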

Another contribution of this paper is showing that all these tricks can build on each other and are thus complementary.

Human brains are also a bag of tricks, fine-tuned to the goal and requirements of making more humans.


I think you're both right but are pointing at different ways to approach the intelligence problem. This is kind of the Connectionist vs Symbolic debate.

The fundamental question is, is a representational (contextual) bootstrap required in the long run for a contained computational system to perform at human level across a large number of domains? This isn't a solved problem.

So yes, AI would be better if it could "figure things out by itself." However, humans don't figure things out by themselves: they come pre-wired with a lot out of the box, and they get a lot of help cleaning the data (parents, teachers, literal labels, etc.).


Wouldn't the 'pre-wiring' in this instance be the code and hardware that the algorithm runs on?


> Algorithms don't possess this world knowledge as they are not embodied in our world, so they need to learn it from scratch, i.e. brute force it.

You mean, almost like a human baby?


It's not, really. They make a very nice and clear summary of the current state of RL, then introduce an incremental improvement by combining existing approaches and throwing a lot of compute at the problem.

Keep in mind DeepMind has a lot of money for PR - but nice prose and diagrams shouldn't affect your judgement of whether something is important or not!


Does anyone still have an iPod?



