
As someone who has implemented quite a few reinforcement learning techniques and seen their limitations, I would be surprised if RL could overcome handcrafting for SC anytime soon.


The main way the AI bots run into problems is with timing. The neural networks used have no way of encoding time-dependent actions in a reasonable way. (As opposed to, say, a fuzzy decision tree with explicit time input.) And if you try to explicitly include it, the curse of dimensionality strikes back hard.

Both absolute and relative timing have to be handled. And relative, since specific salient actions...

Plus the real reward is very sparse. Say, crippling mineral production early may or may not snowball. Likewise being a unit or two up...


What that tells me is that they haven't yet come up with the right featurization - that is, the function that maps input data into the actual neural network node values. The appropriate featurization would include the time information but reduce its dimensionality by hard-coding some basic assumptions, of the kind that humans presumably make when processing the same data.
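To make the idea concrete, here is a minimal sketch of what such a featurization might look like. Everything here is hypothetical (the phase boundaries, event names, and scaling constants are assumptions, not anything from the article): game time is bucketed into a coarse phase one-hot, and relative timings are squashed to a bounded range, rather than feeding the raw frame counter into the network.

```python
import numpy as np

# Hypothetical sketch: encode game time as a coarse phase one-hot plus a few
# normalized "time since last salient event" features, instead of the raw
# frame counter. Bucketing hard-codes the human-like assumption that
# "early/mid/late game" matters more than the exact frame number.

PHASES = [0, 300, 720, 1e9]  # phase boundaries in seconds (assumed values)

def time_features(game_time_s, last_scout_s, last_attack_s):
    # One-hot for the coarse game phase.
    phase = np.zeros(len(PHASES) - 1)
    for i in range(len(PHASES) - 1):
        if PHASES[i] <= game_time_s < PHASES[i + 1]:
            phase[i] = 1.0
            break
    # Relative timings, squashed with tanh so the scale stays bounded.
    rel = np.array([
        np.tanh((game_time_s - last_scout_s) / 120.0),
        np.tanh((game_time_s - last_attack_s) / 120.0),
    ])
    return np.concatenate([phase, rel])

feats = time_features(450.0, 400.0, 430.0)
print(feats)  # 5 values: 3 phase indicators + 2 relative timings
```

The point is that a handful of features replaces an otherwise huge time-conditioned state space, at the cost of hard-coding which events and phase boundaries matter.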


I think these guys (and most people using deep models) try to avoid hand-crafted features as much as possible.


Gabriel (as well as the others on the team) has definitely looked at these areas - if things were left out/not "featurized", it was likely done via an ablation test, or showed improvement over benchmarks, or maybe just to set a baseline, as he is quoted in the main article. I don't know what techniques they used here, but I am excited to find out!

On the specific issue of encoding time-dependent behaviors in models, I think it is related to a broader issue that shows up in many application areas. To me the critical factor is that these models are ruthlessly good at exploiting local dependencies while totally forgetting long-term global dependencies or failing to respect required structure in control/generation.

This basically means it is very difficult to train long-term, time dependent behavior without tricks (early/mid/late game models, extensive handcrafting of the inputs, or using high level "macro actions"). Indeed, FAIR's recent mini-RTS engine ELF directly gives macro actions, in part to look closer at how well global strategies are really handled and remove one factor of complexity [0].

Gabriel's PhD thesis was entirely on Bayesian models for RTS AI, applied to SC:BW [1], so I am sure he is well aware of the "classic/rules based" approaches for this.

[0] https://code.facebook.com/posts/132985767285406/introducing-...

[1] http://emotion.inrialpes.fr/people/synnaeve/phdthesis/phdthe...


AlphaGo used several hand-crafted features as of the Nature paper, so DeepMind at least is not above a little feature engineering.


I suspect you might be able to do surprisingly well with just a few simple features, e.g. what did I last see at each position and how long ago was that, how many of each enemy unit have I seen simultaneously and at what time, etc.
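A rough sketch of the "what did I last see at each position and how long ago" feature, purely as an illustration (the grid resolution, the dict-based observation format, and the class name are all my own assumptions): two small grids, one holding the last observed unit type per tile and one holding the age of that observation.

```python
import numpy as np

# Hypothetical sketch of a last-seen memory under fog of war: one grid for
# the last observed unit type per tile, one for how many frames ago that
# observation was made. Tiles never seen stay at type 0 / age infinity.

GRID = 16  # coarse map resolution (assumed)

class LastSeenMemory:
    def __init__(self):
        self.last_type = np.zeros((GRID, GRID), dtype=np.int32)  # 0 = never seen
        self.age = np.full((GRID, GRID), np.inf)                 # frames since seen

    def update(self, visible):
        """visible: dict {(x, y): unit_type_id} for tiles in vision this frame."""
        self.age += 1.0
        for (x, y), unit_type in visible.items():
            self.last_type[y, x] = unit_type
            self.age[y, x] = 0.0

mem = LastSeenMemory()
mem.update({(3, 4): 7})   # saw unit type 7 at tile (3, 4)
mem.update({})            # a frame with nothing in vision
print(mem.last_type[4, 3], mem.age[4, 3])  # 7 1.0
```

Both grids can be fed to the network directly, giving it a compact stand-in for the full observation history.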

As to the sparsity of reward, I'm not sure this is such a big problem. Once the AI learns that e.g. 'resources are good', it can then learn how to optimize resource production. You could even give the process a head start by learning a function of time+various resources+assorted features to win rate from human games to use as the reward function.
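One standard way to realize that head start is potential-based reward shaping: reward the *change* in a learned win-probability estimate between consecutive states, which densifies the signal without changing the optimal policy. A toy sketch, where `win_prob_model` is a stand-in for a function that would actually be fit on human games (the state keys and constants are invented for illustration):

```python
# Hypothetical sketch of reward shaping for a sparse win/loss signal: the
# per-step reward is the change in an estimated win probability V(state)
# (potential-based shaping), with the true +1/-1 outcome used at game end.

def shaped_reward(win_prob_model, prev_state, state, terminal_reward=None):
    if terminal_reward is not None:        # real game outcome: +1 / -1
        return terminal_reward
    return win_prob_model(state) - win_prob_model(prev_state)

# Toy stand-in model: win chance grows with your resource lead.
def win_prob_model(state):
    lead = state["minerals"] - state["enemy_minerals_seen"]
    return 0.5 + 0.5 * max(-1.0, min(1.0, lead / 2000.0))

r = shaped_reward(win_prob_model,
                  {"minerals": 500, "enemy_minerals_seen": 500},
                  {"minerals": 900, "enemy_minerals_seen": 500})
print(round(r, 3))  # 0.1 - gaining a 400-mineral lead earns a small reward
```

Once "resources are good" is baked into the shaping function this way, the agent still has to learn *when* expanding is safe, which is the hard part the replies below get at.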


"Resources are good" doesn't really mean anything.

Yes resources are good, but how do you know when to expand?

Judging from your opponent's movements, you can tell whether they're turtling, going for some cheese strat, or doing some build where they may not be able to respond to an aggressive expansion.

Of course, if you choose wrong, you lose the game.


Why do you say that? The Dota 2 bot OpenAI made earlier this year seemed pretty convincing and similar...


Starcraft is a much larger, more complex, more freeform game than Dota 2. It's like Go compared to chess.


I disagree with this (I used to play Warcraft 3, and currently play Dota 2), but that's beside the point. The Dota 2 OpenAI is only set up for one mirror matchup (impossible in real games) involving one hero on each side, in one lane, and only for the first 10-ish minutes. This is maybe 1% of a real Dota game.


I think you are both correct. Starcraft has a far larger space of verbs at any given moment, and many of them can impact each other, giving it one form of complexity, while Dota 2 clearly has a much more complex set of units and abilities, leading to more possibilities total, even if the number of possible actions moment to moment is more limited. But yeah, the bot was a teeny little bit of the game, impressive as it was.


If Dota is anything like League, I'm not sure I agree completely. I think in League there's more 'future prediction' needed, i.e. the current state is less immediate than in StarCraft. In StarCraft you can quickly see who is winning, but in League there are things such as pushing lanes to consider and knock-on effects from later back timings (I know Dota doesn't have backing, but it has couriers?).

While all of that can be extrapolated from the current state, I think in StarCraft it is much easier to go for immediate gains by destroying more supply/resource value in units and extrapolate from there.


Starcraft strategy has a lot more weird nuance to it.

I noticed this building in this position at this time and I haven't been attacked by X unit yet, so he's probably doing strategy Y. I better skip some unrelated building I was going to make, so I can have an extra unit Z in case he's doing that strategy. Then I'll place the units at a particular spot to try to trap him because that unit will be vulnerable in this other spot so he's unlikely to move through that spot.


A SC:BW bot will have the ability to be perfectly aware of every unit in vision at all times.

It wouldn't be a surprise if some research team could put out a bot achieving superhuman victories purely by out-microing an opponent with minimal strategic choices.


>It wouldn't be a surprise if some research team could put out a bot achieving superhuman victories purely by out-microing an opponent with minimal strategic choices.

Yeah they did pretty much that. But the problem is it's a very brute-force approach and violates some rules of the game.

They jam thousands of commands per second into the game, and give each unit its own rudimentary AI. The units basically just dance at maximum range, magically dodge hits, etc.

If they limit it to 600 actions per minute (10 keystrokes hitting the keyboard every second - still within reach of the human mind but beyond human fingers), it becomes a much harder AI problem.
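One simple way such a cap could be enforced is a token bucket: the bot earns action tokens at the capped rate and any command issued with an empty bucket is dropped, forcing it to prioritize like a human would. A small sketch (the class name, burst size, and interface are my own assumptions, not anything from an actual bot):

```python
# Hypothetical sketch of a 600 APM cap as a token bucket. The bot earns
# 10 tokens per second, can bank at most `burst` of them, and any action
# attempted with less than one token available is simply dropped.

class ApmLimiter:
    def __init__(self, apm=600, burst=10):
        self.rate = apm / 60.0   # tokens earned per second
        self.burst = burst       # max tokens banked
        self.tokens = float(burst)
        self.last_t = 0.0

    def try_act(self, t):
        """Return True if an action at game time t (seconds) is allowed."""
        self.tokens = min(self.burst, self.tokens + (t - self.last_t) * self.rate)
        self.last_t = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

lim = ApmLimiter()
# Try to act every millisecond for one second (a micro-bot's firehose):
allowed = sum(lim.try_act(i / 1000.0) for i in range(1000))
print(allowed)  # 19: the 10 banked tokens plus ~10 earned during the second
```

The interesting research question then becomes how the bot *spends* its limited tokens, which is exactly the prioritization problem humans solve constantly.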


Yeah, for people unfamiliar with StarCraft: Brood War: whereas in other strategy games you may be able to improve the effectiveness of a unit 2-3x by micromanaging your units perfectly, in BW microing certain units perfectly can improve their effectiveness by something like 100x.

In the case of certain unit matchups, say, zergling versus vulture, the vulture should be able to kill an infinite number of zerglings given that it is microed correctly. However, despite the zergling being useless against a vulture on paper, in a human game you just don't have enough time to babysit your vultures with everything else going on, so you end up seeing zerglings used against vultures somewhat cost-effectively even at professional levels.


>The units basically just dance at maximum range, magically dodge hits, etc.

While it certainly isn't fair to play against, it does have a certain elegance[1].

There's also the problem that even if it's AI vs AI, the races and units are balanced around reaction times of humans.

[1] https://www.youtube.com/watch?v=IKVFZ28ybQs


> It wouldn't be a surprise if some research team could put out a bot achieving superhuman victories purely by out-microing an opponent with minimal strategic choices.

The chess equivalent would be letting Deep Blue take 10 years to evaluate each move; it's not a very interesting system anymore since it isn't playing under normal rules (~90 minutes per turn).

Any "real" SC AI will have limitations on input, say 300 actions per minute. It'd be pretty interesting to see how few actions per minute an AI could use to defeat the top human players.


>The chess equivalent would be letting Deep Blue take 10 years to evaluate each move;

Even worse and less interesting - it's a bit like allowing the computer to move two pawns in each turn.


And then the mind games begin.


OpenAI was playing 1v1


A lot of complexity in Dota comes from the interactions between 10 players. Make it 10 AIs having to communicate using chat. Make them pick and ban. Make all items available. And then you'll have real complexity.

With hundreds of hero x item combinations, jungling, Roshan, cooldowns and pick + ban, you can actually get to the SC level of complexity.


It's 10 players vs ~200 units.

So SC is still a much more complex space. Dota has non-player creeps, but they are similar to SC buildings and follow very simple rules.


You can't compare the complexity of a Dota hero with an SC unit. SC units are very basic: they don't have XP, don't have a skill tree, don't have hundreds of possible items, and their skills don't vary as much with context.


This matters far less than you might think. Go has simpler pieces without movement, and it's still much more complex than chess.


This is more of an artifact of the size of the game board than anything, I think. 9x9 go is decidedly simpler/easier than chess, and I expect chess on a 19x19 board with the number of pieces scaled proportionally (each player starting with, say, around 90 pieces) would be a lot more difficult to play/analyze than a standard go game.


Well sure, but that's the exact same issue as you see in SC2 vs DoTA. In SC you are simply dealing with a vastly more complex state.


Not in Dota.

In Dota, combinations open the road to brand-new moves. Some items teleport, some regenerate, some cancel buffs, some crit, some cleave, some cut trees, some slow, some stun, some give vision, etc.

Now in SC, you have 3-4 main builds for a given matchup. You see the buildings, and you know where this is going.

In Dota, depending on the 10 heroes, current money and item combinations, and player skill, you may expect one build or another.

Also, one zergling or 10 zerglings are pretty much the same to consider from the behavior point of view. The number doesn't matter that much, only the intensity of the effect. And a zergling will always do the same things: move, attack, burrow.

The same hero in Dota can have a completely different role depending on the context.

My guess is that an AI would have a much bigger advantage in SC because it can sustain more APM than a human, strategy or not, while in Dota at a high level strategy matters more in the long run.


SC2 has hundreds of viable strategies per race as openers, but like chess openers, they are just that. As the game evolves you get complex iterations and risk/reward situations. Bluffing research is very much a part of high-level play, as min-maxing requires you to force the other player to waste resources in any way possible.

For example, larva are one of Zerg's most valuable resources, and there are several ways of attacking that resource by killing units or simply forcing them to go more defensive.


Didn't they restrict the game to a single hero and a subset of all items to make it tractable for AI? And didn't people find a way to break that bot?


It was restricted to one very artificial game mode (1 vs 1 mid, Shadow Fiend mirror matchup) that is not representative of real games, but is good for practicing one aspect of Dota. Some items in this game mode are banned in order to make it less safe for both sides, and so that there is a better chance of one side dying before ten minutes. People have beaten the bot by both "breaking" it and by just outplaying it (but the latter very, very rarely).


> And didn't people find a way to break that bot?

According to the player interviews and Reddit discussion threads, the "break" you are talking about was more like being really unpredictable, thereby finding a play style that the AI had never encountered.

The players were flailing to find a way to defend against an AI that was learning quicker over time than they were.



