I'm in this research area; it's fascinating, but also (like many things in AI) easy to get wrong.
The trickiest bit is deciding exactly what you want your AI to optimise for -- if you want a game where "anyone can win", you end up with a coin toss. If you want a game where "skill is everything", you can accidentally end up with "who can click fastest".
On a project I recently worked on (and I'm writing up), we ended up on our 6th or 7th "optimisation function" before we found something where the "AI optimal" seemed actually enjoyable to play.
Of course, if you already have a fairly fixed game, it can be easier to optimise constants, as you already know your target -- but people have been doing that for a long time already.
Personally, I'd like to see AI used for "simple" things, not necessarily tied to the core gameplay loop itself. I.e. conceptually, if you could tie the right gameplay components to an RNG function, the user could get new/unexpected experiences for longer. The problem is RNG often makes crap outputs.
I'd be curious to see ML take on this problem. I imagine the examples are limitless, but one that comes to mind is procedural worlds. Rather than tuning procedural worlds around seed values, what would it look like if we tried to get an ML model to craft the worlds? Procedurally generated worlds often have that tell-tale feel: things don't quite make sense, etc. I imagine ML could make far more diverse patterns that are both interesting and fun.
I'd like to see this pattern of "informed RNG" in a lot of things. Skyrim with NPC patterns to feel a bit more authentic. Etc.
I'm less interested in the classic AI dreams inside games, as it seems far out and less gamey. But our current games with informed RNG? That sounds.. neat.
I think you are right. There are SO many small mini-games inside nearly all RTS games which AI has yet to actually conquer. I'll give another simple example: given a random map (and resources), identify chokepoints. As a human this is a huge clue about how to approach a game strategy (policy), but I have yet to find an AI that does this (or optimises that part). From what I gather, most time is spent on optimising random build orders and modifying them along the way.
+1, among my favorite aspects of certain RTS games.
My favorite case is Empire Earth, where there's randomness in resource clustering as well as natural terrain. Depending on your strategy/units (and your opponents'), it's increasingly grey whether a locale is even a choke point.
The homogeneity of, for example, Starcraft maps and limited unit options always made me feel like we were playing on an excel spreadsheet.
This sounds very similar to the game I play, 0ad (https://play0ad.com/). You have to find an enjoyable way to play RTS games, and in this game I pick myself vs. 7 AIs with a random map; you never get the same allies, and how you develop a strategy around the market (i.e. chokepoints) is a good 20% of the game. I just wish the AI decisions on diplomacy weren't so random, but it sounds like that mini game will soon be solved: https://spectrum.ieee.org/tech-talk/robotics/artificial-inte...
ML is a system that takes a large data set, and an error function, and finds a generated output that minimizes the loss. What data set and error function are you proposing for "RNG"?
That's an overly narrow simplification of ML. Take RL, for instance.
And the parent already explained a concept: generating maps. It could still have an RNG as the base (noise function over something), but then use ML to place elements based on existing human-made maps.
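Something like this rough sketch of the idea (the placement_model here is hypothetical -- any sklearn-style classifier trained on patches from human-made maps would do; everything else is a placeholder):

    import numpy as np

    def base_terrain(size=64, seed=0):
        # RNG base layer: a coarse random grid upsampled into smooth-ish terrain
        # (a stand-in for Perlin/simplex noise).
        rng = np.random.default_rng(seed)
        coarse = rng.random((size // 8, size // 8))
        return np.kron(coarse, np.ones((8, 8)))  # size x size heightmap

    def patch(terrain, i, j, r=2):
        # Local terrain window around a tile, padded at the edges.
        padded = np.pad(terrain, r, mode="edge")
        return padded[i:i + 2 * r + 1, j:j + 2 * r + 1].ravel()

    def place_elements(terrain, placement_model, threshold=0.5):
        # ML layer: placement_model (hypothetical, sklearn-like) scores
        # "would a human designer put a resource / start location here?"
        # from the local terrain patch, having been trained on real maps.
        spots = []
        for i in range(terrain.shape[0]):
            for j in range(terrain.shape[1]):
                if placement_model.predict_proba([patch(terrain, i, j)])[0][1] > threshold:
                    spots.append((i, j))
        return spots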
>> The trickiest bit is deciding exactly what you want your AI to optimise for -- if you want a game where "anyone can win", you end up with a coin toss. If you want a game where "skill is everything", you can accidentally end up with "who can click fastest".
In Magic: the Gathering this balancing act is achieved by a set of constantly changing "environments" (sets of legal cards) that are updated every few months, when a new set of cards is published. Sets "rotate" in and out of the various environments, and there are a few where it's legal to play with all cards ever printed (excluding some that are banned or restricted for being OP, i.e. overpowered).
When a new set rotates (its cards become legal) in a given environment, there will be a reshuffle of the balance of power between already-established decks, that now acquire new cards or lose cards they needed, and some new strategies become available resulting in new decks being designed. Eventually the dust settles and the "tier one" decks (the ones that win the most) are found. Then tournament play in particular becomes a rock-paper-scissors affair, until the next update.
In some of the environments where all cards from all sets are legal (again, minus the ones on the banned or restricted lists), games indeed often come down to a coin toss: the player who goes first wins (in one turn, through some OP combo, typically).
I suspect that a perfectly balanced game is impossible, and would probably be boring even if it were possible. An element of creativity with constantly updated design parts, like in M:tG, is probably the best one can do.
Perfectly balanced games are plentiful; take rock-paper-scissors as an example.
What makes those games interesting is the concept of Yomi[1] and learning to read an opponent. There's a decent amount of literature in the fighting game space and a bunch of other genres overlap as well.
I remain pretty skeptical of ML being a primary tool here since a lot of Yomi is psychological and not necessarily an optimization problem.
I feel like some of these questions are a bit backwards. Isn't it better to give the model an arena to play within and try to make all of the options _viable_? I.e., balance is maintained when the weapons the gladiators choose to take into the arena are, on average, diverse and have less bearing on the win outcome?
This means that rather than focusing only on what is broken, one focuses on both ends, what is broken and what is useless, attempting to push both towards an average of usability.
Personally it astonishes me that groups like WOTC (who produce MTG) don't appear to be using such models given the quantity of mistakes they seem to continue to make in game design.
Attempting to optimize the card generation process, or deck building, or just play itself seem like unbelievably difficult problems. Some effort at least has been made in the first part (see RoboRosewater) but the other two seem almost intractable, given the combinatorial explosion of possibilities in deckbuilding, the incredibly nonlinear interactions between cards (Splinter Twin is a 1/10 card, Pestermite is maybe 3/10, Splinter Twin + Pestermite literally wins the game on the spot).
For something like a deck building game, if all decks are viable (competitive) then there may as well be no deck building component. In fact a game like that is probably a pretty muddy brown color in terms of card variety. That would adversely affect the business model of selling new cards as well.
Desirable qualities in a game in terms of how it is balanced are way more complex than just every option being viable and leading to an even win rate against a similarly skilled opponent.
Interesting! Is there some kind of meta-objective for "AI optimal" that could have replaced the 6 or 7 iterations you did with human R&D? For instance, if you had real human playtesters interacting with the prototypes, is there some signal you could extract to measure that it's "good"?
The problem is AIs are very good at optimising what you asked them for, rather than what you meant to ask for, and figuring out what you want is super hard :)
As a simple example:
* Start by optimising "players can always do something on their turn" -- but that just ends up with everyone always having exactly one thing they can do (no choice).
* So then say "give players more things to do each turn" -- but then they end up being able to do everything every turn (the game gives them too much 'money', so there's still not really a choice).
* OK, so we want to force players to make a choice -- so we say "give players as much choice as possible, but make sure that choosing one option blocks off others" (in practice, make as many sets of mutually exclusive tasks as possible) -- but then the AI will make sure that every turn every player can do (for example) exactly 3 out of 6 things (any 3), and that no matter how well or badly they play they still always get to choose 3 from 6, so the game doesn't really progress or vary.
So, what we want is choice, but also variability, and progress, and for players to feel like they are affecting the game, but also don't let one player run away with it too early, but also don't make it just "feel random who wins", etc.
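To make that concrete, a composite objective over simulated playthroughs might look roughly like the sketch below. Every attribute, weight and threshold here is a made-up placeholder -- which is exactly the hard part we kept iterating on:

    import statistics

    def design_score(playthroughs, w_choice=1.0, w_variety=1.0, w_progress=1.0, w_skill=1.0):
        # Hypothetical composite objective over simulated playthroughs; each
        # playthrough is assumed to expose .turns (each with .legal_actions),
        # .finished, and .strong_player_won (0 or 1).

        # 1) Choice: players should usually have a handful of legal actions --
        #    more than one, but not everything.
        options = [len(t.legal_actions) for p in playthroughs for t in p.turns]
        choice = statistics.mean(1.0 if 2 <= n <= 5 else 0.0 for n in options)

        # 2) Variability: the amount of choice should change over the game,
        #    not sit at a fixed "3 of 6" every turn.
        variety = statistics.pstdev(options)

        # 3) Progress: games should actually reach an end state.
        progress = statistics.mean(1.0 if p.finished else 0.0 for p in playthroughs)

        # 4) Skill vs. luck: the stronger simulated player should win more often
        #    than not, but not every time (the ~65% target is arbitrary).
        skill = 1.0 - abs(0.65 - statistics.mean(p.strong_player_won for p in playthroughs))

        return w_choice * choice + w_variety * variety + w_progress * progress + w_skill * skill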
Got it. I do research in reinforcement learning and I can sympathize with the difficulty here - in my experience, even something as simple as "I want to balance two objectives: 1) the agent should get a high score and 2) the agent should try to make as few decisions as possible" tends to result in the agent doing neither of those things well.
Letting these ML agents loose on your game is also a good way to discover bugs in the implementation that a human player might never run into. When I first messed about with RL[1] I quickly discovered this, with my 'AI' learning to abuse invalid moves to get a re-roll of the dice or to stay put when it should have been forced to move.
[1] - https://datasciencecastnet.home.blog/2020/01/24/swoggle-part...
This sort of research is maybe less flashy than say using machine learning to automatically generate game assets from photos, but I think this sort of computer-aided game design is possibly the biggest way machine learning will transform video games. As games are becoming bigger and more complicated, the problem of tuning various gameplay parameters explodes exponentially. And, this sort of tuning can have a huge effect on player retention and overall game quality.
In this research the machine learning is being used to balance the game across different asymmetric strategies (different decks in the card game), but you could imagine using similar techniques for balancing and tuning content for single player games as well. Once you have a reasonable model of the player's behavior, you can do all sorts of automatic tuning like balancing the difficulty of jumps in a platformer, tuning enemy positions in an FPS, etc.
I have a bone to pick with these "win probability" charts.
<rant>
If it was truly a "win probability" chart, that means it's a forecast. Except it's a shit forecast because you're trying to predict really freaking far into the future (You don't even know how far because the game could end at any point in time).
It also makes zero sense. Think about it, I say "you have a 30% chance of winning the game from this position". What does that even mean? If I play well I have a 30% chance of winning? If my opponent is of equal skill I have a 30% chance of winning? It's completely uninterpretable.
There's a good reason for this though. It's not measuring "win probability", it's just maximizing the value of "winning" (i.e. predicting output = 1) given the current state.
They even admit this in the article!
"In addition to making decisions for the game AI, we also used the model to display the estimated win probability for a player over the course of the game"
It's supposed to be a vague and un-interpretable value because it's generated by a black box neural net. So why do we continue to pretend that this is a human-friendly value?
Like any metric, it's not perfect, but there's a lot of good information there.
Dota added a win prediction graph a while back that's been valuable to look at. It's not really a black box, because you can see the win % change over time and get a feel for what it's weighting. For example, it takes character selection into account, and the initial prediction can get to 60/40 before the game even starts.
The bit that I found the most interesting is when the prediction doesn't line up with my intuition of the game state. I've had a couple of comebacks from a predicted 1% win chance, and all those felt more like 20-30% comebacks, where we were down but definitely not out. A map is not the territory, and that's easiest to see when you fall off the edge of the map.
Do you know about STRATZ? They also have a win probability chart on their website as well as predictions for each hero in the game.
In their Discord, different people have asked the same questions about these "win probability" values multiple times because often they're very unintuitive (Because they are not actually probabilities).
> It's not really a black box, because you can see the win % change over time and get a feel for what it's weighting
If it wasn't a black box you would know why it's weighted at a particular value.
I think it's mostly a curiosity and not that helpful if you're trying to analyze your games.
The neural net is a well-researched CNN variant - the activations are quite interpretable these days.
The metric’s interpretability is independent from the neural net used for gameplay, so the black box comment doesn’t make sense (unless I’m misunderstanding the comment...).
The metric is also very human friendly: think of it as 30% of players in the given position, at the given game’s stage and state, successfully continued on to a victory.
> The metric is also very human friendly: think of it as 30% of players in the given position, at the given game’s stage and state, successfully continued on to a victory
What I am trying to say is that this is a meaningless piece of information, because that's not how games work. You can't freeze the game state and say "I have a xx% chance to win here", because the state of the game continues evolving over time.
If you were actually estimating your chance to win, it would be a forecast. Except if it's anything like this kind of NN [0], then it is literally just estimating based on a snapshot of the game state and not the context of the game. At best, this "win probability" is a snapshot estimate of the current game state and has nothing to do with the actual outcome of the game.
This value is also impossible to falsify, so it could be spitting out any random number and you can't say that it's wrong.
> Don't you think it's a bit odd to be forecasting a "win probability" value without taking the events of the game into account?
No. To the extent that these events have no impact on the rules or the outcome of the game beyond the current game state, they don't influence how much equity a player has in the game. You may be interested in "The probability that I win this game given my beliefs about the tendencies of my opponent and my own tendencies, not given optimal play." That's a fine thing to be interested in, but there are other things that can reasonably be called "win probability."
> How do you encode time series or event data into singular values?
I don't know, how do you encode a sequence of 1-byte values into a 1kb string of text? You can read the alphazero paper if you want to learn how a sequence of go game states were encoded.
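As a rough illustration of the stacked-history idea (loosely in the spirit of the AlphaZero-style encodings; the shapes and k=8 below are just placeholders, not any paper's exact format):

    import numpy as np
    from collections import deque

    class HistoryEncoder:
        """Keep the last k board snapshots and stack them as input planes,
        so the network sees a short history rather than a single frame."""

        def __init__(self, board_shape, k=8):
            self.k = k
            self.frames = deque(
                [np.zeros(board_shape, dtype=np.float32) for _ in range(k)], maxlen=k
            )

        def push(self, board):
            # board: a 2D array for the current position (in practice you'd
            # usually use one plane per player; this just shows the idea).
            self.frames.append(np.asarray(board, dtype=np.float32))

        def encode(self):
            # Shape (k, H, W): oldest to newest, ready for a CNN's channel axis.
            return np.stack(self.frames, axis=0)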
> To the extent that these events have no impact on the rules or the outcome of the game beyond the current game state, they don't influence how much equity a player has in the game
This is a really odd statement. Past events can most definitely affect events beyond the current game state.
I think the AlphaStar paper [0] has a lot of good examples of my overall points. Here are a few excerpts:
"Central to AlphaStar is a policy [formula], represented by a neural network with parameters θ that receives all observations [formula] from the start of the game as inputs, and selects actions as outputs"
"To manage the structured, combinatorial action space, the agent uses an auto-regressive policy7,10,11 and recurrent pointer network"
I.e. the policy does account for past actions, and these are not encoded in any way.
"The agent parameters were subsequently trained by a reinforcement learning algorithm that is designed to maximize the win rate (that is, compute a best response) against a mixture of opponents"
I.e. the winrate value is _not_ a "win probability", it's a maximization like I said in my original comment.
Do these hold true for discrete action games and/or perfect information games? I don't know, but I find it likely they hold for at least discrete, imperfect information games.
> I.e. the policy does account for past actions, and these are not encoded in any way.
Alright. They should be then.
I misunderstood your objection and I would express a part of it as "the game state is not fully encoded." As an example, my encoding of Splendor gamestates includes which card is secretly in an opponent's hand if the opponent reserved it when it was face-up, and I regard this as "part of the game state," and it seems like the fine article's encoding lacks such information.
On the other hand, this discussion has made me realize my Splendor game state encoding is also lossy compared to the information needed for actual play by experts. If an expert player has 2 red chips, then reserves a face-down rank 3 holding, then takes 2 green chips the next turn, that's very different from if they started with the green chips and began collecting red chips after seeing their secret high-point-value card. My encoding does not account for this and I am struggling to think of how to fix it.
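One way I could imagine patching it (purely a sketch, all names are my own placeholders) is to record, for each reserved card, the turn on which it was reserved, plus a short window of recent actions, so the ordering is at least partially recoverable:

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class ReservedCard:
        card_id: Optional[int]   # known if reserved face-up, None if face-down
        turn_reserved: int       # when it entered the hand -- lets the encoding
                                 # relate later chip-taking to the hidden card

    @dataclass
    class PlayerState:
        chips: Tuple[int, ...]                        # counts per colour + gold
        reserved: List[ReservedCard] = field(default_factory=list)
        recent_actions: List[str] = field(default_factory=list)  # oldest first

    ACTION_IDS = {"take_chips": 1, "reserve": 2, "buy": 3}  # toy action vocabulary

    def encode(player: PlayerState, history_len: int = 4) -> list:
        # Flat feature vector: chips, then (turn_reserved, known?) per reserve
        # slot, then a fixed-length window of recent action ids.
        vec = list(player.chips)
        for slot in range(3):                         # up to 3 reserved cards
            if slot < len(player.reserved):
                r = player.reserved[slot]
                vec += [r.turn_reserved, 0 if r.card_id is None else 1]
                # (the card_id itself would be appended here when it is known)
            else:
                vec += [-1, -1]
        actions = player.recent_actions[-history_len:]
        vec += [ACTION_IDS.get(a, 0) for a in actions] + [0] * (history_len - len(actions))
        return vec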
> Think about it, I say "you have a 30% chance of winning the game from this position". What does that even mean? If I play well I have a 30% chance of winning? If my opponent is of equal skill I have a 30% chance of winning? It's completely uninterpretable.
It means that the AI that generated the data has 30% chance of winning, or at least that's what it's trying to estimate.
That doesn't make sense in the context of a game though. You can't freeze the game and say "I have a xx% chance to win here", because the state of the game continues evolving over time.
The generated value is based on a snapshot of the game and isn't related to the final outcome at all, so I believe saying it's a "win probability" is completely wrong.
If I have pocket aces and you have 7-2 off-suit, does a win probability exist for my aces against your 7-2?
If Ryu is knocked down and he has to dragon punch to beat a meaty command grab, and has to throw to beat an opponent who is attempting to block his dragon punch, and both players are at very low health, and these two options dominate all other options, such that the entire game hinges on a single round of a double-blind guessing game with only two possibilities, does a win probability exist for Ryu?
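For that Ryu scenario the number is perfectly well defined once you fix (or solve for) each player's mixing probabilities -- a tiny sketch, assuming DP beats grab, throw beats block, and the reverse pairings lose:

    def ryu_win_probability(p_dp, q_grab):
        # Double-blind 2x2 guess: Ryu picks DP with prob p_dp (else throw),
        # the opponent picks command grab with prob q_grab (else block-the-DP).
        return p_dp * q_grab + (1 - p_dp) * (1 - q_grab)

    # With both players mixing 50/50 (the equilibrium of this symmetric guess),
    # Ryu's win probability is exactly 0.5:
    print(ryu_win_probability(0.5, 0.5))  # 0.5
    print(ryu_win_probability(0.7, 0.5))  # still 0.5 against a 50/50 opponent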
You have created 2 scenarios with very small action spaces. Most games have enormous action spaces, so you can't easily compute a probability.
Take Chess for example. Chess engines evaluate millions of scenarios for every move. They don't generate a "win probability" value though, they evaluate board positions.
> Most games have enormous action spaces, so you can't easily compute a probability.
Saying you can't compute something by hand on a single sheet of paper is different from saying it doesn't exist.
In Chess, your win probability is either 0 or 1 (you either can force a win or you cannot), but most chess engines have a real-valued estimate of their equity in the game that is between 0 and 1.
Why do you think it's unreasonable to interpret the value function as probability? I think chess engines don't do it because of draws, but you could look at a chess position and estimate what are your chances of winning/drawing/losing.
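For what it's worth, chess sites do something like this already, by squashing the engine's centipawn evaluation through a logistic curve fitted to game databases. A sketch of the shape of it (the slope constant below is illustrative, not any site's actual fit):

    import math

    def win_chance_from_centipawns(cp, k=0.004):
        # Logistic squash of an engine's centipawn evaluation into a 0..1
        # expected score (roughly win probability plus half the draw probability).
        return 1.0 / (1.0 + math.exp(-k * cp))

    print(win_chance_from_centipawns(0))    # 0.5   (equal position)
    print(win_chance_from_centipawns(300))  # ~0.77 (up roughly a minor piece)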
If the probability of winning was static I would agree that the evaluation function is (Or can be) a prediction of win probability.
However, because the probability of winning is clearly dynamic over time, trying to predict that probability is inherently a future prediction of the outcome (I.e. a forecast) rather than the prediction of a value.
In that case, it does not make sense to view the evaluation function as probability because a forecast is not the same thing as a probability.
I'm looking at this from the perspective of the game. From the perspective of any given state in isolation I agree that it makes perfect sense, but in the context of a game it doesn't.
This is why I think that plotting all these values on a chart together and calling it "win probability" is non-sensical. It does give you an idea of the game state, but it's not a probability.
What if, instead of spending time training the ML model, they just made a dummy client with its own simple probabilistic state machine or behaviour tree to balance the game?
How much time and how many resources would be spent on that approach compared to the ML approach?
I think that, due to the statistical nature of ML, it is seen as a kind of hammer for every problem that might be solved statistically (of which there are lots), but it might not be the most effective use of engineering resources.
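For reference, a "dummy client" along those lines can be just a handful of lines. Everything below is a toy sketch; the states and transition probabilities are exactly the hand-tuning the ML approach is trying to avoid:

    import random

    # Hypothetical dummy client: a probabilistic state machine that picks its
    # next behaviour from hand-tuned transition probabilities.
    TRANSITIONS = {
        "gather": [("gather", 0.6), ("build", 0.3), ("attack", 0.1)],
        "build":  [("gather", 0.4), ("build", 0.3), ("attack", 0.3)],
        "attack": [("attack", 0.5), ("gather", 0.4), ("build", 0.1)],
    }

    def next_state(current):
        states, weights = zip(*TRANSITIONS[current])
        return random.choices(states, weights=weights, k=1)[0]

    state = "gather"
    for turn in range(10):
        # In a real dummy client each state would map to a scripted action
        # (queue a worker, place a building, send an attack wave, ...).
        state = next_state(state)
        print(turn, state)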
Anecdotally: I've never seen a "simple state machine" or "simple behaviour tree" in game AI. We've recently started using deep reinforcement learning for our games and it's almost like a miracle how simple, effective and scalable the system is. There are some remaining problems, like designing rewards for player enjoyment, but it has definitely brought a massive reduction in engineering effort.
> Anecdotally: I've never seen a "simple state machine" or "simple behaviour tree" in game AI.
That's effectively what game AIs are, today. Users want an AI they can model and simulate in their head, and isn't too brutal of a challenge. Today's machine learning cannot provide a model like that.
The whole point of this research is early study of unknown aspects of a game's design and their consequences for user behavior and game balance. The AI should be able to find and exhibit gameplay unexpected by the designer, which is not easily achievable with the tight control given to game designers. It's more about systematic state-space exploration, not making an AI that is fun to play against.
What is really interesting to me is to train the game models from the existing player base.
This is fairly common, but more as simulation/replaying in networked games, with fake bots based on captured gameplay. Previously I have done something similar in networked games, where waypoints and reactions are reused in AI agents that mimic players. If another player dropped, we replayed captured "ghost" plays from real players, with enough interactivity that they seemed more real than plain AI. This let us keep the matchmade/networked experience going when a player dropped, seemingly without interruption, with the stand-in acting human rather than bot-like. We now use this in all our matchmade/networked games because it makes the experience better for the player, even if they end up playing against a sim without knowing it.
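Roughly, the recording/replay part looks like this (a bare sketch with placeholder names; the real version layers reactions on top so it isn't a pure recording):

    import bisect
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Sample:
        t: float                   # seconds since session start
        pos: Tuple[float, float]   # waypoint the player was at
        action: str                # "idle", "fire", "reload", ...

    class GhostRecorder:
        def __init__(self):
            self.samples: List[Sample] = []

        def record(self, t, pos, action):
            self.samples.append(Sample(t, pos, action))

    class GhostPlayer:
        """Stand-in agent that replays a captured trace when a real player
        drops. A real implementation would blend in reactions (aiming at
        nearby enemies, re-pathing around obstacles) on top of the trace."""

        def __init__(self, trace: List[Sample]):
            self.trace = trace
            self.times = [s.t for s in trace]

        def state_at(self, t):
            i = max(bisect.bisect_right(self.times, t) - 1, 0)
            return self.trace[i].pos, self.trace[i].action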
Taking that further, if you could train the machine learning models on the existing player base for an additional level of focus, it could really make the game fun, and you could even use it to drive player behavior.
Machine learning definitely has a place in game testing and progression tuning. Even better would be if it could learn on the fly and not end up as predictable AI; one possible way would be capturing actual human player behavior and modeling that.
The approach as described has one major downside. It is based on the presumption that human players will employ the same strategies as a neural network that learned to play the game. That may be the case, but in reality many imbalances in the game model remain undiscovered or unused by real players, for various reasons.
A good example of ML for playtesting is what King is doing with Candy Crush Saga. They have trained a neural network on real-world usage data from millions of players. That makes it behave like a real player too: not pathetically weak, and not inhumanly strong. This, if applicable to your game, is a better way to leverage ML.
There are other examples of neural networks finding highly unorthodox strategies in various games when learning to play them. It is nothing like human behaviour.
For the actual game state representation that the model would receive as input, we found that passing an "image" encoding to the CNN resulted in the best performance, beating all benchmark procedural agents and other types of networks (e.g. fully connected).
It sounds like they literally decompose the game screen into inputs. E.g. in FPS games, your health is often displayed in the exact same spot; it passes a screenshot of that into the network.
Or maybe not. Maybe they decode the game state into an "image" in the sense that health is represented as a single pixel that ranges from RGB 0,0,0 to RGB 255,255,255. That would make more sense, but it's also slightly less exciting. Theoretically the model should be able to infer what health means simply by having enough experience, even if it's a more complex representation like Arabic numerals rather than a handcrafted input.
Anyone know if there are any other details, like model weights (ha ha, not likely), an architecture diagram, a paper, or some code snippet from some prototype that the researchers used for inspiration? The "image" representation is really quite interesting to me, since I hadn't thought about feeding data to networks that way. Theoretically a GAN could learn a thing or two about the world from having this sort of "image" input too.
> It sounds like they literally decompose the game screen into inputs (...) Or maybe not. Maybe they decode the game state into an "image" (...)
The GIF above "An example game state representation used to train the neural network" makes it seem like it's the latter.
Some of the variables have an unexpected representation. For example to represent "chimera health" with value H, it seems like they just use the 3 rows at the top, with the first H pixels "on" (green) and the remaining "off" (black). Same thing for "Chimera Power" and "Link Energy".
I guess this might make sense since they are using a CNN, but I wonder why that works better than using a different architecture and passing each of these values as a single input.
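If it helps, the kind of encoding being described might look roughly like the sketch below; the shapes, channel layout and bar heights are my guesses, not the paper's actual format:

    import numpy as np

    def encode_state(health, max_health, power, max_power, board, size=32):
        # Paint scalars and board layout into a small image for a CNN.
        # A scalar with value H becomes a few rows with the first H pixels "on"
        # (a bar/thermometer encoding a convolution can read spatially).
        img = np.zeros((3, size, size), dtype=np.float32)  # channels-first

        # Rows 0-2: health bar; rows 3-5: power bar (channel 0).
        h = int(round(size * health / max_health))
        p = int(round(size * power / max_power))
        img[0, 0:3, :h] = 1.0
        img[0, 3:6, :p] = 1.0

        # Remaining area: the board itself (e.g. unit occupancy) in channel 1.
        bh, bw = board.shape
        img[1, size - bh:, :bw] = board
        return img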
This is super cool, but like the other deepmind projects it's a bit frustrating. I've got nothing against "we did something hard with AI and it worked well", but would greatly prefer it to be followed with "... and we're making this available as something you can use too!"
I'd love to see this type of work applied to 4X grand strategy games where the AI has historically been pretty terrible and forces the developers to ramp up difficulty by giving the AI direct material advantages for a skilled player to overcome.
Yes please! I yearn for a Civ VI AI that can play competently through all of the eras without cheating. It's a masterpiece of a game, but the AI can't keep up with the mechanics introduced.
From the Game Developers Conference (GDC 2018), this talk covers a bit of the story of building on the work of an NN playing Atari-era games in order to enable playing a modern AAA FPS (first-person shooter).
The most obvious use of ML in this context would be to predict player actions in multiplayer games. Most such games extrapolate the actions of remote players in order to give the illusion that there is no lag between updates. I bet for many situations, a good ML implementation could make this work almost flawlessly. No idea if anyone is doing this yet.
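Concretely, today's extrapolation is usually plain dead reckoning, and a learned predictor would just slot in where the linear guess happens (the model below is hypothetical):

    def dead_reckon(pos, vel, dt):
        # What most games do today: linear extrapolation of the last known
        # position/velocity to hide the gap between network updates.
        return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)

    def predicted_position(history, dt, model=None):
        # A learned predictor would slot in here: `model` (hypothetical) maps
        # the recent movement history to a likely position dt seconds ahead,
        # capturing patterns (strafing, corner-cutting) that extrapolation misses.
        pos, vel = history[-1]
        if model is None:
            return dead_reckon(pos, vel, dt)
        return model.predict(history, dt)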
Also, the headline is wrong: It's 'Leveraging Machine Learning for Game Development'.
Cool to see this. I and surely many others in game development independently thought of doing this 5+ years ago. Fascinating to think that no matter how much software the world writes, we will keep finding entirely new ways to add value by writing even that much more code.
People always pick these strategy games to use AI on. I would love to see someone do more of a first-person shooter, or a Mordhau-type game. I'm sure it would crush real players, but it would be interesting to see two AIs compete.
I'm looking forward to the next installment "Machine Learning Machine Learning for game development," the blog article written instructing general AI on how to generate better ML models for video games.
I got an introduction to ML back in college (going through basic algorithms) and went straight to using it in game dev. I think it's a fascinating area.