> The metric is also very human friendly: think of it as 30% of players in the given position, at the given game’s stage and state, successfully continued on to a victory
What I am trying to say is that this is a meaningless piece of information, because that's not how games work. You can't freeze the game state and say "I have a xx% chance to win here", because the state of the game continues evolving over time.
If you were actually estimating your chance to win, it would be a forecast. But if it's anything like this kind of NN [0], it is literally just estimating from a snapshot of the game state, not from the context of the game. At best, this "win probability" is a snapshot estimate of the current game state and has nothing to do with the actual outcome of the game.
This value is also impossible to falsify, so it could be spitting out any random number and you can't say that it's wrong.
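To spell out what I mean by a "snapshot estimate", here is a minimal sketch (layer sizes and names are made up for illustration, this is not the article's actual model) of a network that maps one encoded position to a number in [0, 1] and is trained against the eventual outcome of the whole game:

```python
# Sketch only: a "win probability" model that only ever sees a single
# snapshot of the game state, never the events leading up to or after it.
import torch
import torch.nn as nn

class SnapshotValueNet(nn.Module):
    def __init__(self, state_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # squashes to [0, 1] so the output *looks* like a probability
        )

    def forward(self, state_snapshot: torch.Tensor) -> torch.Tensor:
        # state_snapshot: one encoded position, no history, no context
        return self.net(state_snapshot)

# Training pairs a snapshot with the eventual 0/1 outcome of the whole game,
# so the label depends on everything that happened after the snapshot.
model = SnapshotValueNet()
loss_fn = nn.BCELoss()
snapshot = torch.randn(1, 64)          # stand-in for an encoded game state
final_outcome = torch.tensor([[1.0]])  # 1 = that player eventually won
loss = loss_fn(model(snapshot), final_outcome)
```

The label comes from everything that happened after the snapshot was taken, but the model never sees any of that; that's the gap I'm pointing at.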
> Don't you think it's a bit odd to be forecasting a "win probability" value without taking the events of the game into account?
No. To the extent that these events have no impact on the rules or the outcome of the game beyond the current game state, they don't influence how much equity a player has in the game. You may be interested in "The probability that I win this game given my beliefs about the tendencies of my opponent and my own tendencies, not given optimal play." That's a fine thing to be interested in, but there are other things that can reasonably be called "win probability."
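To make that concrete, here is a toy sketch (an invented race-to-10 game, nothing to do with the article's model): the same frozen state yields different, equally defensible "win probabilities" depending on which policies you assume both players follow from that point onward.

```python
# Toy sketch: "win probability" from a frozen state depends on what you assume
# both players do from that state onward. The game is a made-up race to 10
# points; the only point is that one snapshot yields different numbers under
# different policy assumptions.
import random

def play_out(my_score, opp_score, my_policy, opp_policy, target=10):
    """Roll the game forward from a frozen state until someone wins."""
    while True:
        my_score += my_policy()
        if my_score >= target:
            return True
        opp_score += opp_policy()
        if opp_score >= target:
            return False

def win_prob(my_score, opp_score, my_policy, opp_policy, n=20_000):
    wins = sum(play_out(my_score, opp_score, my_policy, opp_policy) for _ in range(n))
    return wins / n

cautious   = lambda: random.choice([1, 2])     # low-variance play
aggressive = lambda: random.choice([0, 0, 4])  # high-variance play

# Same snapshot (I lead 6-5), two defensible "win probabilities":
print(win_prob(6, 5, cautious, cautious))    # assuming both play cautiously
print(win_prob(6, 5, cautious, aggressive))  # assuming my opponent gambles
```

Neither number is wrong; they answer different questions about the same position.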
> How do you encode time series or event data into singular values?
I don't know; how do you encode a sequence of 1-byte values into a 1kb string of text? You can read the AlphaZero paper if you want to learn how a sequence of Go game states was encoded.
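For a flavor of how that paper handles it (a simplified sketch, not the exact plane layout from the paper): recent board snapshots are stacked into one fixed-shape tensor, so the "time series" becomes an ordinary network input.

```python
# Simplified AlphaZero-style encoding sketch: a short history of board
# snapshots is stacked into a fixed-size stack of feature planes.
import numpy as np

HISTORY = 8
BOARD = 19

def encode(history, to_play):
    """history: list of (19, 19) arrays, most recent last; values in {-1, 0, +1}."""
    recent = history[-HISTORY:]
    # Pad with empty boards if the game is younger than HISTORY moves.
    recent = [np.zeros((BOARD, BOARD))] * (HISTORY - len(recent)) + list(recent)
    planes = []
    for board in recent:
        planes.append((board == to_play).astype(np.float32))   # current player's stones
        planes.append((board == -to_play).astype(np.float32))  # opponent's stones
    planes.append(np.full((BOARD, BOARD), 1.0 if to_play == 1 else 0.0, dtype=np.float32))
    return np.stack(planes)  # shape: (2 * HISTORY + 1, 19, 19)
```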
> To the extent that these events have no impact on the rules or the outcome of the game beyond the current game state, they don't influence how much equity a player has in the game
This is a really odd statement. Past events can most definitely affect events beyond the current game state.
I think the AlphaStar paper [0] has a lot of good examples of my overall points. Here are a few excerpts:
"Central to AlphaStar is a policy [formula], represented by a neural network with parameters θ that receives all observations [formula] from the start of the game as inputs, and selects actions as outputs"
"To manage the structured, combinatorial action space, the agent uses an auto-regressive policy7,10,11 and recurrent pointer network"
I.e. the policy does account for past actions, and they are not encoded down into singular values in any way.
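Roughly what that looks like in code, if you want it concrete (sizes and names here are invented, not taken from the paper): a recurrent core carries the whole observation history forward, instead of a feed-forward net looking at one snapshot.

```python
# Sketch of a policy that "receives all observations from the start of the
# game": the recurrent hidden state carries the event history forward.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=32, hidden=128, n_actions=10):
        super().__init__()
        self.core = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, observations):
        # observations: (batch, time, obs_dim), the whole history so far
        out, _ = self.core(observations)
        return self.head(out[:, -1])  # action logits conditioned on everything seen

policy = RecurrentPolicy()
history = torch.randn(1, 200, 32)  # 200 observations since the start of the game
logits = policy(history)
```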
"The agent parameters were subsequently trained by a reinforcement learning algorithm that is designed to maximize the win rate (that is, compute a best response) against a mixture of opponents"
I.e. the winrate value is _not_ a "win probability"; it's a maximization target, like I said in my original comment.
Do these hold true for discrete-action games and/or perfect-information games? I don't know, but I find it likely they hold for at least discrete, imperfect-information games.
> I.e. the policy does account for past actions, and they are not encoded down into singular values in any way.
Alright. They should be then.
I misunderstood your objection and I would express a part of it as "the game state is not fully encoded." As an example, my encoding of Splendor gamestates includes which card is secretly in an opponent's hand if the opponent reserved it when it was face-up, and I regard this as "part of the game state," and it seems like the fine article's encoding lacks such information.
On the other hand, this discussion has made me realize my Splendor game state encoding is also lossy compared to the information needed for actual play by experts. If an expert player has 2 red chips, then reserves a face-down rank 3 holding, then takes 2 green chips the next turn, that's very different from if they started with the green chips and began collecting red chips after seeing their secret high-point-value card. My encoding does not account for this and I am struggling to think of how to fix it.
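For concreteness, here is a sketch of the kind of encoding I'm describing (field names invented): the opponent's face-up reservation is recorded, but the order in which chips were taken is not, and that ordering is exactly the signal an expert would read.

```python
# Sketch of a Splendor-like state encoding and the gap in it. Field names are
# illustrative only.
from dataclasses import dataclass, field

@dataclass
class PlayerState:
    chips: dict            # e.g. {"red": 2, "green": 2, ...}
    reserved_known: list   # cards reserved while face-up; identity visible to everyone
    reserved_hidden: int   # count of face-down reservations; identity stays private

@dataclass
class EncodedState:
    players: list
    board: list            # face-up cards, nobles, bank chips, etc.
    # Everything above is a pure snapshot. To keep the expert-relevant signal
    # ("were the red chips taken before or after the secret reservation?"),
    # something like an ordered action log would also be needed:
    action_history: list = field(default_factory=list)  # not in my current encoding
```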
[0] https://medium.com/analytics-vidhya/a-simple-neural-network-...