> The metric is also very human friendly: think of it as 30% of players in the given position, at the given game’s stage and state, successfully continued on to a victory
What I am trying to say is that this is a meaningless piece of information, because that's not how games work. You can't freeze the game state and say "I have a xx% chance to win here", because the state of the game continues evolving over time.
If you were actually estimating your chance to win, it would be a forecast. But if it's anything like this kind of NN [0], it is literally just estimating from a snapshot of the game state, not from the context of the game. At best, this "win probability" is a snapshot estimate of the current game state and has nothing to do with the actual outcome of the game.
This value is also impossible to falsify, so it could be spitting out any random number and you can't say that it's wrong.
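To spell out what I mean by a "snapshot estimate", here is a minimal sketch (layer sizes and names are made up for illustration, this is not the article's actual model) of a network that maps one encoded position to a number in [0, 1] and is trained against the eventual outcome of the whole game:

```python
# Sketch only: a "win probability" model that only ever sees a single
# snapshot of the game state, never the events leading up to or after it.
import torch
import torch.nn as nn

class SnapshotValueNet(nn.Module):
    def __init__(self, state_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # squashes to [0, 1] so the output *looks* like a probability
        )

    def forward(self, state_snapshot: torch.Tensor) -> torch.Tensor:
        # state_snapshot: one encoded position, no history, no context
        return self.net(state_snapshot)

# Training pairs a snapshot with the eventual 0/1 outcome of the whole game,
# so the label depends on everything that happened after the snapshot.
model = SnapshotValueNet()
loss_fn = nn.BCELoss()
snapshot = torch.randn(1, 64)          # stand-in for an encoded game state
final_outcome = torch.tensor([[1.0]])  # 1 = that player eventually won
loss = loss_fn(model(snapshot), final_outcome)
```

The label comes from everything that happened after the snapshot was taken, but the model never sees any of that; that's the gap I'm pointing at.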
> Don't you think it's a bit odd to be forecasting a "win probability" value without taking the events of the game into account?
No. To the extent that these events have no impact on the rules or the outcome of the game beyond the current game state, they don't influence how much equity a player has in the game. You may be interested in "The probability that I win this game given my beliefs about the tendencies of my opponent and my own tendencies, not given optimal play." That's a fine thing to be interested in, but there are other things that can reasonably be called "win probability."
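To make that concrete, here is a toy sketch (an invented race-to-10 game, nothing to do with the article's model): the same frozen state yields different, equally defensible "win probabilities" depending on which policies you assume both players follow from that point onward.

```python
# Toy sketch: "win probability" from a frozen state depends on what you assume
# both players do from that state onward. The game is a made-up race to 10
# points; the only point is that one snapshot yields different numbers under
# different policy assumptions.
import random

def play_out(my_score, opp_score, my_policy, opp_policy, target=10):
    """Roll the game forward from a frozen state until someone wins."""
    while True:
        my_score += my_policy()
        if my_score >= target:
            return True
        opp_score += opp_policy()
        if opp_score >= target:
            return False

def win_prob(my_score, opp_score, my_policy, opp_policy, n=20_000):
    wins = sum(play_out(my_score, opp_score, my_policy, opp_policy) for _ in range(n))
    return wins / n

cautious   = lambda: random.choice([1, 2])     # low-variance play
aggressive = lambda: random.choice([0, 0, 4])  # high-variance play

# Same snapshot (I lead 6-5), two defensible "win probabilities":
print(win_prob(6, 5, cautious, cautious))    # assuming both play cautiously
print(win_prob(6, 5, cautious, aggressive))  # assuming my opponent gambles
```

Neither number is wrong; they answer different questions about the same position.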
> How do you encode time series or event data into singular values?
I don't know; how do you encode a sequence of 1-byte values into a 1kb string of text? You can read the AlphaZero paper if you want to learn how a sequence of Go game states was encoded.
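For a flavor of how that paper handles it (a simplified sketch, not the exact plane layout from the paper): recent board snapshots are stacked into one fixed-shape tensor, so the "time series" becomes an ordinary network input.

```python
# Simplified AlphaZero-style encoding sketch: a short history of board
# snapshots is stacked into a fixed-size stack of feature planes.
import numpy as np

HISTORY = 8
BOARD = 19

def encode(history, to_play):
    """history: list of (19, 19) arrays, most recent last; values in {-1, 0, +1}."""
    recent = history[-HISTORY:]
    # Pad with empty boards if the game is younger than HISTORY moves.
    recent = [np.zeros((BOARD, BOARD))] * (HISTORY - len(recent)) + list(recent)
    planes = []
    for board in recent:
        planes.append((board == to_play).astype(np.float32))   # current player's stones
        planes.append((board == -to_play).astype(np.float32))  # opponent's stones
    planes.append(np.full((BOARD, BOARD), 1.0 if to_play == 1 else 0.0, dtype=np.float32))
    return np.stack(planes)  # shape: (2 * HISTORY + 1, 19, 19)
```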
> To the extent that these events have no impact on the rules or the outcome of the game beyond the current game state, they don't influence how much equity a player has in the game
This is a really odd statement. Past events can most definitely affect events beyond the current game state.
I think the AlphaStar paper [0] has a lot of good examples of my overall points. Here are a few excerpts:
"Central to AlphaStar is a policy [formula], represented by a neural network with parameters θ that receives all observations [formula] from the start of the game as inputs, and selects actions as outputs"
"To manage the structured, combinatorial action space, the agent uses an auto-regressive policy7,10,11 and recurrent pointer network"
I.e. the policy does account for past actions, and they are not encoded down into singular values in any way.
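Roughly what that looks like in code, if you want it concrete (sizes and names here are invented, not taken from the paper): a recurrent core carries the whole observation history forward, instead of a feed-forward net looking at one snapshot.

```python
# Sketch of a policy that "receives all observations from the start of the
# game": the recurrent hidden state carries the event history forward.
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=32, hidden=128, n_actions=10):
        super().__init__()
        self.core = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, observations):
        # observations: (batch, time, obs_dim), the whole history so far
        out, _ = self.core(observations)
        return self.head(out[:, -1])  # action logits conditioned on everything seen

policy = RecurrentPolicy()
history = torch.randn(1, 200, 32)  # 200 observations since the start of the game
logits = policy(history)
```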
"The agent parameters were subsequently trained by a reinforcement learning algorithm that is designed to maximize the win rate (that is, compute a best response) against a mixture of opponents"
I.e. the winrate value is _not_ a "win probability"; it's a maximization target, like I said in my original comment.
Do these hold true for discrete-action games and/or perfect-information games? I don't know, but I find it likely they hold for at least discrete, imperfect-information games.
> I.e. the policy does account for past actions, and they are not encoded down into singular values in any way.
Alright. They should be then.
I misunderstood your objection and I would express a part of it as "the game state is not fully encoded." As an example, my encoding of Splendor gamestates includes which card is secretly in an opponent's hand if the opponent reserved it when it was face-up, and I regard this as "part of the game state," and it seems like the fine article's encoding lacks such information.
On the other hand, this discussion has made me realize my Splendor game state encoding is also lossy compared to the information needed for actual play by experts. If an expert player has 2 red chips, then reserves a face-down rank 3 holding, then takes 2 green chips the next turn, that's very different from if they started with the green chips and began collecting red chips after seeing their secret high-point-value card. My encoding does not account for this and I am struggling to think of how to fix it.
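For concreteness, here is a sketch of the kind of encoding I'm describing (field names invented): the opponent's face-up reservation is recorded, but the order in which chips were taken is not, and that ordering is exactly the signal an expert would read.

```python
# Sketch of a Splendor-like state encoding and the gap in it. Field names are
# illustrative only.
from dataclasses import dataclass, field

@dataclass
class PlayerState:
    chips: dict            # e.g. {"red": 2, "green": 2, ...}
    reserved_known: list   # cards reserved while face-up; identity visible to everyone
    reserved_hidden: int   # count of face-down reservations; identity stays private

@dataclass
class EncodedState:
    players: list
    board: list            # face-up cards, nobles, bank chips, etc.
    # Everything above is a pure snapshot. To keep the expert-relevant signal
    # ("were the red chips taken before or after the secret reservation?"),
    # something like an ordered action log would also be needed:
    action_history: list = field(default_factory=list)  # not in my current encoding
```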
[0] https://medium.com/analytics-vidhya/a-simple-neural-network-...