Monte-Carlo seems more akin to how humans play games.
Alpha-Beta pruning is the classical computer science algorithm, easier to understand, describe, and analyze... but very inhuman. It's an exhaustive search, like a breadth-first search.
Monte-Carlo has a lot of variations. Classically, the original MCTS algorithms for Go would play all the way to the end of a game before searching other parts of the tree. In effect, it's a depth-first search: you play until you reach a conclusion (potentially a "bad" conclusion). The reasoning is simple: on average, a lot of "samples" of playthroughs will lead to a better estimate of the win/loss probability. Think of it as a random sample of the win/loss potential of a move.
EDIT: Cleaned up the above paragraph.
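Here's a minimal sketch of what that classic random-playout estimate looks like, generic over hypothetical game callbacks (not any particular engine's API):

    import random

    def rollout(position, legal_moves, apply_move, result):
        # Play random moves until no moves remain, then score the final position.
        moves = legal_moves(position)
        while moves:
            position = apply_move(position, random.choice(moves))
            moves = legal_moves(position)
        return result(position)   # e.g. +1 win, 0 draw, -1 loss

    def estimate_value(position, legal_moves, apply_move, result, samples=1000):
        # Average many random playthroughs: a random sample of the move's potential.
        total = sum(rollout(position, legal_moves, apply_move, result)
                    for _ in range(samples))
        return total / samples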
The MCTS implemented by AlphaZero is different, however. It uses a neural net to find "interesting" positions to guide the search, and the same neural net also evaluates the position (who is winning or losing). It seems like a very good way to have a single neural network perform double duty (aside from the two output heads). Most of the input layers / early hidden layers can be "recycled" between the "explore" function and the "evaluation" function.
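Roughly, the shared-trunk / two-head idea looks like the sketch below (PyTorch, with made-up layer sizes, not AlphaZero's actual architecture):

    import torch.nn as nn

    class PolicyValueNet(nn.Module):
        def __init__(self, planes=17, num_moves=4672):      # sizes are placeholders
            super().__init__()
            # Shared trunk: the layers "recycled" by both functions.
            self.trunk = nn.Sequential(
                nn.Conv2d(planes, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Flatten(),
            )
            # Policy head: which moves look "interesting" to explore next.
            self.policy = nn.Linear(64 * 8 * 8, num_moves)
            # Value head: who is winning, squashed to [-1, 1].
            self.value = nn.Sequential(nn.Linear(64 * 8 * 8, 1), nn.Tanh())

        def forward(self, board):                 # board: (batch, planes, 8, 8)
            features = self.trunk(board)
            return self.policy(features), self.value(features)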
EDIT: I don't know how AlphaZero gets its "exploration" network trained up, however. It's possible that the original randomized MCTS algorithm (search randomly... which, by the way, performs way better than you'd expect...) might be used to "bootstrap" the exploration weights.
So MCTS just naturally works with neural nets very well.
> It's an exhaustive search, like a breadth-first search.
It's actually neither of those.
An exhaustive search would definitely not be able to search as deeply as state of the art alpha-beta chess engines can. There's a lot of pruning going on.
Minimax search (with or without alpha-beta pruning) is a depth-first search, not breadth-first. Iterative deepening means that you start with a low-depth search and progressively increase the depth, using the results from the previous depth to do move ordering, which increases the efficiency of pruning.
The pruning in Alpha-beta is very much a heuristic that is applied on top of the template. For example, Stockfish may recognize that some moves are "forced" and will iterate a specific variation deeper.
Nonetheless, the classical chess search algorithm is exhaustive. Breadth-first search with Alpha-Beta pruning is how it is taught at universities. Stockfish uses iterative deepening as you've said, but that's very similar to breadth-first search.
Stockfish is also multithreaded and has a hash table to remember previously searched positions (even positions from "another" branch of the tree).
I think my point is that Quiescence search is an explicitly programmed feature of Stockfish. Under certain conditions, Stockfish will search deeper (but not all conditions).
MCTS on the other hand is almost ENTIRELY a quiescence search. MCTS analyzes "deeper" positions, while Stockfish analyzes more exhaustively.
------
And again: classical MCTS (probably not AlphaZero) will keep going until it evaluates a position as "win", "loss", or "draw". It will NOT return until it gets that kind of evaluation.
Positions of "win" are seen as a "win" with a multi-armed bandit, and are explored further and analyzed further. Positions of "loss" or "draw" will be seen as a miss in the multi-armed bandit, but the "exploration" phase of the algorithm will randomly search a position anyway.
The multi-armed bandit heuristic is to search "winning" positions more and to "explore" losing positions less. For example: for every 10,000 searches of a winning position, you might only search a losing position 100 times. The precise formula involves a logarithm, division, and I think a square root. The formula has some statistical guarantees about finding results that I never really understood, but hope to research and math-out one day.
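For what it's worth, the formula being gestured at is (I believe) the standard UCB1 rule, something like:

    import math

    def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
        # Exploit the observed win rate, plus an exploration bonus that
        # shrinks as a child gets visited more often relative to its parent.
        if visits == 0:
            return float("inf")        # always try an unvisited move once
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)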
MCTS has a more elegant statistical theory behind it. But it's still very much a manually tweaked process: how much time to dedicate to "exploring" bad moves, and how much time to spend "exploiting" good moves (analyzing and making sure a "good" move is as good as you think it is). Alpha-Beta search doesn't have any concept of this, and requires a quiescence search engine "tacked on" for the feature. AlphaZero was interesting in that this manual tweaking of MCTS was left to the neural network training. It's a very elegant use of neural nets to improve the MCTS template.
> Nonetheless, the classical chess search algorithm is exhaustive.
Minimax search is brute-force, exhaustive search. As soon as you add any pruning (such as alpha-beta pruning, or the myriad search tricks used in chess engines), that's no longer the case. That is unless we're quibbling on the semantics of "exhaustive".
> Breadth-first search with Alpha-Beta pruning is how it is taught at universities.
Again, no. Minimax search (regardless of any applied pruning) proceeds by visiting the possible successors of a node one at a time, immediately recursing on each of them. See Wikipedia's pseudocode for it.
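Not the Wikipedia pseudocode verbatim, but a minimal sketch of the same recursion (generic over hypothetical `children` and `evaluate` callbacks) shows the depth-first shape, with alpha-beta pruning layered on top:

    def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
        kids = children(node)
        if depth == 0 or not kids:
            return evaluate(node)
        if maximizing:
            value = float("-inf")
            for child in kids:                    # recurse immediately: depth-first
                value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                             False, children, evaluate))
                alpha = max(alpha, value)
                if alpha >= beta:                 # prune: the rest can't matter
                    break
            return value
        else:
            value = float("inf")
            for child in kids:
                value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                             True, children, evaluate))
                beta = min(beta, value)
                if alpha >= beta:
                    break
            return value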
Alpha Beta pruning results in the same answer as classical Minimax search.
Alpha Beta pruning is an optimization: there's no need to search the whole tree to know that a branch can be pruned (but still result in the same answer as minimax).
Minimax with Alpha Beta pruning gives you the best position for a given depth. ("Best" as defined by Minimax to the same depth without pruning).
> Monte-Carlo seems more akin to how humans play games.
I don't think random choice is how humans play chess, or games in general. I would say it's pattern memory: one learns to play by building a hierarchy of patterns that lead to a prediction of outcome, thus making it possible to select the outcome(s) that lead to success.
Hmmm, you're right in that humans don't play randomly. But Monte-Carlo in this instance meant "MCTS", and not "random play". I apologize for the imprecision.
MCTS is very methodical despite the name. Especially in the AlphaZero implementation, where there's no random play during inference / during games (!!). So where did the "Monte Carlo" name come from?
The "original" MCTS algorithm used random play to guestimate the strength of a position. But the "real" theory behind MCTS is the math revolving around multi-armed bandit problems. Given a large number of slot machines (in Chess or Go, the positions are conceptually seen as "slot machines" in a casino), the goal of MCTS is to find the slot-machine that gives the highest probability of winning.
In effect: classic MCTS rolls out a position all the way to the final result: win, loss, or draw. Not really in AlphaZero, but this is how MCTS was originally designed. AlphaZero shortcuts this process with a neural network to guesstimate the endgame result.
But still: the MATH, and the precise order in which the search tree is explored, are very heavily based on multi-armed bandit theory and random samples. Even if the algorithm in practice doesn't use randomization, all the math and understanding of the search tree is derived from randomized ideals.
I think MCTS well-represents the human thinking of the typical expert. Every play is read all the way to the end of the game instinctively. Experts study the pawn-king-rook endgame not necessarily because they expect to see pawn-king-rook endgames... but because the pawn-king-rook endgame is of huge importance to middle-game pawn development.
Learning to see how middle-game (and even early-game) moves develop into endgame is pretty much what being an expert is about. And MCTS best emulates that kind of thinking.
True, chess is very tactical. But humans are generally bad at exhaustive searches and tactics. Humans win with positional play, and with better understanding of endgame positions.
----------
So rest assured, when I say "Monte Carlo", I'm not talking about a typical Monte-Carlo algorithm. I'm talking about Monte Carlo Tree Search, which has a very, very different theoretical basis from Alpha-Beta pruning.
Monte-Carlo algorithms don't make random choices. The "randomness" part is simply a device used to approximate expected values (integrals) in very large spaces.
True, but my point still holds: there is very little randomness in the way the brain works. Yes, it is present, but not as a primary device to deal with a large search space.
We don't have definitive answers on how the brain works, but I think, in this discussion, sparsity and hierarchy are those devices.
Why does this article emphasize Komodo MCTS and completely ignore LeelaZero which has had much more exciting and interesting results as far as 'alternative method' engines go? This is especially strange given that Leela operates similarly to AlphaZero which the article strongly praises.
The "article" is an ad for Komodo. That's also why it includes 'ten times faster progress than the "normal" Komodo version'.
To be fair, Komodo MCTS is an exciting new development, and relevant because it can run on a typical desktop machine without a GPU, but you can't expect this piece to be intellectually honest or rigorous.
That's because ChessBase sells Komodo. Don't get me wrong, Komodo MCTS (from test results) does very well and it's quite an innovative approach to make work at that level.
Nitpick: Leela Zero is the go program, Leela Chess Zero is the chess program.
Curiously, there's been some discussion lately in LZ's discord chat about whether its search, or more precisely its evaluation, should be made a bit more minimaxish (i.e., alpha-beta-like). Perhaps the optimum is somewhere in between the two.
I would love to use Alpha Zero. The article makes it sound like Komodo (which uses similar techniques to Alpha Zero) could also beat Stockfish (since Alpha Zero beat Stockfish). Stockfish still beat Komodo and all other engines according to the computer chess championship which uses equal-and-limited compute requirements.
You can run LeelaZero, which is weaker than AlphaZero but neck and neck with Stockfish. It lost 50.5-49.5 in the recent TCEC championship [0], which sounds like it is what you are referring to. (It historically has used equal hardware, but that has been complicated by the emergence of GPU engines).
What I'd like is a shim to act as a UCI engine but actually relay moves to/from LeelaZero on a remote machine. ChessBase offers something like this, but as a paid service on a proprietary protocol.
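A bare-bones version of such a shim isn't much code; a sketch (the host, port, and the remote listener, e.g. socat in front of lc0, are all assumptions):

    import socket, sys, threading

    REMOTE = ("192.168.1.50", 9999)    # placeholder: the box with the GPU + engine

    def to_remote(remote):
        # Forward every UCI command the GUI writes on our stdin to the engine.
        for line in sys.stdin:
            remote.write(line)

    def main():
        sock = socket.create_connection(REMOTE)
        remote = sock.makefile("rw", buffering=1)   # line-buffered text stream
        threading.Thread(target=to_remote, args=(remote,), daemon=True).start()
        # Echo everything the engine says back to the GUI on stdout.
        for line in remote:
            sys.stdout.write(line)
            sys.stdout.flush()

    if __name__ == "__main__":
        main()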
Yeah, the TCEC is one. I also think the chess.com computer chess championship (CCC) is interesting (Stockfish is winning here as well). The CCC uses more advanced hardware. Both are live now and interesting to watch; I mentioned the link for the CCC, but the TCEC is here: https://tcec.chessdom.com/
A server-based championship would also be interesting but pretty degenerate. I guess that's why computers playing more complex real-time games is becoming the bleeding edge.
This criticism refers to DeepMind's late-2017 results, however. In late 2018 DeepMind published a follow-up paper[1] in Science where they claim to have beaten Stockfish 8 under non-cherry-picked conditions, specifically the conditions under which the latter won TCEC 9 in 2016.
AFAIK there are no AlphaZero results against either the current stable SF release (Stockfish 10) or the SF development version which narrowly beat Leela at TCEC 14.
I would still love to see the Stockfish team go up against the AlphaZero team, with the Stockfish team being allowed to modify their code however they need to in order to get the performance they need for the event. I think they should agree on some hardware equivalences upfront since they don't have the same requirements, and I think that the Stockfish team should have an API to play the AlphaZero algo in preparation, just the same way that AlphaZero already has unfettered access to Stockfish.
I'd relish reading about whoever wins that: a fair competition.
Honestly, Leela Chess Zero was built on top of the AlphaZero algorithms, and I'd argue that LC0 is more relevant and interesting today than what happened two years ago.
Two open-source implementations with active development is equal footing. What matters is the algorithm and how it improves, not a specific implementation that has been abandoned. For that, we have TCEC and CCCC continually measuring these and a fair number of other implementations.
Stockfish and Leela, after 100 games, were virtually tied - Stockfish had one extra win. Leela has passed Houdini and Komodo at this point. At this point, it becomes the question if Stockfish can improve at the same speed as Leela.
You might be missing the point. AlphaZero is a research project, not a commercial project.
The most interesting result is AlphaZero's playstyle. It's a lot less robotic and something that humans can actually learn from compared to Stockfish's.
It's not code that's the big differentiator here. It's hardware.
Neural networks are an embarrassingly parallel problem that can be solved with accelerated matrix-multiplication hardware. The entire network can be easily represented as just matrix-multiplication problems.
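As a toy illustration (plain NumPy, made-up layer sizes and encoding): a batch of position evaluations is nothing but two matrix multiplies and a max:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((768, 256))   # 768 = 12 piece planes x 64 squares (assumed)
    W2 = rng.standard_normal((256, 1))

    def forward(positions):                     # positions: (batch, 768)
        hidden = np.maximum(positions @ W1, 0)  # matmul + ReLU
        return hidden @ W2                      # one evaluation per position

    scores = forward(rng.standard_normal((10_000, 768)))   # 10k boards in one go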
Stockfish is still written in a classical fashion, and has issues scaling above and beyond 64 cores. But Neural Networks can keep getting bigger, and bigger, taking advantage of extremely parallel hardware like GPUs (10,000+ shaders per GPU) or Google's Tensor Processing Units.
In some ways, the fight will never be fair. GPGPUs and TPUs can simply throw far more math, using far less power, at the problem.
Furthermore, games of perfect information, like Chess and Go, have always been played well using neural nets.
In effect, Neural Nets are being used as a tool to allow most computation to occur on far more parallel and efficient hardware.
-----------
I think the next big stage for Chess AI / Go AIs is for someone to figure out how to take advantage of SIMD-compute, to achieve similar scalability. As it is right now, both MCTS and Alpha-Beta pruning can only really be done on a CPU, so you can't leverage the huge computational power available today.
I think Chess would be the easier game: Bitboards are represented as 64-bit integers and would easily map to just 2x32-bit registers of a GPU shader. But I get stuck whenever I think of an algorithm that would have low-divergence.
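For example, a toy bitboard operation (in Python for readability, but it's the same two machine words on a GPU):

    # One 64-bit integer per piece set; a1 = bit 0, h8 = bit 63.
    MASK64 = (1 << 64) - 1

    def pawn_single_pushes(white_pawns, occupied):
        # Shift every pawn up one rank, drop squares that are occupied.
        return (white_pawns << 8) & ~occupied & MASK64

    start_pawns = 0x000000000000FF00             # all white pawns on rank 2
    pushes = pawn_single_pushes(start_pawns, occupied=start_pawns)
    assert pushes == 0x0000000000FF0000          # every pawn advanced to rank 3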
Perhaps Go would be easier due to the similarity of pieces?
I dunno, I've got way too many personal projects I wanna work on. So this isn't really a thing I can do. But it's definitely an interesting research problem. My best guess is to convert the whole chess-playing AI into a SAT solver and then write a GPU-accelerated SAT solver (because SAT is uniform in theory, so I would expect a GPGPU-based search of the 3-SAT space to be low-divergence).
I'm just shooting from the hip here, no serious suggestions really. But that's my quickie 2-minute thought process on this particular problem. You don't necessarily have to make the whole thing uniform; modern GPUs use either a 32-wide warp (NVIDIA) or a 64-wide wavefront (AMD). So you just need to figure out how to consistently get batches of 32 or 64 to execute the same code without diverging across if-statements or loops...
Heh, easier said than done.
EDIT: Perhaps organizing the code so that "All Bishop Moves" are evaluated in a batch. Let's say 1 million chess boards are to be evaluated on a GPU; how would it be done? Perhaps, first, analyze all bishop moves. Then analyze all rook moves. Then analyze all queen moves. The goal of Stockfish was to "process the most boards" at any given time. Changing the architecture for bandwidth-oriented GPGPU compute would be very much in the same spirit as Stockfish's original goals.
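To make the batch-oriented spirit of that concrete with a toy example (NumPy standing in for the GPU, piece encoding and weights made up for the sketch), here is a million boards evaluated as one vectorized pass instead of one board at a time:

    import numpy as np

    # Boards as an (N, 64) array of piece codes: 0 empty, 1..6 white P N B R Q K,
    # 7..12 the corresponding black pieces.
    PIECE_VALUE = np.array([0,  1, 3, 3, 5, 9, 0,     # empty, white pieces
                               -1, -3, -3, -5, -9, 0])  # black pieces

    def evaluate_batch(boards):                  # boards: (N, 64) int array
        return PIECE_VALUE[boards].sum(axis=1)   # material balance per board

    boards = np.zeros((1_000_000, 64), dtype=np.int8)   # a million (empty) boards
    scores = evaluate_batch(boards)              # all evaluated in one pass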