> [...] However, sharing the AlphaZero algorithm code, network weights, or generated representation data would be technically infeasible at present.
Very interesting paper overall. However, the excuse that code sharing is "technically infeasible" is wearing thin nearly 5 years after the initial AlphaZero paper was released.
My assumption is that the code is not separate from other Deepmind code. It would be very difficult for them to share the AlphaZero code without also sharing everything else Deepmind is working on.
The representations are probably not manifest in a way that would be intelligible if shared.
I don't have an explanation for why they wouldn't share the weights.
In some frames of the Deepmind documentary film on AlphaGo, we can see code for loading SSTs (a common key-value data format at Google) from GFS (the Google file system).
It is possible that the entire codebase depends on Google-only infrastructure.
That part is true, but things like that are usually not too bad by themselves. For example, you can use open-source TensorFlow to access files on Google's internal filesystem with tf.io.gfile.
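A minimal sketch of that API (the path below is just a placeholder; internally the same calls can target Google filesystems, externally they work on local disk, GCS, etc.):

```python
# Sketch of the tf.io.gfile filesystem abstraction mentioned above.
# The path is a placeholder, not a real internal path.
import tensorflow as tf

path = "/tmp/checkpoint.bin"  # placeholder; could be a GFS/CNS path internally
if tf.io.gfile.exists(path):
    with tf.io.gfile.GFile(path, "rb") as f:
        data = f.read()
```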
It's possible other infra is somewhat hairy to decouple; for example, the code they use to allocate and use GPU resources is internal.
(I work on ML at Google and we use some of Deepmind's stuff)
> Many Human Concepts Can Be Found in the AlphaZero Network.
> We demonstrate that the AlphaZero network’s learned representation of the chess board can be used to reconstruct, at least in part, many human chess concepts. We adopt the approach of using concept activation vectors (6) by training sparse linear probes for a wide range of concepts, ranging from components of the evaluation function of Stockfish (9), a state-of-the-art chess engine, to concepts that describe specific board patterns.
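In practice a probe like that is essentially a sparse linear regression from a layer's activations to a concept value. A minimal sketch of the idea with synthetic stand-in data (not the paper's actual code):

```python
# Sketch of a sparse linear concept probe: regress a concept value
# (e.g. a Stockfish evaluation term) onto a layer's activations.
# All data here is synthetic, not real AlphaZero activations.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 256))  # (positions, hidden units)
concept = activations[:, :3].sum(axis=1)    # pretend concept: depends on 3 units

X_tr, X_te, y_tr, y_te = train_test_split(activations, concept, random_state=0)
probe = Lasso(alpha=0.01).fit(X_tr, y_tr)   # L1 penalty -> sparse probe
print("held-out R^2:", probe.score(X_te, y_te))
print("nonzero probe weights:", np.count_nonzero(probe.coef_))
```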
> A Detailed Picture of Knowledge Acquisition during Training.
> We use a simple concept probing methodology to measure the emergence of relevant information over the course of training and at every layer in the network. This allows us to produce what we refer to as what–when–where plots, which detail what concept is learned, when in training time it is learned, and where in the network it is computed. What–when–where plots are plots of concept regression accuracy across training time and network depth. We provide a detailed analysis for the special case of concepts related to material evaluation, which are central to chess play.
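Roughly, such a plot is just a heatmap of probe accuracy over a (training checkpoint, layer) grid; `probe_score` below is a hypothetical stand-in for fitting a probe like the one above per cell:

```python
# Sketch of a what-when-where plot: probe accuracy over training
# checkpoints (when) and network layers (where) for one concept (what).
import numpy as np
import matplotlib.pyplot as plt

checkpoints = np.arange(11)  # "when": training progress
layers = np.arange(1, 9)     # "where": network depth

def probe_score(ckpt, layer):
    # hypothetical stand-in: a real run would fit a probe per cell
    return 1 - np.exp(-0.3 * ckpt) * np.exp(-0.2 * layer)

scores = np.array([[probe_score(c, l) for c in checkpoints] for l in layers])
plt.imshow(scores, origin="lower", aspect="auto")
plt.xlabel("training checkpoint (when)")
plt.ylabel("layer (where)")
plt.title("probe accuracy for one concept (what)")
plt.colorbar(label="regression accuracy")
plt.show()
```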
> Comparison with Historical Human Play.
> We compare the evolution of AlphaZero play and human play by comparing AlphaZero training with human history and across multiple training runs, respectively. Our analysis shows that despite some similarities, AlphaZero does not precisely recapitulate human history. Not only does the machine initially try different openings from humans, it plays a greater diversity of moves as well. We also present a qualitative assessment of differences in play style over the course of training.
I think this is great work. Interpretability is the worst problem in deep learning, as the lack of insight into what the model has learned prevents it from being useful for serious decision making.
It's not just a practical problem; it's one of the most important philosophical problems in the area, too.
Something like GPT-3 can do multi-digit arithmetic much better than chance, giving results for values it was certainly never trained on. Similarly, transfer learning, where you start training a model on some input less related to the task and then switch to inputs closer to your task at the end, can substantially reduce total training time. The task can be radically different; to use GPT-3 as an example again, pretraining on PCM audio samples encoded as text patterns, or abstract art bitmaps encoded as text patterns, and then switching to English text reduces training by a factor of about 10x compared to starting with a completely randomized model. GPT-3 is learning something about arithmetic. It's learning something that is common to music, abstract art, and English text. It might be as simple as basic patterns from geometry and arithmetic (that's my guess). But no one could even begin to point you in the direction of what the structure it is teasing out really is.
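A rough sketch of that transfer setup (tiny model, synthetic stand-in batches; nothing here is GPT-3's actual pipeline). The point is simply that the second phase starts from the first phase's weights rather than from random initialization:

```python
# Sketch of transfer learning: pretrain on a source domain, then keep
# training the same weights on the target domain. Both batch loaders
# here are synthetic stand-ins for real corpora.
import torch
import torch.nn as nn

vocab, seq_len = 128, 16
model = nn.Sequential(
    nn.Embedding(vocab, 64),
    nn.Flatten(),
    nn.Linear(64 * seq_len, vocab),  # predict the next token
)
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()

def synthetic_batch():  # stand-in: replace with a real corpus loader
    x = torch.randint(0, vocab, (32, seq_len))
    y = torch.randint(0, vocab, (32,))
    return x, y

def train(batch_fn, steps):
    for _ in range(steps):
        x, y = batch_fn()
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

train(synthetic_batch, steps=100)  # "pretraining" (e.g. audio-as-text)
train(synthetic_batch, steps=100)  # continue on the target domain (e.g. English)
```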
If I needed large numbers added together, I would trust a human who can explain their general addition algorithm to me, and I wouldn't trust an AI that can spit out some usually correct answers on small problems.
The explicability of our decision making accounts for something like 95% of our progress. It is why we can identify biases that evolution builds into us but that we need to fight against, or justify a decision beyond mere intuition. This is like asking how much math humans are using anyway.
Humans as a group have proven able to build everything we have today. No AI has proven able to do anything similar; there just isn't much data there to make us confident in their abilities thus far.
Conversely though, how many people are killed in the world every single day simply because of human error?
When (in the US) a 16 year old gets their license, we don't ask them to provide a formal proof of their driving technique that shows it's impossible for them to ever get in a crash. We say 'you've had the training, you've demonstrated that you can safely drive, be careful out there'.
We have a good understanding of this risk and have collectively chosen to accept it (while many countries have chosen not to accept it in the case of 16-year-olds).
You can question a human in an exam and ask them to "show your work". If you can, say, solve an equation but can't explain why you did this or that transformation, you'll rightly be failed. With current NNs you get an answer and that's it; there's no introspection.
I skimmed the article so sorry in advance if I missed it, but to me one fairly trivial way to gauge whether AlphaZero has human-like conceptual understanding of chess would be to throw a few games of Fischer random at it.
I remember that with Deepmind's Breakout AI, one very easy way to see the difference from human play was to change the shape of the paddle. Even very slight changes completely threw the AI off, so it was obvious it hadn't understood the 'breakout ontology' in a human way.
I'd expect the same from chess. Humans who understand chess at a high level obviously play worse in non-standard variants, but the familiar concepts are still in play. If an AI has a human-like grasp of high-level concepts, it ought to be pretty robust to some changes to the game rules, like changing the dimensions of the board.
Regarding the Breakout AI, I think you could somewhat easily train an AI that was robust to such changes, for example by randomizing the paddle shape in training (see the sketch after this comment). But if you train an AI on a more specific version, I wouldn't expect it to magically learn the more general problem.
In fact I'd argue it's unfair to expect an AI to automatically generalize without attempting to train it for that.
The only reason humans can quickly learn many games is that they've been exposed to a variety of tasks throughout their lives, and that we have innate biases which come into play when humans design games for humans.
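A minimal sketch of that randomization idea; `make_env` and `train_step` are hypothetical stand-ins for a real Breakout environment and RL update:

```python
# Sketch of domain randomization: sample the paddle width per episode
# so the policy cannot overfit to a single geometry.
import random

def make_env(paddle_width):
    # hypothetical stand-in for a Breakout-style environment
    return {"paddle_width": paddle_width}

def train_step(env):
    # hypothetical stand-in for one RL update against this environment
    pass

for episode in range(1000):
    width = random.uniform(0.5, 2.0)  # randomize geometry each episode
    train_step(make_env(paddle_width=width))
```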
I think many chess players will agree that the latest chess engines (Stockfish NNUE/Leela) are playing better conceptually, so it's less useful to use older ones (SF8/A0) to study learned concepts. Still cool work though.
You'll find that the abstract makes the explicit point that Alpha0 lends itself to more interesting findings as it had no exposure to how humans think about chess.
Haven't read the paper yet, but given its relevance to what I'm doing atm it's high on my list.
I think using pre-NNUE Stockfish is partly because the classic Stockfish evaluation function has a lot of human knowledge explicitly built in, in already interpretable ways, making it a good contrast for comparison (a toy example of such a term is sketched below).
The goal is to discover and analyze the concepts used by an expert AI player, so it isn't vital that the player be the strongest so long as it is expert.
AlphaZero is also more interesting when you want to know how a general game-playing AI trained from scratch approaches the game.
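To make "interpretable" concrete, here is a toy material term in the spirit of a classical evaluation function. Stockfish's real evaluation is far more elaborate, so this is only an illustrative sketch:

```python
# Toy material-balance term: the kind of hand-written, human-readable
# feature a classical evaluation function is built from.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(fen_pieces):
    """Positive favors White; input is the piece-placement field of a FEN."""
    score = 0
    for ch in fen_pieces:
        value = PIECE_VALUES.get(ch.upper())
        if value:
            score += value if ch.isupper() else -value
    return score

print(material_balance("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # 0
```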
Here is a thought experiment for beating AlphaZero. Randomly select 10K children at a very young age (say 3) and have them play chess against AlphaZero, but simply have them play the exact move suggested by AlphaZero (i.e. basically this is AlphaZero playing itself). Play 10 games per day for 10 years.
The hypothesis is some children will deeply embed the algorithms into their own playing style - leveraging the subconscious to the greatest degree possible. Basically, we are training the human mind in the same way that we train AI. Would it work? Probably not, but our current approach (studying openings, etc.) is obviously not working so it makes sense to try something new.
"Probably not" indeed. Human brains need to be stimulated in order for them to build their neural net. Blindly following instruction is not stimulating.
>but our current approach (studying openings, etc.) is obviously not working
That's part of the current approach, not the full approach. Kids that show promise in chess already train with computers today (in addition to personalized coaching and strategy study). I have no doubt that the current generation of chess players is the best of all time, and that the next generation will be even better. Even with that, I don't think any human will be able to train enough to beat even Stockfish, much less AlphaZero - just as no human will ever train enough to beat computers at arithmetic.
>> just as no human will ever train enough to beat computers at arithmetic.
Think of the amount of complex computation that must be happening in an elite gymnast, for example. Sure, they aren't solving the equations consciously, but at some level computation is happening - or even something as simple as making sense of all of the individual photons arriving at the eye. The subconscious is what I am suggesting we try to exploit. Even then, it seems likely that a "special" brain would be needed - hence why large numbers would be required in order to find it.
>Think of the amount of complex computation that must be happening in an elite gymnast, for example.
By that logic, my cat should be trainable to a Chess Grandmaster level because she performs complex computation as she navigates my backyard.
>The subconscious is what I am suggesting we try to exploit.
Is there any evidence that the subconscious is exploitable in that way ... at all? Because learning doesn't work that way - especially for higher-order knowledge like chess.
To elaborate: in the Top Chess Engine Championship (which is also used as a benchmark in the AlphaZero vs Stockfish comparison), Stockfish has won all seasons since May 2020[1,2]. A common runner-up for those seasons is LCZero[3], an engine derived from/a reimplementation of AlphaZero. Stockfish is also ahead of LCZero in Fischer random chess (as hosted at TCEC), winning 4 out of the 5 FRC tournaments.
As for how close Stockfish 8 is to a current version of Stockfish, Stockfish 15 has ~400 more Elo than Stockfish 8 as measured in their own self-tests[4] (see the expected-score sketch below).
However, when making these comparisons between AlphaZero and Stockfish it might be tempting to frame it as machine-learning-engine vs classical-engine, which is not true. Stockfish incorporates a neural network when evaluating chess positions[5].
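On the Elo point above: an Elo gap maps to an expected score via the standard logistic formula, so ~400 points implies roughly a 9% expected score for the weaker engine. A quick sketch:

```python
# Expected score from an Elo difference (standard logistic model):
# a ~400-point gap gives the stronger side about a 91% expected score.
def expected_score(elo_diff):
    """Expected score for a player rated `elo_diff` above the opponent."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

print(expected_score(400))   # ~0.909 for the stronger side
print(expected_score(-400))  # ~0.091 for the weaker side
```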
But that's just the latest Stockfish: after LeelaZero implemented the AlphaZero approach and beat Stockfish, Stockfish had to add neural networks to its code to stay competitive.
My 10yo daughter plays a chess app mostly with suggestions turned on. It has substantially improved her game when we play together. Way more effective for her than my coaching, although I’m a total amateur. I think this would work without a doubt.
I think that would be transferring one particular network in a fuzzy way to a set of children, but it would not be transferring any of the training tools and feedback models that can improve the network.
The biggest failing, of course, would be that the children would not know how to play against moves that are not optimal.
This is the most important part of chess knowledge: given some suboptimal play, prove it is suboptimal by defeating the player who made the suboptimal move.
Otherwise, as some chess schools do, they would be repeating famous openings by rote, without understanding the meaning behind each move in the opening.
This ability would matter little when playing against Alpha Zero, but it would make all the difference when playing against humans.
I find the question interesting and thought-provoking, so very much in the spirit of HN. But I find it distressing that some were quick to downvote it (without even a discussion?). AFAIK, downvoting on HN is not like on other social networks: you should downvote only if the comment is irrelevant, nonsensical, or violates the community guidelines. The parent comment is, in my opinion, none of these.
AI doesn't even attempt that. All our neural networks do is spit out the final result of whatever they're doing. If we want the "reasoning" behind it, we have to build separate methods for extracting that. It's an active field of research, but much slower going than improving our AIs directly.
I always wondered if a chess engine would learn better/faster if the opening positions and piece movement rules were randomized. Has anyone tried this?
I don't think they mention anything about speeding up training. The original training time for AlphaZero was only a few hours anyway so I don't think that was ever a major constraint. I would imagine that each neural network performed best on the variant on which it was trained.
Maybe they meant the piece type is randomized, because each piece has a different movement rule. This would result in things like maybe 5 pawns and 3 rooks instead of the normal numbers.
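For what it's worth, randomizing the opening position in the Fischer-random style is easy to sketch: bishops on opposite-colored squares and the king somewhere between the rooks. A self-contained sketch, not tied to any engine:

```python
# Sketch of a Chess960-style randomized back rank.
import random

def random_back_rank():
    rank = [None] * 8
    # bishops on opposite-colored squares
    rank[random.choice([0, 2, 4, 6])] = "B"
    rank[random.choice([1, 3, 5, 7])] = "B"
    free = [i for i in range(8) if rank[i] is None]
    # queen and knights on any remaining squares
    for piece in ("Q", "N", "N"):
        square = random.choice(free)
        rank[square] = piece
        free.remove(square)
    # the last three empty squares get rook, king, rook (left to right),
    # which guarantees the king sits between the rooks
    free.sort()
    for square, piece in zip(free, ("R", "K", "R")):
        rank[square] = piece
    return "".join(rank)

print(random_back_rank())  # e.g. 'RNBQKBNR' or one of the other 960 setups
```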