
Performing arithmetic per se is not impressive. A calculator can do it, and the rules of arithmetic are not so complex that they can't be hand-coded, as they routinely are. The extraordinary claim in the GPT-3 paper is that a language model is capable of performing arithmetic operations, rather than simply memorising their results [1]. Language models compute the probability of a token following a given sequence of tokens and in particular have no known ability to perform any arithmetic, so if GPT-3, which is a language model, were capable of doing something it was not designed to do, that would be very interesting indeed. Unfortunately, such an extraordinary claim is backed by very poor evidence, and so amounts to nothing more than invoking magick.
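
To make "computing the probability of a token following a sequence of tokens" concrete, here is a minimal sketch using a toy bigram counter; this is obviously nothing like GPT-3's transformer, and the corpus and names below are made up purely for illustration:

    from collections import Counter, defaultdict

    # Toy bigram "language model": all it does is estimate
    # P(next token | previous token) from counts over a corpus.
    corpus = "2 + 2 = 4 <eos> 3 + 5 = 8 <eos> 2 + 3 = 5 <eos>".split()

    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_token_probs(prev):
        c = counts[prev]
        total = sum(c.values())
        return {tok: n / total for tok, n in c.items()}

    # The model only assigns probabilities to continuations it has
    # seen; it has no notion of addition, so an unseen problem like
    # "7 + 6 =" gets nothing sensible.
    print(next_token_probs("="))  # {'4': 1/3, '8': 1/3, '5': 1/3}

The question is whether scaling this idea up to billions of parameters produces something that actually computes sums, rather than a much larger lookup of continuations it has seen.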

__________

[1] From the paper linked above:

In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.

Now that I re-read this, I'm struck by the extent to which the authors are willing to pull their results this way and that to force their preferred interpretation on them. Their model answers two-digit addition problems correctly? It's learned addition! Their model makes mistakes? That's because it's actually trying to compute the answer and failing! The much simpler explanation, that their model has memorised solutions to some problems but there are many more it has never seen, seems to be assigned a very, very low prior. Who cares about such a boring interpretation? Language models are few-shot learners! (once trained on a few billion examples, that is).
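
For reference, here is what the quoted "not carrying a '1'" failure looks like on a two-digit sum; the helper below is hypothetical, just to make the claimed error pattern concrete:

    def add_without_carry(a, b):
        # Hypothetical helper: digit-wise addition that drops the carry,
        # the error pattern the paper cites as evidence of computation.
        result, place = 0, 1
        while a or b:
            result += ((a % 10 + b % 10) % 10) * place
            a, b, place = a // 10, b // 10, place * 10
        return result

    print(48 + 37)                    # 85 (correct)
    print(add_without_carry(48, 37))  # 75 (the "1" was not carried)

That such answers occur is consistent with an attempted computation, but it is also consistent with interpolating between memorised digit patterns, which is why the error analysis on its own settles nothing.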


