Yes, I stopped reading as soon as I found they used gradient descent. Our brains...

Yes, I stopped reading as soon as I found they used gradient descent. Our brains does not uses gradient descent to learn. If you use any sufficient complex model, and fit MNIST by defining a loss and minimizing it directly, it will very likely work. Even simple linear model of logistic regression on 784 pixels gives good accuracy of around 90%.