What led you to believe that mathematics is a good tool for evaluating an LLM? It is a thing they currently don't do well, since it is wildly out of domain for their training corpus, down to the very way we structure information for an LLM to ingest. If we started doing the same for humans, most humans would be in deep trouble.
Well I am studying mathematics, and I use the LLM to help me learn.
They aren't terrible, and they have all of arXiv to train on. Terence Tao is doing some cool stuff with them; the idea is to have an LLM generate Lean proofs.
And I can assure you, when I start to talk about these topics with the average person who doesn't know the material, they just laugh at me. Even my wife, who has a PhD in physics.
Here's some cool math I learned from a regular book, not an LLM: