Aye.

One of my standard coding tests for LLMs is a spherical geometry problem: a near-triangle with all three corners being 90 degrees.

Until GPT-5, no model got it right; they only operated in the space of a Euclidean projection. Perhaps notably, while GPT-5 did get it right, it did so by writing and running a Python script that imported a suitable library, not with its own world model.
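To show the kind of shape I mean (this is not my actual test prompt, just a minimal sketch), take the octant triangle on a unit sphere with corners on the three coordinate axes. The helper names here (`vertex_angle` and friends) are made up for illustration, and it uses only the standard library rather than whatever library GPT-5 imported:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def norm(a):
    return math.sqrt(dot(a, a))

def vertex_angle(a, b, c):
    """Interior angle (degrees) at corner a of the spherical triangle abc.

    Assumes a, b, c are unit vectors, i.e. points on the unit sphere.
    """
    # Directions of the great-circle arcs leaving a: project b and c
    # onto the tangent plane at a.
    tb = sub(b, scale(a, dot(a, b)))
    tc = sub(c, scale(a, dot(a, c)))
    cos_angle = dot(tb, tc) / (norm(tb) * norm(tc))
    # Clamp against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Hypothetical example corners: one point on each coordinate axis.
A, B, C = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

angles = [vertex_angle(A, B, C), vertex_angle(B, C, A), vertex_angle(C, A, B)]
angle_sum = sum(angles)
print(angles, angle_sum)
```

Every corner comes out as a right angle, so the angles sum to 270 degrees rather than the 180 you would get in a flat projection, which is exactly the thing a Euclidean-only answer misses.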