Yes, in my personal experience it is much stronger on real-world queries: far fewer hallucinations, more ability to answer nontrivial questions, and a lot more coverage of the tail. That is not surprising for a much larger model, but it is unlikely to make much of a difference in largely superficial evals. Source: I personally use both, as well as Anthropic models, multiple times daily, and I use them in batch use cases as well.
1.5-pay is cheaper than GPT-4. GPT-4-Turbo costs $10/$30 per million input/output tokens, while 1.5-pay costs $7/$21.
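To put rough numbers on that, here is a quick sketch. It assumes the quoted figures are USD per 1M input/output tokens, and the batch size is a made-up example, not a real workload of mine.

```python
# Prices as quoted above, assumed to be USD per 1M tokens (input, output).
PRICES = {
    "GPT-4-Turbo": (10.0, 30.0),
    "1.5-pay": (7.0, 21.0),
}


def job_cost(model, input_tokens, output_tokens):
    """Cost in USD for one job at the quoted per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical batch job: 50M input tokens, 5M output tokens.
for model in PRICES:
    print(model, job_cost(model, 50_000_000, 5_000_000))
```

Since both the input and output prices are 30% lower (7/10 and 21/30), the saving works out to 30% whatever the input/output mix is.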