I think people feel it's plateauing because the new improvements are less evident to the average person. When we saw GPT-4, I think we all had that "holy shit" moment: I'm talking to a computer, it understands what I'm saying, and it responds eloquently. The Turing test, effectively. That's probably the most advanced benchmark humans can intuitively assess. Beyond that, there's abstract maths, which most people don't understand, or the fact that this entity that talks to me like an intelligent human being devolves into hallucinations over time when left to reason about something on its own. All real issues, but much less tangible, since we can't relate them to behaviours we observe or recognize as meaningful in humans. We've never met a human who can write a snake game from memory in 20 seconds without errors but can't think on their own for 5 minutes before breaking down into psychosis, which is effectively what GPT-4 was/is. After the release of GPT-4, we strayed well outside the realm of what we can intuitively measure or reason about without the use of artificial benchmarks.