The discourse around synthetic data is like the discourse around trading strategies — almost anyone who really understands the current state of the art is massively incentivised not to explain it to you. This makes for piss-poor public epistemics.
Nah, you don't need to know the details to evaluate something. You need the output and the null hypothesis.
If a trading firm claims they have a wildly successful new strategy, for example, then first I want to see evidence they're not lying - they are actually making money when other people are not. Then I want to see evidence they're not frauds - it's easy to make money if you're insider trading. Then I want to see evidence that it's not just luck - can they repeat it on command? Then I might start believing they have something.
With LLMs, we have a bit of real technology, a lot of hype, a bunch of mediocre products, and people who insist that if you just knew the secret details they can't explain, you'd see why it's about to be great.
Call it Habanero's Razor: for hype, the most cynical explanation is usually the correct one -- it's bullshit. If you get offended and DARVO when people call your product a "stochastic parrot", then I'm going to assume the description is accurate.