However, I really hate the "Golem of Prague" introduction. It presents an oversimplified caricature of modern frequentist methods, and is therefore rather misleading about the benefits of Bayesian modeling. Moreover, most practicing statisticians don't really view these points of view as incompatible. Compare to the treatment in Gelman et al.'s Bayesian Data Analysis. There are p-values all over the place.
Most importantly, this critique fails on basic philosophical grounds. Suppose you give me a statistical problem, and I produce a Bayesian solution that, upon further examination with simulations, gives the wrong answer 90% of the time on identical problems. If you think there's something wrong with that, then congratulations, you're a "frequentist," or at least believe there's some important insight about statistics that's not captured by doing everything in a rote Bayesian way. (And if you don't think there's something wrong with that, I'd love to hear why.)
Also, this isn't a purely academic thought experiment. There are real examples of Bayesian estimators, for concrete and practical problems such as clustering, that give the wrong estimates for parameters with high probability (even as the sample size grows arbitrarily large).
Gill's book, Bayesian Methods, is even more dismissive, and even hostile towards Frequentist methods. Whereas I've never seen a frequentist book dismissive of Bayes methods. (Counterexamples welcome!)
It boils down to whether you give precedence to the likelihood principle or the strong repeated sampling principle (Bayesians prefer the likelihood principle; Frequentists prefer repeated sampling). See Cox and Hinkley's Theoretical Statistics for a full discussion, but basically the likelihood principle states that all conclusions should be based exclusively on the likelihood function; in layman's terms, on the data themselves. This specifically omits what a frequentist would call important contextual metadata, like whether the sample size is random, why the sample size is what it is, etc.
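To make that concrete, here's a minimal sketch of the classic stopping-rule example (often attributed to Lindley and Phillips; the 9-heads/3-tails numbers are just the usual textbook ones): the likelihood is proportional to p^9 (1-p)^3 either way, so the likelihood principle says the inference must be the same, but the frequentist p-value depends on why there were 12 tosses.

  # Sketch of the classic stopping-rule example: 9 heads, 3 tails,
  # testing H0: p = 0.5 against p > 0.5. Same data, same likelihood,
  # different p-values depending on the sampling design.
  from scipy.stats import binom, nbinom

  # Design 1: n = 12 tosses fixed in advance; count heads.
  p_fixed_n = binom.sf(8, 12, 0.5)          # P(X >= 9) ~= 0.073

  # Design 2: toss until the 3rd tail; count heads along the way.
  # scipy's nbinom counts "failures" (heads) before the 3rd "success" (tail).
  p_stop_at_3_tails = nbinom.sf(8, 3, 0.5)  # P(Y >= 9) ~= 0.033

  print(p_fixed_n, p_stop_at_3_tails)

One design crosses the 0.05 line and the other doesn't, which is exactly the kind of "contextual metadata" the likelihood principle tells you to ignore.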
The strong repeated sampling principle states that the goodness of a statistical procedure should be evaluated based on performance under hypothetical repetitions. Bayesians often dismiss this as: "what are these hypothetical repetitions? Why should I care?"
Well, it depends. If you're predicting the results of an election, it's a special one-time event; it isn't obvious what a repetition would mean. If you're analyzing an A/B test, it's easy to imagine running another test, some other team running the same test, etc. Frequentist statistics values consistency here, more so than Bayesian methods do.
That's not to come out in support of one vs the other. You need to understand the strengths and drawbacks of each and decide situationally which to use. (Disclaimer: I consider myself a Frequentist but sometimes use Bayesian methods.)
> Whereas I've never seen a frequentist book dismissive of Bayes methods
Nearly every Frequentist book I have that mentions Bayesian methods attempts to write them off pretty quickly as "subjective" (Wasserman comes immediately to mind, but there are others), which falsely implies that Frequentist methods are somehow more "objective" (ignoring the parts of your modeling that are subjective does not somehow make you more objective). The very name of the largely frequentist method "Empirical Bayes" is a great example of this: it's an ad hoc method whose name implies that Bayes is somehow not empirical (Gelman et al. specifically call this out).
Until very recently, Frequentist methods have been the near-universally entrenched orthodoxy in most fields. Most Bayesians have spent a fair bit of their lives having their methods rejected by people who don't really understand the foundations of their own testing tools, but rather treat those tools as if they came from divine inspiration and ought not to be questioned. Bayesian statistics generally does not rely on any ad hoc testing mechanism, and can all be derived pretty easily from first principles. It's funny you mention A/B tests as a good frequentist example, when most marketers absolutely prefer their results interpreted as the "probability that A > B", which is the more Bayesian interpretation. Likewise, the extension from A/B tests to multi-armed bandits falls trivially out of the Bayesian approach to the problem.
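For what it's worth, that "probability that A > B" number really is a one-liner in the Bayesian framing. A minimal sketch, assuming made-up conversion counts and flat Beta(1,1) priors:

  # Minimal sketch: P(conversion rate of B > A) from Beta-Binomial posteriors.
  # Counts below are made up; priors are flat Beta(1,1).
  import numpy as np

  rng = np.random.default_rng(0)
  conv_a, n_a = 120, 1000    # hypothetical A arm: 120 conversions / 1000 users
  conv_b, n_b = 140, 1000    # hypothetical B arm: 140 conversions / 1000 users

  # Posteriors are Beta(1 + conversions, 1 + non-conversions).
  draws_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
  draws_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

  print("P(rate_B > rate_A) ~=", (draws_b > draws_a).mean())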
Your "likelihood" principle discussion is also a bit confusing here for me. In my experience Fisherian schools tend to be the highest champions of likelihood methods. Bayesians wouldn't need tools like Stan and PyMC if they were exclusively about likelihood since all likelihood methods can be performed strictly with derivatives.
This sounds to me very much like a political debate between people arguing for the best method, rather than focusing on the results that you can get with either method.
As long as this debate is still fuelled by emotional and political discourse, nothing useful will come out of it.
What is really needed is an assessment which method is best suited for which cases.
The practitioner wants to know “which approach should I use”, not “which camp is the person I’m listening to in?”
"Whereas I've never seen a frequentist book dismissive of Bayes methods. (Counterexamples welcome!)"
Indeed! There's a lot of Bayesian propaganda floating around these days. While I enjoy it, I would also love to see some frequentist propaganda (ideally with substantive educational content...).
All of Statistics by Larry Wasserman is a great introductory book from the frequentist tradition that includes some sections on Bayesian methods. It's definitely not frequentist propaganda - more like a sober look at the pros and cons of the Bayesian point of view.
My first year of grad school I ordered a textbook but what I got was actually All of Statistics with the wrong cover bound on.
I skimmed through a couple chapters before returning it for a refund. I sometimes regret not keeping it as a curio, but I was a poor grad student at the time and it was an expensive book.
> Indeed! There's a lot of Bayesian propaganda floating around these days. While I enjoy it, I would also love to see some frequentist propaganda
I think that frequentist statistics doesn't need marketing. It's the default way to do statistics for everyone and, frankly, Bayesian software is still quite far from frequentist software in terms of speed and ease of use. Speed will be fixed by Moore's law and better software, and ease of use will also be fixed by better software at some point. McElreath and Gelman and many others do a great job of getting more people into Bayesian statistics, which will likely result in better software in the long run.
In my opinion, books for practitioners are not the place for such discussions. Deborah's book might be poorly written, but if we want to get at the foundations of the disagreements, we have to reach into philosophy. Bayesian advocates are also often philosophers, e.g. Jacob Feldman.
Among theoretical statisticians, Larry Wasserman is more on the frequentist side. See for example his response on Deborah's blog [1]. But he doesn't advocate for it in his books. So yeah, besides Deborah, I am not aware of any other frequentist "propagandist".
> Gill's book, Bayesian Methods, is even more dismissive, and even hostile towards Frequentist methods.
I'm skeptical of this because Frequentist (likelihood) methods are a special case of Bayesian methods, with flat/uniform priors for parameters (and the "flatness" of a parameter is dependent on your chosen parameterization anyway; it's not a fixed fact about the model). So it's reasonably easy to figure out when frequentist methods will be effective enough (based on Bayesian principles), and when they won't.
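A quick numerical sketch of the parenthetical point, with made-up data (3 successes in 10 trials): a prior that is flat on p and a prior that is flat on the log-odds of p give different posterior modes from the identical likelihood.

  # Sketch: the MAP under a "flat" prior depends on the parameterization.
  # Made-up data: k = 3 successes in n = 10 Bernoulli trials.
  import numpy as np

  k, n = 3, 10
  p = np.linspace(1e-6, 1 - 1e-6, 100_000)
  log_lik = k * np.log(p) + (n - k) * np.log(1 - p)

  # Flat prior on p: posterior mode equals the MLE, k/n = 0.3.
  map_flat_p = p[np.argmax(log_lik)]

  # Flat prior on logit(p): back on the p scale the implied prior density is
  # 1 / (p (1 - p)) (change of variables), so the mode shifts to (k-1)/(n-2) = 0.25.
  log_post_logit = log_lik - np.log(p) - np.log(1 - p)
  map_flat_logit = p[np.argmax(log_post_logit)]

  print(map_flat_p, map_flat_logit)   # ~0.30 vs ~0.25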
>Whereas I've never seen a frequentist book dismissive of Bayes methods.
I think it has more to do with the long history of anti-Bayesianism championed by Fisher. He was a powerhouse who did a lot to undermine its use. The Theory That Would Not Die went into some of these details.
I think I'm misremembering. I read through some of the introductory material in the second edition of his book and found it less critical than I recalled.
But in some places, it definitely comes across as hostile (e.g. footnote 107).
Also, the sentence "Bayesian probability is a very general approach to probability, and it includes as a special case another important approach, the frequentist approach" is pretty funny. I know the exact technical result he's referring to, but it's clearly wrong to gloss it like that.
He does mention consistency once, page 221, but (unconvincingly) handwaves away concerns about it. (Large N regimes exist that aren't N=infinity...)
Honestly I think it is a little hostile. Not towards frequentists directly, but towards the misuse of frequentist methods in science. He works in ecology and I think he comes across a bunch of crap all the time. He talks at length about the statistical crisis in science, and I can't really blame him.
But I could see how someone might take this as an attack on the methods themselves.
I agree. The golem is presented as an analogue to any statistical inference: powerful but ultimately dumb, in the sense that it won't think for you. That's in my opinion the major theme of the book---you have to think and not rely on algorithms/tools/machines...or golems to do that for you.
I think the classes opt for starting with a simple mental model students can adopt, which is gradually replaced with a more robust and nuanced mental model.
In this case he isn't talking just about frequentist methods, though; he's also talking about doing statistics without first doing science (and formulating a causal model).
I would be wary of jumping to conclusions from that introduction alone if you haven't seen the rest of the course or the book.
> There are real examples of Bayesian estimators, for concrete and practical problems such as clustering, that give the wrong estimates for parameters with high probability (even as the sample size grows arbitrarily large).
Could you give some specific examples, and/or references? This is new to me, and I would like to read deeper into it.
Thanks for the detail! I took a look at the first paper, the result was new to me.
In the vogue days of reversible jump MCMC I played with mixture estimation of the number of components under a basic prior (an approach which gives decent results in Figs 1 and 3), but I never used a Dirichlet process prior for this problem. This paper points out that even this simple approach is problematic because it’s only consistent if the true distribution is such a mixture, and in my case it definitely was not.
Anyway, one takeaway, esp. from sec 1.2.1, is that the Dirichlet process prior is not suitable for estimating #components in most cases; it favors small clusters. And indeed, the concept of estimating #components is tricky to begin with, as noted above.
Just because you can compute the posterior, doesn’t mean it’s saying what you think it is about the underlying true distribution!
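For anyone who wants to see the "favors small clusters" tendency directly, here's a rough prior simulation. This is not the posterior-inconsistency result from the paper, just the Chinese restaurant process that the Dirichlet process prior induces on partitions (alpha = 1 is an arbitrary choice): the number of clusters keeps growing like alpha * log(n), and many of them are singletons, regardless of how many components the truth actually has.

  # Rough sketch: cluster sizes under the Chinese restaurant process
  # (the partition prior implied by a Dirichlet process with concentration alpha).
  import numpy as np

  rng = np.random.default_rng(1)
  alpha, n = 1.0, 10_000
  sizes = []                       # sizes[c] = number of points in cluster c
  for i in range(n):
      probs = np.array(sizes + [alpha], dtype=float)
      c = rng.choice(len(probs), p=probs / probs.sum())
      if c == len(sizes):
          sizes.append(1)          # open a new cluster
      else:
          sizes[c] += 1

  sizes = np.array(sizes)
  print("clusters:", len(sizes), "(~ alpha * log(n) =", round(alpha * np.log(n), 1), ")")
  print("singleton clusters:", (sizes == 1).sum())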
Agreed. There are situations where frequentist guarantees are what you want rather than optimal estimates of parameters.
Frequentists consider the model parameters fixed and allow the data to vary; Bayesians consider the data fixed and allow the parameters to vary.
For example, many software engineering, drug testing, and large-scale pipeline settings call for frequentist guarantees, because your system will see varying input data and you want theoretical bounds on the inferences you can make.
Suppose you give me a particle physics problem, and I produce a quantum mechanics solution that, upon further examination, is wrong.
If you think there's something wrong with that, then congratulations, you're a "quantum negationist," or at least believe there's some important insight about physics that's not captured by doing everything in a rote quantum way. (The important insight being GIGO: garbage in, garbage out.)
The issue isn't that Bayesian methods used incorrectly can have bad frequentist properties. It's that, according to many flavors of Bayesianism, having bad frequentist properties isn't a valid line of critique.
You may not believe in the particular stances I'm calling out, but if so, we don't disagree.
I mean "with simulations using a probability distribution [for the true parameter] different from the prior used in the Bayesian analysis." (The issue of model error is a separate question.)
Yes, in this case we should conclude there is something wrong with the Bayesian way. If you hand me a statistical method to e.g. estimate some parameter that frequently returns answers far from the truth, that is a problem. One cannot assume the prior exactly describes reality (or there would be no point in doing inference, because the prior would already give you the truth).
At least a Bayesian posterior tries to describe reality. In a way which is consistent with the prior and the data. But again, GIGO. Including prior information into the inferential process will be beneficial if it's correct but detrimental if it isn't. Hardly surprising.
On the other hand, Frequentist methods do not claim anything concrete about reality. Only about long-run frequencies in hypothetical replications.
You may think that makes them better, it's your choice.
Sure, I agree bad priors will give inaccurate inferences. My point is simply that to make a statement like, "an inaccurate prior generates many inaccurate inferences, and therefore it is garbage," one has to adopt a frequentist criterion for the quality of an estimator (like "gets good results most of the time").
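Here's a toy version of the kind of simulation being discussed, with all numbers made up: a conjugate normal model analyzed under a confidently wrong prior, scored by how often its 90% credible interval covers the fixed true mean. That "how often" score is precisely the frequentist criterion in question.

  # Toy sketch: evaluate a Bayesian interval by its frequentist coverage when
  # the true parameter is NOT drawn from the prior used in the analysis.
  import numpy as np
  from scipy.stats import norm

  rng = np.random.default_rng(2)
  true_theta = 0.0                  # fixed truth, far from the prior mean
  prior_mean, prior_sd = 5.0, 0.5   # confidently wrong prior
  sigma, n, reps = 1.0, 5, 10_000

  hits = 0
  for _ in range(reps):
      x = rng.normal(true_theta, sigma, size=n)
      # Conjugate normal-normal posterior for the mean.
      post_prec = 1 / prior_sd**2 + n / sigma**2
      post_mean = (prior_mean / prior_sd**2 + x.sum() / sigma**2) / post_prec
      post_sd = post_prec ** -0.5
      lo, hi = norm.interval(0.90, loc=post_mean, scale=post_sd)
      hits += (lo <= true_theta <= hi)

  # Coverage is essentially zero here, far below the nominal 90%.
  print("coverage of the 90% credible interval:", hits / reps)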
Uh, dude. If you read the book, you'd see the Golem of Prague isn't a parable about frequentist models specifically, it's about all models, period. He calls his Bayesian models golems all the time.