Thomas Bayes and the crisis in science (the-tls.co.uk)
105 points by agonz253 on July 1, 2018 | 62 comments


On the one hand, I have (in the semiconductor manufacturing industry) encountered statisticians who were greatly averse to making any prior assumptions about the likelihood of something. Also, any article with the phrase "nonsense on stilts" in it is not entirely unworthy. It also does a credible job of explaining Bayes' theorem.

On the other hand, I think it somewhat exaggerates the extent to which Bayes' theorem and its associated work are rejected by statisticians, and also the extent to which a magical p < 0.05 threshold was advocated. I believe it was non-statisticians, who wanted easy tests that they could apply without understanding much about statistics, who were to blame in that case.


> I believe it was non-statisticians who wanted easy tests that they could apply without understanding much about statistics, who were to blame in that case.

Yes and no. I think most journal articles feature a copy-and-paste approach to statistics, reusing what other papers do, or worse, tinkering with test functions in Excel. Yet statisticians seem to be OK with this.

From my personal experience applying statistics in the software world (trying to build alerting models for monitoring applications, etc.): since we have a lot of systems that continuously record data (instead of the way scientific studies usually work, collecting a fixed set of data points, N=x), we get continual cross-validation of our results, which has made me drastically more aware of the pitfalls of frequentist statistics.


There's probably some truth to this. But if statisticians weren't able to convince the academic world that arbitrarily setting a "statistical significance" threshold is a gross misuse of statistics, then what good are they? It would be like MDs failing to convince the rest of the medical community that bloodletting isn't a universal cure for whatever the hell ails you.


Responsibility in communication is a two-way street, and also depends on incentives and relative power.

- Are security researchers grossly incompetent for failing to convince the IT industry to adopt best practices?

- Are voters in democracies grossly incompetent for failing to police their politicians and keep them honest?

- Are media outlets grossly incompetent for not squeezing out 'fake news', whatever your definition?

- Are epidemiologists and public health officials grossly incompetent for not convincing anti-vaxxers of their foolishness?

Anytime there is a breakdown in communications, both the listener and the speaker should probably shoulder some of it.


> statisticians weren't able to convince the academic world that arbitrarily setting a "statistical significance" threshold is a gross misuse of statistics

What would the proper use of statistics be? I believe it is to produce a Bayesian posterior probability distribution, but that wouldn't be the opinion of a frequentist.


Statistics is formalized philosophy of science, so of course scientists outside the stats departments wanted "easy tests". Arguably, a lot of the new applications of computational Bayesian methods in the natural sciences are the same statistical witch-doctoring with a bit more rigor.

(Ok, I do computational Bayes for a living with natural scientists, so I'm inherently biased to view our work as shakier than it really is.)


'Statistics is formalized philosophy of science' -- care to expand? Or provide supplementary literature?


He’s likely referring to the fact that the rules of probabilistic reasoning can be shown to be the only rules of reasoning that yield correct inferences from data. Bayesian statistics is in that sense the “logic of science”. The first chapter of E.T. Jaynes’s book with that subtitle [0] is a good introduction to these ideas if you haven’t seen them before.

0. E.T. Jaynes (2003), “Probability Theory: the Logic of Science,” http://bayes.wustl.edu/etj/prob/book.pdf [PDF]


As a matter of fact, Jaynes is a good reference even though I didn't have him in mind at the time. I was more referring to the general impression I get reading a lot of statistics textbooks and papers, which is that they're trying to numerically formalize methodologies appropriate to experimental sciences. You end up with a fair amount of papers co-written between phil-of-science scholars and statisticians.


Aside: What do you do for a living?


There's a LinkedIn link on his HN user page.

> Variational Bayesian methods and deep learning for affective neuroscience, using probabilistic programming systems ProbTorch and Pyro.

It does sound like he's in for a rough time, epistemologically speaking.


You'd be surprised how easily the witch-doctory passes for real science ;-)! Kidding, kidding, our empirical side actually has to be at least as rigorous as the other statistical methods used in neuroscience, usually more so. That's what they keep us around for!


Twenty years ago I would have been surprised. :-)


A statistician could talk me out of this, but I've always been puzzled by the use of "prior." Nothing in Bayes's Theorem says that the prior has to be established before anything else. It makes Bayesian statistics seem like a matter of letting your expectations influence your results.

Instead, an interpretation that seems more favorable to me is that you consider all of the information at your disposal that can be brought to bear on a problem, which could include known constraints on likelihoods. Bayes's Theorem becomes a tool for working through problems where a single statistic can't be readily used to analyze an entire data set, e.g., when data come from disparate sources and can't be readily combined.


Part of the issue is that you can't not assume a prior; it's unavoidable. The Bayesian POV just makes this assumption clear / explicit, while many frequentist methods (if naively applied) amount to choosing a uniform prior.

E.g., imagine you have an e-commerce site that has, historically, had a 2% conversion rate (landing page to purchase). Now you run an A/B test with two variants, a control (A) and a treatment (B). Both buckets get 10,000 landings, of which A converts 200 of them and B converts 250. How can you tell if B is better than A? Cutting to the chase, the frequentist approach (if applied naively) would be to model B as some distribution centered around 2.5%, for example N(2.5%, 0.15%) or Beta(251, 9751).

A Bayesian would say that this assumes a uniform prior - but that this is probably a bad prior because it ignores what we know about the historical conversion rate of 2.0%. Said another way, the above amounts to saying (before we run the test) that we think it's just as likely for B to have a conversion rate of 2.5% as it is to have a conversion rate of 100%. Clearly we don't actually believe this.


> Cutting to the chase, the frequentist approach (if applied naively) would be to model B as some distribution centered around 2.5%

Why would that be wrong?

> A Bayesian would say that this assumes a uniform prior - but that this is probably a bad prior because it ignores what we know about the historical conversion rate of 2.0%.

What would a Bayesian conclude instead?


> Why would that be wrong?

The issue is that modeling B with a distro centered around 2.5% ignores what we know about the historical conversion rate (2.0%) and the control bucket's conversion rate (also 2.0%). If our goal is to make the best estimate for the future that we can, we should take this data into account when evaluating B. As a thought experiment, imagine that you have A at 2.0% and B at 2.5% conversion for Week 1, with a historical conversion rate of 2.0%. Someone says they'll pay you $100 if you correctly guess what B's conversion rate will be next week, either (i) in the range 2.0% to 2.5%, or (ii) in the range 2.5% to 3.0%. I'd prefer to bet on (i) than on (ii).

> What would a Bayesian conclude instead?

One simple approach would just be to start with a more informative prior, like Beta(2+1,100-2+1) instead of Beta(1,1). This would pull bucket B's posterior distribution closer to 2.0%. Another approach is to use a hierarchical model [1], which will fit the individual buckets' priors for you.

[1] Here's something I wrote on this a couple years ago, more focused on solving multiple comparisons problems but with the same proposed solution: http://normal-extensions.com/2014/07/16/ab-testing-hierarchi...
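A minimal sketch of the suggested update, assuming the conjugate Beta-Binomial model and the numbers from this thread (bucket B: 250 conversions out of 10,000 landings):

```python
# Compare a uniform Beta(1, 1) prior with the informative Beta(3, 99) prior
# suggested above, which encodes the historical 2% conversion rate.

def beta_posterior(alpha0, beta0, conversions, landings):
    """Conjugate update: Beta prior + Binomial data -> Beta posterior."""
    return alpha0 + conversions, beta0 + landings - conversions

conversions, landings = 250, 10_000

# Uniform prior: posterior Beta(251, 9751), mean close to the raw 2.5% rate
a_u, b_u = beta_posterior(1, 1, conversions, landings)

# Informative prior: posterior Beta(253, 9849), pulled slightly toward 2.0%
a_i, b_i = beta_posterior(3, 99, conversions, landings)

print(f"uniform prior     -> posterior mean {a_u / (a_u + b_u):.4%}")
print(f"informative prior -> posterior mean {a_i / (a_i + b_i):.4%}")
```

With 10,000 samples the data dominates either prior, so the two posterior means differ only slightly; the prior matters far more at small sample sizes.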


> The issue is that modeling B with a distro centered around 2.5% ignores what we know about the historical conversion rate (2.0%) and the control bucket's conversion rate (also 2.0%).

Both the historical and the control bucket used version A of the website, and they are consistent in their 2.0% conversion rate. Version B is different, and it appears to have a different conversion rate of 2.5%. So why should it not have a future conversion rate close to 2.5%?

Let's replace the website with a 6-sided die. Historically, the probability of throwing a 3 was 1/6. Now you replace your die with a different die and throw it 10,000 times; the 3 comes up 2560 times. If I had to guess how many times the 3 comes up the next 10,000 throws, I certainly would bet that it's closer to 2560 times than to 1667 times.

> Someone says they'll pay you $100 if you correctly guess what B's conversion rate will be next week, either (i) in the range 2.0% to 2.5%, or (ii) in the range 2.5% to 3.0%.

Case A: The historical version A of the online shop had some influence on the conversion rate during the testing of version B, drawing the conversion rate of B down. This influence will fade away in the future, so B's conversion rate will be closer to [2.5%, 3.0%] than to [2.0%, 2.5%].

Case B: The historical version A of the online shop did not have any influence on the conversion rate during the testing of version B (compare the dice example above). Then both ranges are equally plausible. But "[2.0%, 2.5%] vs [2.5%, 3.0%]" is a bad dichotomy. A more relevant one would be "[1.75%, 2.25%] vs [2.25%, 2.75%]". In that case, I would bet on [2.25%, 2.75%].


Late to the party, but:

> Both the historical and the control bucket used version A of the website, and they are consistent in their 2.0% conversion rate. Version B is different, and it appears to have a different conversion rate of 2.5%. So why should it not have a future conversion rate close to 2.5%?

It's all a matter of degree. You'd model B's rate as closer to 2.5%, but probably not centered around 2.5%. As you observe more data, the prior becomes less important. E.g., with 10k samples as in the original example, if you used Beta(2+1, 100-2+1) as your prior, your posterior would be Beta(253, 9849), whose mode is 252/10100 ≈ 2.495%. But if you only had 1000 samples (and 25 conversions), you'd get a distro centered at 2.45%. And if you only had 200 samples (and 5 conversions), you'd get a distro centered at 2.33%. Etc.
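These numbers can be checked directly (reading "centered at" as the posterior mode, which matches the figures quoted):

```python
# Posterior mode under the Beta(3, 99) prior: shrinks toward the 2.0%
# historical rate as the sample gets smaller.

def posterior_mode(conversions, landings, a0=3, b0=99):
    a, b = a0 + conversions, b0 + landings - conversions
    return (a - 1) / (a + b - 2)  # mode of Beta(a, b), valid for a, b > 1

for conv, n in [(250, 10_000), (25, 1_000), (5, 200)]:
    print(f"{n:>6} samples, {conv:>3} conversions -> mode {posterior_mode(conv, n):.3%}")
```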

> Let's replace the website with a 6-sided die. Historically, the probability of throwing a 3 was 1/6. Now you replace your die with a different die and throw it 10,000 times; the 3 comes up 2560 times. If I had to guess how many times the 3 comes up the next 10,000 throws, I certainly would bet that it's closer to 2560 times than to 1667 times.

In the case of a die where you believe any weighting of the faces is equally likely, this would be true. So this may be an appropriate model in this case. But in the case of the website, I don't think the conversion rates are equally likely, even for a new, un-tested site. If the historical conversion rate is 2.0%, and I'm forced to bet on the most likely conversion for a new (never before seen) variant B, I'd much rather bet on a number near 2.0% than a number like 99%.

> Case B: The historical version A of the online shop did not have any influence on the conversion rate during the testing of version B (compare the dice example above). Then both ranges are equally plausible.

This is exactly what I'm claiming is not true. It's not that A influences B, it's that A tells you something about the likely range of A and B (in this specific case of an e-commerce site). (The reason I chose the ranges [2.0%, 2.5%] vs [2.5%, 3.0%] is that if you model B independently, you'd be indifferent between these ranges; but if you use A to inform a prior, you'd prefer [2.0%, 2.5%].)


I'm not a statistician but as I understand it, in Bayesian terms, when you get some new piece of information you must update your beliefs to incorporate the new information. So you take your belief prior to the new information and incorporate the new information to get your updated belief.


That's cool. It sounds like how we all learned to think since time immemorial, so I can hardly dispute it.


>It makes Bayesian statistics seem like a matter of letting your expectations influence your results.

The smart approach is to have a set of prior expectations, run through the data, and see what comes out the other side. Essentially, through a large number of experiments, you can narrow down which assumptions were indeed correct.


A prior could be a population-level statistic which is then updated by a likelihood to reveal a posterior probability. Take for example a medical test. We know that on average 1/10 people have some disease X. We have a test that tests for X (with some false positive rate). Using that, we can calculate the posterior probability that you actually have disease X. If you relied just on the test, you'd discount the population-level statistics, which might be relevant. You can also use the test's FP rate in the same way as your prior... e.g. the people who test positive who don't have the disease etc...
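A quick sketch of this calculation. The 1/10 prior is from the comment; the sensitivity and false-positive rate below are illustrative assumptions, since the comment doesn't give them:

```python
# Bayes' theorem for a medical test. Assumed numbers: prior P(X) = 0.10,
# sensitivity P(+|X) = 0.90, false-positive rate P(+|not X) = 0.05.

p_disease = 0.10
p_pos_given_disease = 0.90      # sensitivity (assumed)
p_pos_given_healthy = 0.05      # false-positive rate (assumed)

# Marginal probability of a positive test (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")
```

Even with a fairly accurate test, the posterior here is only about 2/3, because the prior keeps the base rate in play.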


It sounds like you're talking about just using the maximum likelihood estimator (with a special likelihood) which is similar to using a flat (improper) prior. One disadvantage of doing so is that the results aren't invariant to reparameterization of the problem. The latter problem can be fixed by using Jeffreys' priors. However, the MLE is very effective in its own right.


It's amusing that the people who are most militantly Bayesian aren't Bayesian statisticians. It's almost as if there are advantages and drawbacks to the Bayesian perspective. By the way, frequentism vs. Bayesianism has very little to do with Thomas Bayes. All statisticians accept the validity of Bayes' Theorem. Moreover, the two approaches are not mutually exclusive. It's always interesting to investigate the frequentist properties of Bayesian estimators and the implicit priors of frequentist estimators.


I don't mean to segue here, but I got caught in a logical argument that ended abruptly with no definitive answer.

I mentioned the Monty Hall problem to a friend (an engineer) and discussed the statistical analysis done on this issue. (i.e., statistically it's better to switch doors after the first is opened, as it improves your odds of winning)

But he only answered with "nope, Bayes' Theorem says the odds are the same no matter what you do". The only other point he added was that if you do something only once, you have the same odds every single time.

I found this frustratingly simplistic, because we _know_ through testing that this isn't true. The odds are better if you switch doors.

Is this the fanaticism of Bayesians? He even called me a "frequentist" as if it were some kind of pejorative.

I've researched this issue, and even pointed out his logical flaw (you choose twice, not once), and still: "Bayes' theorem says there's no difference and you're wrong because you are a frequentist".

sigh.


Here's how I would explain the Monty Hall problem:

First, get the person to realize that switching doors will always flip your initial outcome. If you picked a losing door, switching gives you the winning one. If you picked the winning door, switching gives you a losing one. There is no winning-to-winning or losing-to-losing. If you switch, you will always get the opposite of whatever you originally picked.

Once this idea has been understood, ask them to consider the probability that your original choice was a losing door. Once they agree that it's 2/3 (2 losing doors out of 3 possible choices), you can reply "that's right, and if there's a 2/3 chance that your current choice is a losing door, then there's a 2/3 chance that you're in a situation where switching will give you the winning door. Therefore, the chance that switching will give you the winning door is 2/3!"
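The argument above can be checked with a quick simulation: switching always flips the initial outcome, so it wins exactly when the first pick was a losing door.

```python
# Monte Carlo check of the Monty Hall argument: switching wins with
# probability ~2/3, staying with probability ~1/3.
import random

def play(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a goat door that isn't our pick, so switching
        # lands on the one remaining unopened door.
        if switch:
            wins += pick != car   # switching wins iff the first pick was a goat
        else:
            wins += pick == car
    return wins / trials

print(f"stay:   {play(switch=False):.3f}")
print(f"switch: {play(switch=True):.3f}")
```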


Bayes’ theorem says no such thing. Your friend is wrong, but please do not let this reflect badly on the Bayesian interpretation of probability in general—it has nothing to do with this.


And in fact a Bayesian approach, by being explicit about conditioning information, helps to show the “trick” behind the Monty Hall problem: when Hall opens a door, he reveals information about what’s behind the other two doors that the contestant didn’t have when he made his initial choice. The probabilities change in response to this new information.


I am not really sure what this has to do with Bayesian vs. frequentist?

Bayes' theorem is a mathematical fact. It's true if we just accept some basic axioms. The Bayesian/frequentist divide is also pretty independent of the problem, since it's purely math-based? We are not estimating something... In fact, an easy solution to the Monty Hall problem is to just apply Bayes' theorem and calculate the probabilities.


> ...an easy solution to the Monty Hall problem is to just apply Bayes' theorem and calculate the probabilities.

It seems simple, but the claim (not mine) was that the theorem states that since you are only choosing "once" you can't use an estimation, as that is based on collected data... which somehow doesn't exist or apply to you.

The reason is that somehow there are no odds/percentages/probabilities if there is only one instance of an event, or at least they don't change. (i.e., 2 doors, always 1-in-2 odds, period)

Normally I wouldn't detract from a thread like this, but participation seems low, so I can take the down votes in hopes of getting a response that helps me with this conundrum.


Bayesian statistics is a way of updating your beliefs about the value of some unknown number (or value) after doing an experiment. You have to be uncertain about some variable, and then you work backwards to figure out its value.

In the case of the Monty Hall problem, we have a complete understanding of what's going on, so there are no beliefs to update. There are no hidden rules or magic numbers we haven't been told about and we need to work out. You literally can't use Bayesian statistics if you understand a system completely.

Also, Bayesian statistics is a tool. It's not a law of nature. It can get the wrong answers. There is literally no guarantee that you will get the right answer when using it. Garbage In Garbage Out. It's just that there are cases where, empirically, it can be useful.


> we have a complete understanding of what's going on, so there are no beliefs to update.

This is not true, unless by 'complete understanding' you imply that everything must be deterministic and we understand every single variable.

However, probabilities can be interpreted as statements of lack of knowledge at certain points of a process, even if the entire process itself is understood completely.

For example, if I have a process where, randomly, half the time I get a 55% heads-biased coin and half the time I get a fair coin, and then I flip it 10 times, I can absolutely apply Bayes' theorem to figure out the probability that I had the fair or the biased coin after seeing the result of the flips.

This is a system that I understand completely, and part of that system is a single unknown bit, which we're trying to put bounds on.

A correct Bayesian treatment of the Monty Hall problem is given in a cousin comment: https://news.ycombinator.com/item?id=17438740
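The coin example above can be worked through directly with Bayes' theorem. A sketch, assuming a 50/50 prior over {fair, biased} and an illustrative observation of 7 heads in 10 flips:

```python
# Posterior probability that the coin was the 55%-heads-biased one,
# given the observed number of heads in 10 flips.
from math import comb

def posterior_biased(heads, flips=10, p_biased=0.55):
    # Binomial likelihoods under each hypothesis
    lik_fair = comb(flips, heads) * 0.5**heads * 0.5**(flips - heads)
    lik_biased = comb(flips, heads) * p_biased**heads * (1 - p_biased)**(flips - heads)
    # The equal 1/2 priors cancel in the normalised ratio
    return lik_biased / (lik_fair + lik_biased)

print(f"P(biased | 7 heads in 10) = {posterior_biased(7):.3f}")
```

Note how weakly 10 flips discriminate between 50% and 55%: the posterior only moves modestly away from the 0.5 prior.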


No that's Bayes' theorem, not Bayesian inference. You haven't made any Bayesian assumptions.


Thanks for this explanation, it certainly expels some of my confusion.

It seems then based on your reply that my friend is possibly wrong for invoking Bayesian theory regarding the Monty Hall problem for the exact reason he claimed it was the only answer? (note, my friend is a brilliant person and dedicated engineer who understands math far above my level. I only program with statistics and the PhD's tell me what to do.)

In other words, his argument is incorrect because we actually _do_ have previous knowledge? (He pointed out that Bayes' theorem was used for analysing insurance premiums for new kinds of insurance where there was no previous data to work with, which fits with your statement.)

This seems like too easy an answer for him to have missed, which makes me suspect of my understanding of your reply.

(yes, my initial question was serious, I appreciate you replying)


Not only do we have previous knowledge, we know basically everything about what's going on. You only use Bayesian inference if there is something you don't know which can't be computed from the things you do know. Then, after doing an experiment, you work backwards to figure out the value of the things you didn't know.

Don't confuse Bayesian inference with Bayes' theorem, which is a theorem that doesn't have any philosophy attached to it. Bayesian inference makes use of this theorem on top of some philosophical assumptions, which may be questionable depending on how you use them.

Bayesian inference also requires you to give your "prior" beliefs, which are the beliefs you have before you do the experiment. If these beliefs are strongly biased away from the right answer, you will get Garbage Out.


Ok, that is really neat. I need to read more about this. I just did some searching on Bayesian inference solutions for the Monty Hall problem. And I am guessing they are applying the math whilst pretending to not have foreknowledge of the outcome?

https://sc5.io/posts/how-to-solve-the-monty-hall-problem-usi...

I certainly don't expect you to research this or explain every detail for me, I may have to take a class on this subject after I retire to satisfy my need to grasp this. Thank you for time, this was enlightening.


Just to chime in: the Monty Hall problem is a pure math (probability theory) exercise, not a statistics exercise, since we know all the probabilities beforehand and don't estimate anything.

It's very important to distinguish between Bayesian inference and Bayes' theorem. Bayes' theorem can be proven and therefore must be true (if we accept the axioms). Bayesian inference has some real philosophical problems, since we often have no way to choose a justified prior. Frequentist approaches may also use Bayes' theorem.

I always like the blog posts of the angry statistician, and he also has one about the Monty Hall problem: http://angrystatistician.blogspot.com/2012/06/bayes-solution...

The important difference is between P(x == car), which is 1/3, and P(x == car | y == goat). Just apply Bayes' theorem and calculate it yourself!


The easiest way to convince your friend of his error is to imagine the odds when there are n doors, and Monty opens n-2 of them, the ones which don't contain the prize and aren't your first pick.
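The n-door version can be stated analytically: staying wins only when the first pick was right (probability 1/n), while switching wins whenever it was wrong (probability (n-1)/n), since the one remaining unopened door must then hide the prize.

```python
# Win probabilities for the n-door Monty Hall variant, where the host
# opens n-2 goat doors that aren't your pick.

def win_probabilities(n):
    stay = 1 / n            # first pick was right
    switch = (n - 1) / n    # first pick was wrong -> the remaining door wins
    return stay, switch

for n in (3, 10, 100):
    stay, switch = win_probabilities(n)
    print(f"n={n:>3}: stay {stay:.3f}, switch {switch:.3f}")
```

At n = 100, staying wins 1% of the time and switching 99%, which makes the advantage much harder to deny.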


How could Bayesian statistics be applied in this case? I’m wondering if the situation is just too “simple” to make it applicable.

Before Monty opens a door, so the prior probability, is 1/3 for each door.

After he opens a door, that door will have zero probability, as we know he opens a door without a car, then how to update the probabilities afterwards?

Seems the simple way to look at it is to not partition by door, but by chosen vs not chosen. Chosen is 1/3 and not chosen is 2/3 before and after Monty opens a door, so perhaps there is no “Bayesian information” revealed by opening the door anyway.


> Seems the simple way to look at it is to not partition by door, but by chosen vs not chosen.

Right. That might be the easiest way for this problem.

More straightforwardly, without that shortcut:

- Call the doors a, b, and c. Assume, without loss of generality, that we choose door a initially. Let the random variable X ∈ {a, b, c} be the door with the car.

- Our prior probability is uniformly distributed: Pr(X = a) = Pr(X = b) = Pr(X = c) = 1/3.

- The data Y that we collect is our observation of which door gets opened by the host. The likelihood function Pr(Y = y | X = x) is the probability of the observation being y (i.e., that Y = y), given that the underlying state is x (i.e., that X = x). The only non-zero likelihoods are Pr(Y = b | X = a) = Pr(Y = c | X = a) = 1/2 and Pr(Y = c | X = b) = Pr(Y = b | X = c) = 1.

- Bayes' theorem, Pr(X = x | Y = y) = Pr(Y = y | X = x)·Pr(X = x)/Pr(Y = y), gives the answer, the posterior probability, which should be seen as a function of x. The denominator Pr(Y = y) = ∑ Pr(Y = y | X = x)·Pr(X = x), sum over x ∈ {a, b, c}, is a normalising factor that makes the posterior probability distribution sum to 1. It is also the probability we, at the start of the game, assign to Y = y. In our problem, Pr(Y = a) = 0 and Pr(Y = b) = Pr(Y = c) = 1/2.

How about you put in the numbers and see if it comes out right or if I have made a mistake? :-)

Edit: Sorry, I just realised that we could have made it simpler by assuming that the host opens, say, door b.
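Putting in the numbers, as invited. A sketch using exactly the priors and likelihoods listed above (we picked door a; suppose the host opens door b):

```python
# Exact Bayesian posterior for the Monty Hall problem.
priors = {"a": 1/3, "b": 1/3, "c": 1/3}

# likelihood[y][x] = Pr(Y = y | X = x): probability the host opens door y
# given the car is behind door x (we picked door a).
likelihood = {
    "b": {"a": 1/2, "b": 0.0, "c": 1.0},
    "c": {"a": 1/2, "b": 1.0, "c": 0.0},
}

y = "b"  # observed: the host opens door b

# Normalising factor Pr(Y = y) via the law of total probability
p_y = sum(likelihood[y][x] * priors[x] for x in priors)

# Bayes' theorem, as a function of x
posterior = {x: likelihood[y][x] * priors[x] / p_y for x in priors}
print(posterior)
```

The posterior comes out to 1/3 for door a, 0 for door b, and 2/3 for door c, so the calculation checks out: switch.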


Brilliant. Yes, the wrong solutions are highlighted by the door opening event, and thus your odds of success increase when you eliminate bad outcomes.


Judea Pearl uses the Monty Hall problem in his latest book, and when he draws out the causal network, the issue becomes obvious. The game-show host chooses which door to open based on two factors: he has to choose a door you didn't, and he has to choose a door with a goat.

Hence, the door he chooses to open is probabilistically informative about your initial choice. Treating the door he opened as information on which to condition does change the posterior.


He’s wrong, you can show switching doors improves your performance with Bayes’ theorem.


Edit: I removed my entire comment, it seems that there is a lot more online now about Bayesian theory than when I had looked this up, or my google-fu was off when I was researching this before.

Thanks for giving me new search terms to consider. I have more arguments and examples to read through now. :P


“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, ‘Do you want to pick door No. 2?’ Is it to your advantage to switch your choice?”

Let A be the event that the car is behind door 1, B that the car is behind door 2, and C that the car is behind door 3. Let E be the event that the host opens door 3. We assume that the car is initially equally likely to be behind each door and that the host opens a door with a goat at random, never opening the door we picked.

Since A, B, and C are mutually exclusive and exhaustive propositions, we can calculate the marginal probability of E by using the law of total probability:

P(E) = P(EA) + P(EB) + P(EC).

Bayesians like to define joint probability from conditional probability instead of the reverse; that is, define P(AB) as P(A | B) P(B) instead of P(A | B) as P(AB) / P(B).

So P(EA) = P(E | A) P(A). P(E | A) is 1/2 because we picked door 1, the car is behind door 1, and the host chooses at random a door that has a goat, of which there are two: 2 and 3. P(A) is 1/3. Therefore P(EA) is 1/2 × 1/3 = 1/6.

Similarly, P(EB) = P(E | B) P(B). P(E | B) is 1 because we picked door 1 so the host will not open door 1 and we assume the car is behind door 2 so the host will not open door 2, leaving only door 3 to be opened. P(B) is 1/3. Therefore P(EB) is 1 × 1/3 = 1/3.

P(EC) = P(E | C) P(C). P(E | C) is 0 because the host will never open the door the car is behind. P(C) is 1/3. Therefore P(EC) is 0 × 1/3 = 0.

So P(E) = 1/6 + 1/3 + 0 = 1/2. We know that the host opened door 3 (this is E), so the car cannot be behind door 3. How likely is it to be behind door 1? By Bayes’ theorem,

P(A | E) = P(E | A) P(A) / P(E).

We said earlier that P(E | A) is 1/2, P(A) is 1/3, and P(E) is 1/2. So P(A | E) = (1/2 × 1/3) / (1/2) = 1/3.

Given E, the car must be behind door 1 or door 2 since the host opened door 3. Therefore the sum of P(A | E) and P(B | E) must be 1. P(A | E) is 1/3, so P(B | E) is 2/3. The car is more likely to be behind door 2 than door 1. We initially picked door 1, so, if we want the car, we should switch.


Thanks, this looks quite a bit more succinct than other examples of the math. I think I will have to make an app that renders out results using this. :)


> "frequentist" as if it were some kind of pejorative.

Ah, so he was right about that, at least! :-)


Does this comment deserve a downvote? I think he's asking a genuine question.


I think that the virulent all-or-nothing Bayesian vs. frequentist debates are really more from earlier decades. These days most people are open to using techniques from both camps and are more interested in solving problems than being philosophically "pure".


That seems like a vast generalization across the board.


> Moreover, the two approaches are not mutually exclusive.

I would say that they are based on two different view of what probability is, and that subscribing to one means that you cannot accept the other.


Not gonna lie- I don't think I understand the difference. Every explanation seems to bounce off and over my thick skull.


You can abuse Bayesian methods just as easily you can hack a p-value. Maybe more easily, since fewer people would be aware of the issues.

What's needed is a shift in researcher attitudes and incentives, to emphasize development of reliable knowledge instead of publication record. Just changing the rules of the game slightly will only lead to people adjusting their game slightly.

http://www.stat.columbia.edu/~gelman/research/unpublished/p_...


Yes but with traditional frequentist approaches most scientists don’t understand what’s going on under the hood of the statistical tools they’re using. There are dozens and dozens of named tests like “Fisher’s exact test” and “Wilcoxon signed-rank test” and most scientists just sort of follow received wisdom about which test to use in which situation.

Bayesian methods force you to actually think about how your data and model parameters are distributed, and explicitly specify a model.


>Bayesian methods force you to actually think about how your data and model parameters are distributed, and explicitly specify a model.

Yep. I know some very senior scientists who get a lot of mileage out of finding places where "model-free" methods are implicitly assuming a bloody stupid model, and then attacking them.


For a more detailed take advocating the particular solution "report likelihoods, not posteriors or p-values", see "Likelihoods, p-values, and the replication crisis": https://arbital.com/p/likelihoods_not_pvalues/?l=4xx


>We are living in new Bayesian age. Applications of Bayesian probability are taking over our lives. Doctors, lawyers, engineers and financiers use computerized Bayesian networks to aid their decision-making. Psychologists and neuroscientists explore the Bayesian workings of our brains. Statisticians increasingly rely on Bayesian logic. Even our email spam filters work on Bayesian principles.

While that's true, strictly speaking, it's akin to writing that, "We are living in a new logical age" when Boolean algebra was first finding wide application in engineering and the natural sciences. "Bayesian methods" just mean using statistical modeling techniques that conform to Bayes' rule as their normative guide. In "machine learning" or "frequentist" terms, it just means that good approximate-Bayesian reasoning minimizes the KL divergence between the true posterior and the approximate model (whether variational or by sampling or by training a neural network, whatever), as opposed to minimizing the classification hinge-loss or the mean squared error (though some of those losses have formally equivalent Bayesian priors).


Strictly speaking, health care and genocide are the same thing, you just minimize different metrics.


Is there a methodology for Bayesian analysis that avoids the choice of a specific prior and instead provides conclusions in the form of boundaries/regions in "prior-space" and their effect on belief? For example, it would be incredibly useful if the output of research allowed a reader to gauge support of the conclusion in a minimally subjective way by explaining what effect choices of prior have on the results. I'm not a statistician, so I'm assuming this is a well-understood thing, but I would be curious to know if and how it is practiced.


Generally speaking, a good Bayesian analysis includes what's known as a "sensitivity analysis" which seeks to measure how sensitive the results are to the particular choice of prior. Additionally, if strong prior assumptions are not available, an "uninformative" prior is used. In such cases, the results tend to be pretty close to those from frequentist methods, except the frequentist methods lack the Bayesian probabilistic interpretation.
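A minimal sketch of a sensitivity analysis in this sense: rerun the same conjugate Beta-Binomial update under several priors and see how much the posterior moves. The data (20 successes in 100 trials) and the particular priors are illustrative assumptions:

```python
# Sensitivity analysis: how much does the posterior mean depend on the prior?
successes, trials = 20, 100

priors = {
    "uninformative Beta(1, 1)": (1, 1),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "informative Beta(10, 40)": (10, 40),
}

for name, (a0, b0) in priors.items():
    a, b = a0 + successes, b0 + trials - successes
    print(f"{name}: posterior mean {a / (a + b):.4f}")
```

With 100 observations, all three posterior means land close to the empirical 20% rate, which is exactly the kind of robustness a sensitivity analysis is meant to demonstrate (or refute, at smaller sample sizes).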


The final result of a Bayesian analysis is the posterior probability, which is proportional to the product of the prior probability (which can incorporate previous knowledge or convictions) and the likelihood (which is determined by the data you just collected). You can leave out the prior and just report the likelihood. Another comment mentions this: https://news.ycombinator.com/item?id=17438643



