
A/B experiments are definitely a gold standard, as they provide true causality measurement (if implemented correctly). However, they are often expensive to run: you need to implement the feature in question (which has a less than 50% chance of working) and then collect data for 1-4 weeks before being able to make a decision. As a result, only a small number of business decisions today rely on A/B tests. Observational causal inference can help bring causality into many of the remaining decisions, which need to be made more quickly or cheaply.
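To give a rough sense of where the 1-4 weeks come from, here is a back-of-the-envelope sample-size sketch (Python; the 5% baseline conversion and 0.5pp target lift are purely illustrative assumptions):

    # Rough sample size for a two-proportion A/B test.
    # Illustrative numbers: 5% baseline conversion, hoping to detect a 0.5pp lift.
    from scipy.stats import norm

    p1, p2 = 0.05, 0.055          # control vs. hoped-for treatment conversion
    alpha, power = 0.05, 0.80     # two-sided significance level, desired power

    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    n_per_arm = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

    print(f"~{n_per_arm:,.0f} users per arm")  # roughly 31k users per arm

At, say, a few thousand eligible users per day, ~31k per arm quickly turns into weeks of data collection before you can call the result.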


The “gold standard” has failure modes that seem to be ignored.

E.g.: making UI elements jump around unpredictably after a page load may increase the number of ad clicks simply because users can’t reliably click on what they actually wanted.

I see A/B testing turning into a religion where it can’t be argued with. “The number went up! It must be good!”


That's generally because the metrics you are looking at do not represent what users care about. That is a different problem from the testing methodology, one that is often overlooked and a lot more important.

I've argued that A/B testing training should focus on that skill a lot more than on Welch's t-test theory, but I had to record my own classes for that to happen.
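(For readers who haven't met it: the "Welch" part presumably refers to Welch's unequal-variance t-test, which is one line of code anyway. A minimal sketch on synthetic data:

    # Welch's t-test (unequal variances) on two synthetic metric samples.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    control = rng.normal(loc=10.0, scale=3.0, size=5000)    # e.g. minutes per session
    treatment = rng.normal(loc=10.2, scale=3.5, size=5000)

    t_stat, p_value = ttest_ind(treatment, control, equal_var=False)  # Welch's variant
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

The hard part is deciding that "minutes per session" is the thing worth testing in the first place.)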


But those metrics are hard to move, so you target secondary metrics.

The problem with that strategy becomes obvious when you spell out the consequences: measurably improving the product is hard, so you measure something else and hope you get product improvements.


There can be a real ethical dilemma when applying A/B testing in a medical setting. Placing someone with an incurable disease in a control group is condemning them to death, while in the treatment group they might have a chance. On the other hand, without a proper A/B testing methodology, the drug's efficacy cannot be established. So far, no perfect solution to the dilemma has been found.


> in a control group

The control group gets the current standard treatment, not nothing (in case that was a source of confusion). Plus, they typically don't have to pay for it, which is a benefit for them.

Large trials today will typically conduct interim analyses and will have pre-defined guidelines for when to stop the trial because the new treatment is either clearly providing a benefit or is clearly futile.

Here is an example of such a study: https://www.ahajournals.org/doi/10.1161/CIRCHEARTFAILURE.111...
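
To see why the stopping guidelines have to be pre-defined rather than ad hoc, here is a toy simulation (Python; illustrative numbers only, not the linked study's actual design):

    # Under the null (no effect), peeking at p < 0.05 on every interim look
    # inflates the false-positive rate; a stricter pre-registered per-look
    # boundary (2.41 here, roughly Pocock's value for five looks) keeps it
    # near the nominal 5%.
    import numpy as np

    rng = np.random.default_rng(1)
    n_trials, n_looks, n_per_look = 20_000, 5, 200

    naive_hits = strict_hits = 0
    for _ in range(n_trials):
        diffs = rng.normal(0.0, 1.0, size=n_looks * n_per_look)  # zero true effect
        crossed_naive = crossed_strict = False
        for k in range(1, n_looks + 1):
            sample = diffs[: k * n_per_look]
            z = sample.mean() / (sample.std(ddof=1) / np.sqrt(len(sample)))
            crossed_naive |= abs(z) > 1.96
            crossed_strict |= abs(z) > 2.41
        naive_hits += crossed_naive
        strict_hits += crossed_strict

    print(f"peek at 1.96 every look: {naive_hits / n_trials:.1%} false positives")
    print(f"pre-set 2.41 boundary:   {strict_hits / n_trials:.1%} false positives")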


Most therapeutic trials nowadays are "intent to treat": subjects receive either standardized tx or experimental tx in the randomization. Many also have crossovers, such that when a measurable benefit (as defined by the protocol) is seen, subjects on standard tx can be moved over to the experimental arm.


It's not really an ethical dilemma until you know it works, and then, if the evidence is strong enough, they'll usually cut the trial early.


All the alternative methods require the same sacrifice. More importantly, most suggested treatments fail to cure deadly conditions or have major side effects or risks that are just as unethical to thrust upon people untested.

If you look at it properly, i.e. evaluate what your possible actions are before the test (do nothing, impose an untested treatment, or test with a proper control to learn what to do for the majority of the population), the answer is rarely ambiguous.

There is a debate to be had on how much pre-clinical work should be done before clinical testing, but those steps are increasingly automated, cheap, and fast, so we often reach the point where a double-blind test is the next logical step.

The argument you present is based either on an unwarranted confidence in treatments or on information that wasn't available when the decision had to be made.


You can end the trial early when it's clear the treatment is working. This just happened last week with Ozempic for diabetes-caused kidney disease. https://www.wxyz.com/news/health/ask-dr-nandi/novo-nordisk-e...


Causal inference is useful, but it's neither quicker nor cheaper.


Agree that it is hard today. A person you might know is trying to prove that it doesn't have to be: https://www.motifanalytics.com/blog/bringing-more-causality-... .

We'd love to chat more with you on the topic - feel free to hit Sean or me up on LinkedIn.


I am a big fan of what Sean and you are trying to do; I wrote up a chapter about it this weekend, actually. I'm worried that you both have worked for companies where a lot of work had already been done to identify relevant dimensions (metrics and categories), so automating causality (or rather, estimating factors on a pre-existing causal graph, because that's the sleight of hand the word "causality" performs) made sense once you had reached that level of maturity.

But to reach that point, before having relevant dimensions, there has to be a lot of work, generally motivated by disappointing experiments. “Why didn’t that work?” is often answered by “Because our goal is too remote from our actions—here’s a better proxy” or “Because this change only makes sense to 8% of our users, here’s how we can split them.”
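
As a toy illustration of the "8% of our users" case (every name and number below is made up), the overall readout can look flat while one small segment carries the whole effect:

    # Segment-level readout of a synthetic experiment: the overall lift is tiny
    # while the small "power_user" segment (8% of users) moves a lot.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    n = 100_000
    df = pd.DataFrame({
        "segment": rng.choice(["power_user", "casual"], size=n, p=[0.08, 0.92]),
        "variant": rng.choice(["control", "treatment"], size=n),
    })
    # In this toy setup the treatment only helps power users (+5pp conversion).
    base = np.where(df["segment"] == "power_user", 0.20, 0.05)
    lift = np.where((df["segment"] == "power_user") & (df["variant"] == "treatment"), 0.05, 0.0)
    df["converted"] = rng.random(n) < base + lift

    overall = df.groupby("variant")["converted"].mean()
    by_segment = (
        df.groupby(["segment", "variant"])["converted"].mean()
          .unstack("variant")
          .assign(lift=lambda t: t["treatment"] - t["control"])
    )
    print(overall)      # overall lift is ~0.4pp, easy to dismiss as noise
    print(by_segment)   # the power_user row shows the real ~5pp effect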

I'm worried that too many people will think the tool itself is enough, rather than a complement to a mature understanding of a company's users. This 'solutionism' is widespread among data tools: https://www.linkedin.com/posts/bertilhatt_the-potential-gap-...


Thank you for clarifying.

Reading some of your posts, I think we agree more than we disagree. A big difference from most new analytics tools you see today is that we don't want to provide a magic "solution" (which is bound to over-promise and under-deliver) but rather a generic tool to quickly define and try out different business categories on the data.

Followed you on LinkedIn for more in-depth takes.


It is likely to be cheaper and quicker to run a counterfactual test in the computer than in real life.

The question is how reliable it is.
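
For concreteness, here is a toy sketch of such an in-computer counterfactual estimate (inverse-propensity weighting on synthetic data with a single confounder; everything here is made up to show the mechanics). Its reliability rests entirely on whether the confounders you measured and modeled are the ones that matter:

    # Toy observational causal inference: inverse-propensity weighting.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 50_000
    confounder = rng.normal(size=n)                              # e.g. user engagement
    treated = rng.random(n) < 1 / (1 + np.exp(-confounder))      # engaged users adopt more
    outcome = 0.2 * treated + 0.5 * confounder + rng.normal(size=n)  # true effect = 0.2

    naive = outcome[treated].mean() - outcome[~treated].mean()

    ps_model = LogisticRegression().fit(confounder.reshape(-1, 1), treated)
    p_treat = ps_model.predict_proba(confounder.reshape(-1, 1))[:, 1]
    ipw = (np.average(outcome[treated], weights=1 / p_treat[treated])
           - np.average(outcome[~treated], weights=1 / (1 - p_treat[~treated])))

    print(f"naive difference: {naive:.2f}   (biased upward by the confounder)")
    print(f"IPW estimate:     {ipw:.2f}   (close to the true 0.2 when the model holds)")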


> As a result, only a small number of business decisions today rely on A/B tests.

The default for all code changes at Netflix is that they're A/B tested.


An expensive test is better than an expensive mistake :) At the scale of hundreds of decisions made with the inherent biases of the product/biz/ops teams, that directional misalignment can be catastrophic.



