
The problem is that when you run A/B tests, the variant often performs worse than the control. If the thing being tested is a website, that might not be so bad. If the thing being tested is your education, that might be horrible.


No need to A/B test when there are tens of thousands of universities in the world all doing slightly different things. The cream rises and then is copied.

For example, FSU climbed from roughly the #50 public university to #18 over the past decade. Whatever changes they put into place are now being studied by other universities, and if there's anything novel that they're doing, it will likely become more widespread as other universities seeking higher rankings implement it. Or it could be that FSU studied the success of other universities, implemented what worked best, and stopped doing what didn't. You run an A/B test when there aren't already tens of thousands of other colleges and universities like yours out there that you can study and learn from.

Of course, rankings don't equal educational quality, but they're one example. Another example that may correlate better with educational quality (although part of the equation is the makeup of the cohort) is bar exam pass rates for law school graduates, broken out by school.


> No need to A/B test when there are tens of thousands of universities in the world all doing slightly different things. The cream rises and then is copied.

Why would it be any different in webtech or advertising? Why wouldn't everyone just copy the best instead of gathering empirical data?


Are universities not gathering empirical data when they study the results of others' efforts at scale? "Study" is the key word: it's more than just copying what they see; it means understanding why something is effective, and whether it would be for them.

And it may be quicker and more reliable, with less risk, than A/B testing in their setting. The most highly trafficked web tech companies can gather statistically significant feedback about a change in moments. A/B testing a curriculum or educational practice could take a semester or more, and the risks are higher: it would be a two-sided hypothesis test where the B group could not only do better but could also do much worse, and that would reflect poorly on the institution. People are paying tens of thousands of dollars per seat per year for the best education they can get. Showing the number of items in the cart right on the "checkout" button doesn't really reflect poorly on Amazon, but it could lift Amazon's conversion rate by 0.03%, which could mean millions of dollars at their scale, and they could complete the test fairly quickly given their volume (in a day or so?) at a 99.7% confidence level.
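To make the scale argument concrete, here's a minimal sketch of that two-sided test: a two-proportion z-test comparing conversion rates between a control (A) and a variant (B). The traffic and conversion counts are hypothetical, not real Amazon numbers.

    from math import sqrt, erf

    def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
        """Return (z, two-sided p-value) for H0: rate_a == rate_b."""
        pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (conv_b / n_b - conv_a / n_a) / se
        phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF at |z|
        return z, 2 * (1 - phi)                   # two-sided p-value

    # Hypothetical day of big-site traffic: 3.00% vs 3.03% conversion,
    # a 0.03-point absolute lift, with six million sessions per arm.
    z, p = two_proportion_z_test(180_000, 6_000_000, 181_800, 6_000_000)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")  # z ~ 3.0, p ~ 0.002
    # p < 0.003 clears a 99.7% confidence bar in one day of traffic.
    # A cohort of a few hundred students per semester could never resolve
    # an effect this small, which is the asymmetry described above.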

That said, I'm sure smaller-scale or faster-turnaround A/B tests are being run at universities.


That's true only if the system in question has already had a lot of optimization. By the time I got to Google, about 90% of experiments failed to improve metrics. When they started A/B testing, nearly every experiment yielded large improvements.

I would bet that education is much more like Google circa 2000 than Google circa 2010. A general rule of thumb is that in the absence of extensive training and repeated failures, human intuition is terrible, and any system based on people's opinions without hard data has a lot of room for optimization.


So, how do you know it is worse? Isn't that the whole point of A/B testing?



