A/B and Qualitative User Testing (slicedbreaddesign.com)
19 points by geeko on Sept 8, 2009 | 9 comments


Hey guys, a shameless plug, but my startup is creating a product that tries to take A/B testing to the next level. It combines A/B testing with multiple goals, a WYSIWYG editor, visitor segmentation, powerful analytics, and funnel analysis to provide a very powerful environment for testing and targeting. Plus we provide a developer API for testing, analytics, and targeting.

Here is the link to the product tour: http://www.wingify.com/learn-more.php

Buzz me at paras@wingify.com if anyone is interested in trying out the product.


Went to your site and couldn't find any info on pricing. As a startup guy who's pinching pennies, I could probably handle a $19 or $29 monthly subscription for a testing/metrics service that's better than Google Website Optimizer, but anything more than that would cut into my ramen budget. All the features you have listed here are great, but I care more about when you will be launching in beta and how much the service will cost.


I agree that the pricing is not visible on the website; the reason is that we are still in price-discovery mode. The perspective you gave on pricing is very useful in that regard.

Another reason for not having pricing on the site right now is that we have two possible paths forward: selling the product on its own, OR providing a complete solution (product + testing strategy + implementation). The latter leads to variable pricing that is very client-dependent.

Anyway, let me know if you want to try out the solution. I'd be happy to provide you with an account.


Methinks you are hiding from customers. Not displaying any clue about pricing is the best way to get no feedback. Unless you are overwhelmed with prospects and have a good reason to hide from new ones, I suggest some soul-searching in this regard. If nothing pops, refer to "Customer Development" by Steve Blank.


Maybe I'm ignorant of the proper way to do A/B testing, but I shy away from it because of the prospect of presenting users with inconsistent/volatile interfaces.


There are two basic responses to the inconsistent/volatile issue that you bring up.

The first is mitigation of risk. We generally A/B test versions that are similar enough that we'd be comfortable rolling out the winner to the whole site without warning. As a result they tend to be fairly small changes, which goes a long way toward solving the inconsistent/volatile issue. Furthermore, you associate each user with an A/B test slice, so while different users get different experiences, a single user's experience stays consistent across their visits to the site. Between those two factors, the disruption from A/B testing is very small.

The second is magnitude of reward. Most websites that do not currently do A/B testing can find site improvements that deliver business improvements of 20% or more. That's a pretty clear financial incentive to put up with some unhappy users. But in fact many of those improvements make your users happier, so the "construction in progress" feeling is far, far outweighed by the rewards.

This is why companies like Microsoft, Google and Amazon use and strongly advocate A/B testing. Google has created free tools to help people A/B test websites. I know that Microsoft has an internal version of that, and suspect that they intend to open access at some point as well.

If you want to learn more about how to build an A/B testing framework, you could do worse than to go through the OSCON tutorial I gave last year. My slides are at http://elem.com/~btilly/effective-ab-testing/. Or you can go to http://www.google.com/websiteoptimizer and use theirs. Theirs has some drawbacks compared to building your own: their approach can't A/B test email programs, and it is somewhat tricky to use with dynamic content (not impossible, but you have to rewrite pages in JavaScript). When I last looked, it also didn't let you track multiple statistics in an A/B test or analyze combinations of tests after the fact. OTOH theirs is already built and is easy to use. :-)


This is an implementation issue.

Assuming you are talking about presenting users with inconsistent interfaces in the sense of User Bob seeing both A and B, then this is not too hard to solve. If your issue is with evolving your interface so rapidly that User Bob would see A1, B2, A3, A4, and A5 over the course of two weeks, then that's a bit outside the scope of this discussion, but it is a pace-of-change issue rather than, strictly speaking, an A/B testing issue.

Anyhow, back to Bob. When we first see Bob on our site, we assign him a random identifier which is presumed to be unique. Should Bob log in or otherwise demonstrate his identity to us, we associate the random identifier durably with the identity. Any other anonymous user who signs in as Bob will henceforth have their random identifier overwritten with the canonical Bob identifier.

Observe that this means anyone who we know is Bob will always have the same constant random identifier, from here on out.
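If that's easier to follow in code, here's a rough sketch of the identifier bookkeeping (purely illustrative; the class and method names are made up):

    import uuid

    class VisitorRegistry:
        # Maps anonymous random identifiers to canonical (logged-in) identities.
        def __init__(self):
            self.anonymous_to_canonical = {}

        def new_visitor(self):
            # First time we see someone: hand out a random identifier,
            # presumed unique, typically stored in a cookie.
            return uuid.uuid4().hex

        def identify(self, random_id, canonical_id):
            # The visitor logged in as Bob: from now on this random id
            # resolves to the canonical Bob identifier.
            self.anonymous_to_canonical[random_id] = canonical_id

        def resolve(self, random_id):
            # Anyone we know is Bob always resolves to the same identifier.
            return self.anonymous_to_canonical.get(random_id, random_id)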

OK, now for each A/B test we do, we take a unique identifier of the test (the database primary key, or the name if we can guarantee names are unique, or what have you) and concatenate it with the user's identity, then hit that string with a good hash function. MD5 works fine in practice: it doesn't have to be cryptographically secure, it just has to provide good entropy.

We now have a big number which is entropic but which will always be the same for a given user/test pair. Splendid: take it modulo the number of choices, and give that user the Nth option.

Thus, as long as Bob logs in prior to seeing the A/B test, Bob will always see the same alternative for that test, until you cancel the test.
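In code, the assignment step looks roughly like this (an illustrative sketch, not A/Bingo's actual implementation; the function name is made up):

    import hashlib

    def choose_alternative(test_name, user_id, alternatives):
        # Concatenate the test's unique identifier with the user's identifier,
        # then hash it. MD5 is fine: we want good entropy, not cryptographic
        # security.
        digest = hashlib.md5((test_name + ":" + user_id).encode()).hexdigest()
        # The digest is a big entropic number that is always the same for a
        # given user/test pair; take it modulo the number of choices.
        return alternatives[int(digest, 16) % len(alternatives)]

    # Bob sees the same alternative on every visit, until you cancel the test:
    choose_alternative("signup_copy", "bob", ["Sign up", "Join now", "Get started"])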

Now, if users do not log in, it gets a wee bit trickier. You can cookie them the first time you see them, store identifiers in a session, use their IP address (or a hash of it) as an identifier, or use other tricks, but it is fundamentally difficult to follow humans across computers without them taking affirmative steps to declare their identity to you.

If this explanation leaves you scratching your head, and you'd prefer seeing it in commented code, I suggest looking at the source code and usage docs of A/Bingo. ( http://abingo.org ) Disclaimer: I wrote it.


The hash trick is clever. But what happens when you start with 4 versions, then find that one version is worse than the others? Do you have to start the test over? Is there some kind of rebucketing algorithm you use?


I'd just start the test over. If you want to do rebucketing, see the following Ruby-esque pseudocode:

http://pastie.org/610630
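One way rebucketing can work (a Python sketch of the general idea, purely illustrative and not necessarily what the pastie shows): users whose current assignment survives keep it, and users who were in the dropped alternative get deterministically re-hashed into the survivors.

    import hashlib

    def bucket(test_name, user_id, alternatives, salt="0"):
        digest = hashlib.md5((test_name + ":" + salt + ":" + user_id).encode()).hexdigest()
        return alternatives[int(digest, 16) % len(alternatives)]

    def choose_with_rebucketing(test_name, user_id, original, surviving):
        first = bucket(test_name, user_id, original)
        if first in surviving:
            return first  # this user's experience never changes
        # Their alternative was dropped: reassign deterministically among the
        # survivors by salting the hash differently.
        return bucket(test_name, user_id, surviving, salt="1")

Keep in mind that users re-bucketed out of the dropped alternative have already seen it, which muddies their data a little; that's part of why simply restarting the test is often the cleaner option.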



