Back when I was running the growth team at Codecademy in 2018, we read the book “Hacking Growth” and fell in love with the concept of A/B testing.
I felt like we weren’t a “real” growth team unless we were shipping a lot of A/B tests and learning as fast as we could.
About 6 years later, I have now launched hundreds of A/B tests in my career. Some were incredibly effective, some were giant wastes of time, and most didn’t matter in the end.
Knowing what I know now, I would have run a lot of these tests differently, and that’s what we’re going to look at this week.
There is a lot written about what A/B testing is, so I won’t go super deeply into that topic.
If you want to go deep, you can read this blog.
A/B testing has become a blanket term for “experimentation” in a product.
In summary, it is the concept of splitting your traffic in half and measuring the difference between your control (the existing experience) and your variant (the change you are testing).
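As a concrete illustration (not from the original post), here is a minimal sketch of how that traffic split is often implemented: hash a stable user ID so the same user always lands in the same group. The experiment name and 50/50 ratio below are hypothetical.

```python
# Hypothetical sketch of a deterministic 50/50 split: hash a stable user ID
# plus the experiment name so the same user always sees the same side.
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Return 'control' or 'variant' deterministically for this user."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # 0-99, roughly uniform
    return "variant" if bucket < 50 else "control"

print(assign_variant("user_123", "pricing_page_in_header"))  # stable across calls
```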
You essentially ship the variant and then run it in parallel with the control. You wait a few weeks and then use a bunch of semi-fancy statistics to determine how confident you can be in your results.
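To give a sense of what that “semi-fancy statistics” step usually involves, here is a minimal sketch of one common readout, a two-proportion z-test on conversion rates. The visitor and conversion counts are made up for illustration.

```python
# A minimal sketch of a two-proportion z-test comparing variant B to control A.
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for variant B vs. control A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # two-sided
    return p_b - p_a, z, p_value

# Hypothetical counts: 10,000 users per arm.
lift, z, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"lift={lift:.2%}, z={z:.2f}, p={p:.3f}")  # typically "significant" if p < 0.05
```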
There are other styles of testing, such as multivariate tests and switchback tests, but for 95% of companies simple A/B tests are all you need, and that’s fine.
When debating how to release something, it is first helpful to take a step back and think about what we’re trying to do.
A/B testing sits within the overall world of product development, which has 4 distinct phases.
As the diagram suggests, after step 4, you start again with the lessons that you have learned in the first cycle.
The company that will win any space is the one that can make this loop turn as fast and accurately as possible.
The tradeoff of A/B tests is that they allow you to measure more accurately at the cost of making the whole project take longer.
In other words, it makes steps 3 and 4 of this loop more expensive and therefore slows the whole loop down.
Setting up the infrastructure and tools to run a testing program properly takes material effort.
In my experience, the A/B tests that were the best uses of resources were in one or more of these buckets:
The first bucket is high-stakes decisions. The clearest examples are things like price changes, especially for subscription products where the impact will take a while to surface.
A good rule of thumb: if you made this change and it went badly, would you be in trouble with leadership? If so, you should probably test it.
A price raise fits this category: you want to know what it will do to the business before you make it permanent, and these decisions are hard to undo.
The second bucket is changes whose impact is hard to isolate. This is the case when you’re working on a critical metric that moves around a lot, so it would be tough to attribute any shift to your change.
Staying with the price-raise example: raising prices will affect your revenue per user, your conversion rates, your product usage, and your retention numbers. It would be tough to just eyeball your dashboards and understand that impact.
The third bucket is cheap tests: CRO-style tests, email subject lines, ad headlines, etc. Any test that you can get out the door quickly and with minimal overhead.
These tests can be shipped very fast, have clear KPIs, and don’t block other teams’ roadmaps.
There are multiple tests that I have shipped in the past that I shouldn’t have. I should have instead just looked at our dashboards to see if I could get a rough sense of the impact.
If you are implementing a feature that your product should obviously have, testing it is a bad use of resources.
I once shipped a test at Codecademy to include our pricing page in the header of the site. It increased our trial starts by 7%, which was nice to know, but we should have had that link the whole time and just shipped it faster.
If you are fixing something that is broken or obviously confusing, just ship it.
Clear issues with the user experience are not worth the overhead of a test.
The same goes for features you already have clear evidence users want, e.g., Spotify adding podcasts or an accounting tool adding a receipt upload feature.
These help users and it's much better to try to understand the impact over time by monitoring how the feature gets used.
My current opinion is that you should do as little A/B testing as you can get away with, while still being able to use it effectively when needed.
As mentioned above, when you ship A/B tests you are trading speed for accuracy. For earlier-stage products, speed is likely to be more important.
This doesn’t mean you shouldn’t try to determine the impact of your work; you should. But you can probably get close by looking at your metrics every day and checking whether the changes you are making are improving them.
Clearly defined metrics, as we talked about in a previous post, can get you 80% of the way toward understanding the impact of your work.
When I was at Uber Eats, we bought a company called Cornershop for $1.4B, which was a grocery delivery company that had grown to dominate LATAM and part of Europe.
Like Uber, they had a great product, but word on the street was that they had never run an A/B test. This blew a lot of minds at Uber, which tests literally everything.
I write this to say that there are multiple ways to build a successful company. So long as you are moving quickly and trying to measure results, you are on the right path.