Back when I was running the growth team at Codecademy in 2018, we read the book “Hacking Growth” and fell in love with the concept of A/B testing.
I felt like we weren’t a “real” growth team unless we were shipping a lot of A/B tests and learning as fast as we could.
About 6 years later, I have now launched hundreds of A/B tests in my career. Some were incredibly effective, some were giant wastes of time, and most didn’t matter in the end.
Knowing what I know now, I would have run a lot of these tests differently, and that’s what we’re going to look at this week.
There is a lot written about what A/B testing is, so I won’t go super deeply into that topic.
If you want to go deep, you can read this blog.
A/B testing has become a blanket term for “experimentation” in a product.
In summary, it is the concept of splitting your traffic in half and measuring the difference between your control (the existing experience) and your variant (the change you are testing).
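As a concrete illustration (not from the original post), here is a minimal sketch of how that traffic split is often implemented: hash a stable user ID so the same user always lands in the same group. The experiment name and 50/50 ratio below are hypothetical.

```python
# Hypothetical sketch of a deterministic 50/50 split: hash a stable user ID
# plus the experiment name so the same user always sees the same side.
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Return 'control' or 'variant' deterministically for this user."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # 0-99, roughly uniform
    return "variant" if bucket < 50 else "control"

print(assign_variant("user_123", "pricing_page_in_header"))  # stable across calls
```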
You essentially ship the variant and then run it in parallel with the control. You wait a few weeks and then use a bunch of semi-fancy statistics to determine how confident you can be in your results.
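To give a sense of what that “semi-fancy statistics” step usually involves, here is a minimal sketch of one common readout, a two-proportion z-test on conversion rates. The visitor and conversion counts are made up for illustration.

```python
# A minimal sketch of a two-proportion z-test comparing variant B to control A.
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for variant B vs. control A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # two-sided
    return p_b - p_a, z, p_value

# Hypothetical counts: 10,000 users per arm.
lift, z, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"lift={lift:.2%}, z={z:.2f}, p={p:.3f}")  # typically "significant" if p < 0.05
```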
There are other styles of testing, such as multivariate tests and switchback tests, but for 95% of companies simple A/B tests are all you need, and that’s fine.
When debating how to release something, it is first helpful to take a step back and think about what we’re trying to do.
A/B testing sits within the overall world of product development, which has 4 distinct phases.
As the diagram suggests, after step 4, you start again with the lessons that you have learned in the first cycle.
The company that will win any space is the one that can make this loop turn as fast and accurately as possible.
The tradeoff of A/B tests is that they allow you to measure more accurately at the cost of making the whole project take longer.
In other words, it makes steps 3 and 4 of this loop more expensive and therefore slows the whole loop down.
Setting up the infrastructure and tools to run a testing program properly takes material effort.
In my experience, the A/B tests that were the best uses of resources were in one or more of these buckets:
The first bucket is high-stakes decisions. The clearest examples are things like price changes, especially for subscription products where the impact will take a while to surface.
A good rule of thumb: if you made this change and it went badly, would you be in trouble with leadership? If so, you should probably test it.
A price raise fits this category: you want to know what it will do to the business before you make it permanent, and these decisions are hard to undo.
The second bucket is changes whose impact is hard to isolate. This is the case when you’re working on a critical metric that moves around a lot, so it would be tough to attribute any shift to your change.
Staying with the price-raise example: raising prices will affect your revenue per user, your conversion rates, your product usage, and your retention numbers. It would be tough to just eyeball your dashboards and understand that impact.
The third bucket is cheap tests: CRO-style tests, email subject lines, ad headlines, etc. Any test that you can get out the door quickly and with minimal overhead.
These tests can be shipped very fast, have clear KPIs, and don’t block other teams’ roadmaps.
There are multiple tests that I have shipped in the past that I shouldn’t have. I should have instead just looked at our dashboards to see if I could get a rough sense of the impact.
If you are implementing a feature that your product should obviously have, testing it is a bad use of resources.
I once shipped a test at Codecademy to include our pricing page in the header of the site. It increased our trial starts by 7%, which was nice to know, but we should have had that link the whole time and just shipped it faster.
If you are fixing something that is broken or obviously confusing, just ship it.
Clear issues with the user experience are not worth the overhead of a test.
The same goes for features you already have clear evidence users want, e.g., Spotify adding podcasts or an accounting tool adding a receipt upload feature.
These help users and it's much better to try to understand the impact over time by monitoring how the feature gets used.
My current opinion is that you should do as little A/B testing as you can get away with, while still being able to use it effectively when needed.
As mentioned above, when you ship A/B tests you are trading speed for accuracy. For earlier-stage products, speed is likely to be more important.
This doesn’t mean you shouldn’t try to determine the impact of your work; you should. But you can probably get close by looking at your metrics every day and checking whether the changes you are making are improving them.
Clearly defined metrics, as we talked about in a previous post, can get you 80% of the way toward understanding the impact of your work.
When I was at Uber Eats, we bought a company called Cornershop for $1.4B, which was a grocery delivery company that had grown to dominate LATAM and part of Europe.
Like Uber, they had a great product, but word on the street was that they had never run an A/B test. This blew a lot of minds at Uber, which tests literally everything.
I write this to say that there are multiple ways to build a successful company. So long as you are moving quickly and trying to measure results, you are on the right path.