Split Testing (also known as A/B testing or bucket testing) is an experiment in which two or more variants are shown to an audience at random, and statistical analysis is then used to determine which variation performs better for a given conversion goal.
A/A testing is the tactic of using A/B testing to test two identical variants against each other. Typically, this is done to check that the tool being used to run the experiment is statistically fair.
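The fairness check behind A/A testing can be simulated directly: if both "variants" share the same true conversion rate, a sound testing procedure should declare a significant difference only about α (e.g. 5%) of the time. A minimal Python sketch, using only the standard library (the function name, seed, and sample sizes are my own choices for illustration):

```python
import random
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Simulate many A/A tests: both arms share the same 5% true rate,
# so a fair procedure should flag roughly 5% of them as "significant".
random.seed(42)
trials, visitors, true_rate, alpha = 1000, 2000, 0.05, 0.05
false_positives = 0
for _ in range(trials):
    conv_a = sum(random.random() < true_rate for _ in range(visitors))
    conv_b = sum(random.random() < true_rate for _ in range(visitors))
    if two_proportion_p_value(conv_a, visitors, conv_b, visitors) < alpha:
        false_positives += 1

print(f"false positive rate: {false_positives / trials:.3f}")
```

If your testing tool's A/A results deviate far from α, the tool (or your traffic splitting) is not statistically fair.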
THE MOST COMMON MISTAKES IN SPLIT TESTING:
• Before-After Tests Error. It’s tempting to run a before-and-after test, even when you’ve been warned not to. With a before-and-after test, you measure conversions on your site for a period of time, make a change, and then measure conversions for another period of time. Instead of testing two or more versions simultaneously, you test different versions over different periods of time, so any difference you observe may be caused by seasonality, shifts in traffic sources, or external events rather than by your change.
• Bouncing Error. This one is about changing experiment settings in the middle of a test. When you launch an experiment, you need to commit to it fully. Do not change the experiment settings, the test goals, or the design of the variation or the control mid-experiment. And don’t change traffic allocations to variations.
• Copycat Error. Another mistake you can make is trusting what you read online and blindly applying someone else’s test results to your own case. There’s no guarantee you’ll get the same results on your product: their audience, traffic sources, and context differ from yours.
• Segmenting Error. Whenever you do any sort of analysis, you need to understand and consider segments of the target audience. The most common traffic segments are: returning visitors, new visitors, visitors by country, visitors by traffic source, visitors by day, and visitors by device. The important thing to know about segments is that you cannot directly compare visitors from two different segments.
• Interference Error. Running multiple A/B tests at the same time usually spoils the results, and so does using too many variations. It takes time to perform an A/B test, so testers are sometimes tempted to run multiple tests at once to get results faster. In reality, the tests will likely interfere with each other and skew the results. Having 2-3 variations is usually a good choice; if you have more ideas than that, try grouping them.
• Statistical Significance Error. This usually occurs when the sample is too small to be statistically significant (e.g. when you end your test too soon). When a finding is significant, it means you can be confident it’s real, not that you simply got lucky (or unlucky) with the sample you chose. In statistical hypothesis testing, this error shows up as a Type I Error (also known as a "false positive" finding or conclusion) or a Type II Error (also known as a "false negative" finding or conclusion). To avoid the issue, use the sample size calculator I coded below.
* Base Conversion - the current conversion rate for the page you’re testing.
** Stat Power - statistical power (1−β) is the percentage of the time the minimum effect size will be detected, assuming it exists. The industry standard is 80%, but I recommend keeping it at 90%.
*** Significance - significance level (α) is the percentage of the time a difference will be detected, assuming one does NOT exist. The industry standard is 5%.
**** MDE - the minimum change in conversion rate you would like to be able to detect. By default it’s relative, but you can enter either a relative or an absolute value.
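The four inputs above map onto the standard sample-size formula for a two-proportion z-test. Here is a minimal Python sketch of that formula, using only the standard library (the function name and signature are my own; the defaults match the 90% power and 5% significance recommended above):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(base_rate, mde, alpha=0.05, power=0.90, relative=True):
    """Approximate visitors needed per variant for a two-proportion z-test.

    base_rate -- Base Conversion, e.g. 0.05 for 5%
    mde       -- Minimum Detectable Effect (relative by default)
    alpha     -- significance level
    power     -- statistical power, 1 - beta
    """
    p1 = base_rate
    p2 = base_rate * (1 + mde) if relative else base_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return math.ceil(n)

# Example: 5% base conversion, detect a 20% relative lift (5% -> 6%)
print(sample_size_per_variant(0.05, 0.20))
```

Note how sensitive the result is to the MDE: halving the detectable effect roughly quadruples the required sample, which is why ending tests too soon produces the Statistical Significance Error described above.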