Sample Ratio Mismatch: Why One in Ten A/B Tests Is Lying to You

A few years back I was consulting for a retail brand running a product page test in Adobe Target. The variant had bolder CTAs and tighter copy. After two weeks, the numbers looked... fine. Flat. The control won by a hair and the team was ready to call it and move on.

Something felt off. The control had 52,000 sessions. The variant had 46,000. We'd set it to a 50/50 split. That 6,000-session gap shouldn't exist in a balanced allocation.

We ran a chi-squared test. p-value: well below 0.0001. The test was broken.

That was a sample ratio mismatch, and it had silently invalidated two weeks of data.

What SRM Actually Is

Sample ratio mismatch (SRM) happens when the observed visitor counts across variants don't match the ratio you configured. Set a 50/50 split and get 53/47 on 500 sessions? Might be noise. Get 53/47 on 100,000 sessions? Almost certainly not.

Detection is a chi-squared goodness-of-fit test comparing observed counts against expected. Microsoft's ExP team uses a threshold of p < 0.0005 for their internal platform. Most practitioners use 0.01. Either way, if the p-value falls below your threshold, the test data is compromised and you should not act on the results.
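To make the test concrete, here is the statistic worked by hand for the opening anecdote (52,000 control sessions against 46,000 variant sessions on a configured 50/50 split). The symbols O_i and E_i, for observed and expected counts per variant, are notation introduced here, and the calculation assumes the standard Pearson form with one degree of freedom for two variants.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
       = \frac{(52000 - 49000)^2}{49000} + \frac{(46000 - 49000)^2}{49000}
       = \frac{2 \cdot 3000^2}{49000}
       \approx 367.3
```

With one degree of freedom, the critical value is about 6.63 for p = 0.01 and about 12.1 for p = 0.0005. A statistic of 367 is far past both, which is why a 6,000-session gap at this volume cannot be written off as noise.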

Here's the uncomfortable part: this is not rare. Microsoft's ExP team found SRM in roughly 6% of their internal A/B tests. LinkedIn reported closer to 10% in certain test cohorts. At those rates, a company running 100 tests a year would have somewhere between 6 and 10 of them quietly producing invalid results. And most teams never check.

Where SRM Hides in Enterprise Setups

The Microsoft Research team mapped SRM causes to stages of the experiment pipeline. The four covered here are assignment, execution, log processing, and analysis. Knowing which stage you're in narrows the diagnosis fast.

Assignment

The most fundamental failure point. Your randomization is splitting users into buckets, but the split isn't landing correctly. Common causes: user ID inconsistency (mixing logged-in vs. anonymous IDs mid-session), carryover from a previous test that used the same buckets, or uneven ramp-up where someone turned the variant on for a slice of traffic first and the logs got mixed.
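One common way to avoid both the ID-inconsistency and the reused-bucket problems is deterministic, salted hashing. The sketch below is an illustration under assumptions, not how Adobe Target or any specific platform assigns users: the function name assign_variant, the experiment ID "pdp-cta-test", and the synthetic user IDs are all hypothetical, and it assumes you have already resolved each visitor to a single stable user_id.

```python
import hashlib
from collections import Counter

def assign_variant(user_id: str, experiment_id: str,
                   variants: tuple = ("control", "variant")) -> str:
    """Deterministically map a user to a variant for one experiment.

    Hypothetical helper: salting the hash with experiment_id means two
    experiments bucket the same users independently, which is one way to
    avoid carryover from reused buckets.
    """
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an integer and reduce it to a variant index.
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]

# Synthetic user IDs, for illustration only.
users = [f"user-{i}" for i in range(10_000)]
split = Counter(assign_variant(u, "pdp-cta-test") for u in users)
print(split)  # close to an even split, though rarely exactly 5,000 each
```

Because the assignment depends only on the user ID and the experiment ID, the same user always lands in the same bucket, and you can recompute the expected split offline when the logged counts look wrong.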

In Adobe Target specifically, I've seen this happen when the mbox fires inconsistently because of async loading. The user gets assigned but the assignment doesn't log before the page unloads. That missing log shows up as fewer users in the variant, not fewer page views overall.

Execution

Redirect tests are the worst offenders here. When your variant is a full-page redirect, some users get counted at assignment but drop off before the redirect completes. Bot detection can also cause this asymmetry: if bots hit your control disproportionately, your control sample inflates.

The MSN image carousel test at Microsoft is a concrete example. A test that looked like a negative result turned out to have SRM because heavily engaged users in one variant were being classified as bots and filtered out of the analysis. Once the SRM was accounted for, the conclusion flipped to positive. Two completely different business decisions depending on whether you caught it.

Log Processing

This one is insidious because the experiment itself is fine but your analysis is wrong. A bad join between your assignment table and your conversion table creates a phantom mismatch. Maybe your analytics event fires on 95% of sessions but your experiment assignment logs 100% of them.
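A quick way to test for a join problem is to measure, per variant, how many assigned users actually show up on the analytics side. The sketch below assumes you can export assignments as a user-to-variant mapping and analytics as a list of user IDs; the function name coverage_by_variant and the synthetic data are hypothetical.

```python
from collections import Counter

def coverage_by_variant(assignments, analytics_user_ids):
    """Report how many assigned users per variant also appear in analytics.

    assignments: dict mapping user_id -> variant name.
    analytics_user_ids: iterable of user_ids seen in the analytics data.
    """
    seen = set(analytics_user_ids)
    assigned = Counter(assignments.values())
    matched = Counter(v for uid, v in assignments.items() if uid in seen)
    return {
        variant: {
            "assigned": n,
            "matched": matched[variant],
            "match_rate": matched[variant] / n,
        }
        for variant, n in assigned.items()
    }

# Synthetic data: every control user is in analytics, but 100 variant users are missing.
assignments = {f"c{i}": "control" for i in range(1000)}
assignments.update({f"v{i}": "variant" for i in range(1000)})
analytics_ids = [f"c{i}" for i in range(1000)] + [f"v{i}" for i in range(900)]

report = coverage_by_variant(assignments, analytics_ids)
for variant, row in report.items():
    print(variant, row)
```

A match rate that is low but equal across variants costs you sample size. A match rate that differs between variants is the phantom mismatch described above.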

If you're running Adobe Target with Adobe Analytics via the A4T integration, watch for this. A4T stitches the Target and Analytics hits together using a supplemental data ID, and if the matching hit doesn't fire consistently, users drop from Analytics reporting but not from Target's built-in report. You get two different session counts and neither is obviously wrong until you compare them directly.

Analysis

The sneakiest kind. Your test is fine, your pipeline is fine, but someone applies a post-hoc filter: "let's look at mobile only" or "let's exclude users who bounced in under 5 seconds." If that filter applies differently across variants, you've introduced the bias yourself. This is especially common when you segment by a metric that the treatment itself can influence.
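Here is a small worked example of how a filter can manufacture a mismatch. The numbers are made up for illustration: a perfectly balanced test where the variant lowers the bounce rate from 40% to 35%, followed by an "exclude bouncers" filter. The helper name two_arm_p_value is hypothetical, and it relies on the fact that a chi-squared tail probability with one degree of freedom equals erfc(sqrt(chi2 / 2)).

```python
import math

def two_arm_p_value(control_n, variant_n, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value for a two-variant test (1 degree of freedom)."""
    total = control_n + variant_n
    exp_c = total * expected_control_share
    exp_v = total - exp_c
    chi2 = (control_n - exp_c) ** 2 / exp_c + (variant_n - exp_v) ** 2 / exp_v
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: balanced assignment, but the treatment reduces bounces.
assigned = {"control": 10_000, "variant": 10_000}
bounced = {"control": 4_000, "variant": 3_500}

# Post-hoc filter: drop users who bounced.
retained = {arm: assigned[arm] - bounced[arm] for arm in assigned}

p_before = two_arm_p_value(assigned["control"], assigned["variant"])
p_after = two_arm_p_value(retained["control"], retained["variant"])

print("retained:", retained)          # 6,000 control vs 6,500 variant
print("p before filter:", p_before)   # 1.0, the split is exactly 50/50
print("p after filter:", p_after)     # far below 0.0005
```

Nothing was wrong with the experiment. The filter removed more users from control than from the variant because the treatment changed who bounces, and the remaining groups are no longer comparable.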

How to Detect It

The mechanics are straightforward. Chi-squared test, two groups, comparing observed sizes against expected. In Python: scipy.stats.chisquare. In R: chisq.test. In a spreadsheet: CHISQ.TEST.

Platforms like Statsig and Eppo flag SRM automatically before you see any lift metrics. If you're on a platform that doesn't check by default (and several enterprise tools still don't), build your own check. It's ten lines of code and it should run before any result gets surfaced to stakeholders.
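For a two-variant test, the check really is about ten lines. The sketch below uses only the Python standard library and assumes exactly two arms, so it can use the closed-form tail probability for one degree of freedom; for three or more arms you would reach for scipy.stats.chisquare instead. The names srm_check and SrmResult are hypothetical, and the default alpha of 0.0005 is the ExP threshold mentioned earlier, which you may want to loosen to 0.01.

```python
import math
from typing import NamedTuple

class SrmResult(NamedTuple):
    chi2: float
    p_value: float
    srm_detected: bool

def srm_check(control_n: int, variant_n: int,
              expected_control_share: float = 0.5,
              alpha: float = 0.0005) -> SrmResult:
    """Chi-squared goodness-of-fit test for a two-variant experiment."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (variant_n - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom, the tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return SrmResult(chi2, p_value, p_value < alpha)

# The counts from the opening anecdote: flagged as SRM.
print(srm_check(52_000, 46_000))

# A 53/47 split on only 500 sessions: not flagged.
print(srm_check(265, 235))
```

Run it on assignment counts, not on conversions or filtered segments, and gate the results dashboard on srm_detected being False.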

One practical habit: check SRM within 48 hours of launch, not just at the end of the test. If there's a redirect issue or a broken firing condition, you can catch it early and restart before you've burned two weeks of traffic.

What to Do When You Find It

Stop. Do not declare a winner. Do not try to segment your way to an answer by filtering to a cleaner-looking date range or device type. The data is compromised in ways you can't fully see, and any slicing you do will mix biased and unbiased observations in unknowable proportions.

Investigate in this order:

  • Compare your assignment logs to the configured split at the bucketing step, not just in your analytics tool
  • Check for any redirects in the variant that don't exist in control
  • Look for bot filtering rules that apply asymmetrically across variants
  • Verify your analytics firing condition is identical in both variants
  • Check for mid-test changes: audience enablement, segment rollouts, traffic spikes from a campaign
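For the last item on that list, a per-day breakdown can help pin down when the mismatch began. This is a diagnostic sketch, not a replacement for the overall check: the daily counts are invented, the names two_arm_p_value and flag_days are hypothetical, and it assumes a two-variant 50/50 test. Testing every day separately means more chances for a false alarm, which is one reason to keep the strict threshold here.

```python
import math

def two_arm_p_value(control_n, variant_n, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value for two variants (1 degree of freedom)."""
    total = control_n + variant_n
    exp_c = total * expected_control_share
    exp_v = total - exp_c
    chi2 = (control_n - exp_c) ** 2 / exp_c + (variant_n - exp_v) ** 2 / exp_v
    return math.erfc(math.sqrt(chi2 / 2))

def flag_days(daily_counts, alpha=0.0005):
    """Return the labels of days whose control/variant split fails the check."""
    return [day for day, control_n, variant_n in daily_counts
            if two_arm_p_value(control_n, variant_n) < alpha]

# Hypothetical daily assignment counts: (label, control, variant).
daily_counts = [
    ("day 1", 5010, 4990),
    ("day 2", 4975, 5025),
    ("day 3", 5040, 4960),
    ("day 4", 5200, 4600),
    ("day 5", 5180, 4620),
]

print(flag_days(daily_counts))  # ['day 4', 'day 5']
```

If the flagged days line up with a deploy, a campaign launch, or an audience change, you have a strong lead on the root cause.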

Once you've found and fixed the root cause, restart the test with a clean date range. Don't try to salvage the old data.

Make It a Pre-Readout Habit

The experimentation programs I've seen produce the most trustworthy results do three things before calling a winner: check statistical power before launch, check SRM within 48 hours of going live, and hold results until both pass.

No ML model required. Just ten minutes and a chi-squared test. Given that published SRM rates run from 6% to 10% of tests, that's about the best ROI you can get on ten minutes.
