Posts

Multi-Armed Bandits Are Not Smarter A/B Tests

Multi-armed bandits are an adaptive testing method that shifts traffic toward your best-performing variant as the test runs, rather than holding a fixed 50/50 split throughout. The idea is to minimize the cost of running a losing variant. The problem is that teams adopt them as an upgrade to A/B testing, and they're not: they're a different tool that trades statistical validity for short-term efficiency. If you're using MABs for product features, checkout flows, or anything you'll iterate on, you're probably getting cleaner-looking results that tell you less than you think.

The Core Tradeoff You're Actually Making

A/B tests assign traffic randomly. That randomness is the whole point. It's what lets you make causal claims. When you can say "I randomly assigned users to this variant, and they converted at a higher rate," you're not just observing a correlation. You've approximated a controlled experiment. MABs discard that guarantee. By ...
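To make "shifts traffic toward your best-performing variant" concrete, here is a minimal sketch of one common bandit policy, Thompson sampling with Beta posteriors. The post does not say which policy it has in mind, so the policy choice, the function name `thompson_sampling`, and the conversion rates below are all illustrative assumptions.

```python
import random


def thompson_sampling(true_rates, n_visitors=5000, seed=0):
    """Simulate Thompson sampling over Bernoulli arms (hypothetical rates).

    Returns the number of visitors each arm received.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then send this visitor to the best draw.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        arm = max(range(k), key=samples.__getitem__)
        pulls[arm] += 1
        # Simulated conversion; in production this would be a real outcome.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


# Two variants with assumed true conversion rates of 5% and 15%.
pulls = thompson_sampling([0.05, 0.15])
print(pulls)
```

Each visitor's assignment depends on the outcomes observed so far, so the split is no longer fixed in advance. That dependence is what separates this from the random assignment of an A/B test.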
Recent posts

The Winner's Curse in A/B Testing: Why Your Biggest Lifts Are Probably Exaggerated

I've audited a lot of experimentation programs. The most common red flag isn't a low win rate. It's a suspiciously high one. If your team is consistently reporting 40%, 50%, or 60%+ win rates with lifts above 20% on your primary metric, something is probably wrong. Not "wrong" in the sense of fraud, but wrong in the statistical sense: you're almost certainly looking at the winner's curse.

What the Winner's Curse Actually Is

The winner's curse is not about bad luck. It's a mathematical outcome of running underpowered tests. Here's the mechanism: when a test is underpowered (say, 30% or 40% statistical power instead of the standard 80%), the test usually fails to detect a real effect. Most runs come back null. But occasionally, by chance, the noise in your data pushes the result over the significance threshold. When that happens, the observed lift is almost always an exaggeration of the true effect. The only way a small, underpowered tes...
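The mechanism described above can be checked with a small simulation. This is a sketch under stated assumptions, not the author's analysis: the observed lift is modeled as normally distributed around the true lift, and the true lift (2%) and standard error (1.4%) are hypothetical values chosen so that power lands near 30%.

```python
import random
import statistics


def simulate_winners_curse(true_lift=0.02, std_error=0.014,
                           n_tests=20000, z_crit=1.96, seed=0):
    """Repeat the same underpowered test many times.

    Returns (share of runs that reach significance,
             mean significant lift divided by the true lift).
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_tests):
        # Observed lift = true lift + sampling noise (normal approximation).
        observed = rng.gauss(true_lift, std_error)
        # Two-sided z-test at the usual 5% level.
        if abs(observed / std_error) > z_crit:
            significant.append(observed)
    power = len(significant) / n_tests
    exaggeration = statistics.mean(significant) / true_lift
    return power, exaggeration


power, exaggeration = simulate_winners_curse()
print(f"share of significant runs: {power:.2f}")
print(f"mean significant lift / true lift: {exaggeration:.2f}")
```

Under these assumptions only about three runs in ten reach significance, and those runs report a lift well above the true 2%. A result has to overshoot to clear the threshold at all.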

Best AEO Tools in 2026: Top 5 Answer Engine Optimization Platforms Compared

If your brand isn't showing up in ChatGPT, Perplexity, or Google AI Overviews, you're missing a fast-growing slice of product discovery. 37% of product discovery queries now start inside AI interfaces, not search engines. Answer Engine Optimization (AEO) is the practice of fixing that, and you need the right tools to track it, measure it, and improve it. I've gone through the options available in 2026 and narrowed it down to the five that actually deliver.

What to Look for in an AEO Tool

Before picking a tool, know what you actually need. The AEO tool market splits into two buyer types: teams extending an existing SEO platform (Ahrefs, Semrush, SE Ranking) and teams buying a dedicated AI visibility platform (Profound, Scrunch, Otterly.AI). The core capabilities to check:

Engine coverage: Which AI platforms does it monitor? ChatGPT, Perplexity, and Google AI Overviews are the minimum. Claude, Gemini, Copilot, and Grok are increasingly important.
Citation and mention t...

n8n MCP in 2026: Three Ways to Connect AI Agents to Your Workflows (Compared)

If you're building AI agent workflows, n8n is no longer just a "webhook plus HTTP node" automation tool. As of late 2025, it has native Model Context Protocol support on both ends: it can call external MCP servers and expose its own workflows as MCP tools. That changes how you think about connecting AI agents to automation. Here are the three distinct ways you can wire n8n and MCP together, and where each one actually fits.

Why MCP Matters for n8n Developers

MCP (Model Context Protocol), open-sourced by Anthropic in late 2024, became the de facto standard for AI-to-tool communication through 2025. The idea is simple: instead of hardcoding tool schemas into every AI app, you expose them through a standard JSON-RPC interface over SSE or streamable HTTP. Any MCP-compatible client (Claude, GPT-4o, Cursor, Windsurf) can discover and call those tools without custom integration code. n8n added two nodes that put it on both sides of this equation. The community announcement...

AEO Platform Breakdown: What Gets You Cited in ChatGPT vs Perplexity vs Google AI Overviews (2026)

Only 11% of domains cited by ChatGPT show up in Perplexity's answers too. That figure comes from an Averi analysis of 680 million AI citations published in March 2026. If you're running a single "AEO strategy" and calling it done, you're optimizing for one platform and leaving the other three on the table. I've been digging into this for client work at Publicis Sapient and the platform differences are bigger than most guides admit. Here's what each engine actually rewards.

Why Platform-Specific AEO Matters Now

Over 40% of search queries in 2026 go to AI assistants rather than traditional search engines. ChatGPT alone accounts for 87.4% of all AI referral traffic to brand websites. And 68% of consumers now start product research in ChatGPT or Perplexity before they visit a brand website at all. The problem is that these platforms don't pull from the same source pool. Each has a different retrieval architecture, different freshness requirements, and...

MCP Goes Stateless: Breaking Down the July 2026 Spec and What You Need to Change

The Model Context Protocol's next specification (release candidate locked May 21, 2026, final spec shipping July 28) is the most significant protocol revision since MCP launched. If you're running MCP servers in any production context, the headline change is architectural: the stateful session layer is gone. I've been tracking the SEPs (Specification Enhancement Proposals) that make up this release. Here's the breakdown of what's actually changing and what you need to do before July 28.

The Big Change: MCP Is Now Stateless

The current spec requires an initialize / initialized handshake and tracks sessions via Mcp-Session-Id. That means sticky routing: every request mid-session must hit the same server instance that handled the handshake. For anyone running more than one server instance behind a load balancer, this has meant either session affinity configs, shared session stores, or both. The July 2026 spec eliminates all of that. No session handshake. No s...

Sample Ratio Mismatch: Why One in Ten A/B Tests Is Lying to You

A few years back I was consulting for a retail brand running a product page test in Adobe Target. The variant had bolder CTAs and tighter copy. After two weeks, the numbers looked... fine. Flat. The control won by a hair and the team was ready to call it and move on. Something felt off. The control had 52,000 sessions. The variant had 46,000. We'd set it to a 50/50 split. That 6,000-session gap shouldn't exist in a balanced allocation. We ran a chi-squared test. p-value: below 0.0001. The test was broken. That was a sample ratio mismatch, and it had silently invalidated two weeks of data.

What SRM Actually Is

Sample ratio mismatch (SRM) happens when the observed visitor counts across variants don't match the ratio you configured. Set a 50/50 split and get 53/47 on a few hundred sessions? Might be noise. Get 53/47 on 100,000 sessions? Almost certainly not. Detection is a chi-squared goodness-of-fit test comparing observed counts against expected. Microsoft's ExP team uses a thr...
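The chi-squared goodness-of-fit check described above needs only a few lines for the two-variant case. This is a minimal standard-library sketch, not the tooling used in the anecdote: the function name `srm_p_value` is hypothetical, and the closed-form p-value holds only for two variants (one degree of freedom). With more variants a statistics library would be the safer route.

```python
import math


def srm_p_value(control: int, variant: int,
                expected_control_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for a two-variant split."""
    total = control + variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (variant - expected_variant) ** 2 / expected_variant)
    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Session counts from the product page test: 52,000 control vs 46,000 variant.
p = srm_p_value(52_000, 46_000)
print(f"SRM p-value: {p:.3g}")
```

For the 52,000 versus 46,000 counts the statistic is about 367, which puts the p-value far below any commonly used SRM alert threshold. The same function can run as an automated guardrail before anyone reads the conversion numbers.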