Posts

The Winner's Curse in A/B Testing: Why Your Biggest Lifts Are Probably Exaggerated

I've audited a lot of experimentation programs. The most common red flag isn't a low win rate. It's a suspiciously high one. If your team is consistently reporting 40%, 50%, or 60%+ win rates with lifts above 20% on your primary metric, something is probably wrong. Not "wrong" in the sense of fraud, but wrong in the statistical sense: you're almost certainly looking at the winner's curse.

What the Winner's Curse Actually Is

The winner's curse is not about bad luck. It's a mathematical outcome of running underpowered tests. Here's the mechanism: when a test is underpowered (say, 30% or 40% statistical power instead of the standard 80%), the test usually fails to detect a real effect. Most runs come back null. But occasionally, by chance, the noise in your data pushes the result over the significance threshold. When that happens, the observed lift is almost always an exaggeration of the true effect. The only way a small, underpowered tes...
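The mechanism in the excerpt above can be illustrated with a small simulation. This is a minimal sketch, not the post's own method: the function name `simulate_winners`, the 2% true lift, and the normal approximation for the observed lift are all assumptions chosen for illustration. The standard error is set so the test has roughly 30% power.

```python
import random
import statistics


def simulate_winners(true_lift, std_error, runs=100_000, z_crit=1.96, seed=0):
    """Simulate repeated A/B tests and keep only the significant positive results.

    Each run draws an observed lift from a normal distribution centred on the
    true lift (an illustrative assumption). Returns the share of runs that
    reached significance and the mean observed lift among those winners.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    winners = []
    for _ in range(runs):
        observed = rng.gauss(true_lift, std_error)
        if observed / std_error > z_crit:  # significant in the positive direction
            winners.append(observed)
    return len(winners) / runs, statistics.mean(winners)


# Hypothetical numbers: a true lift of 2%, measured by an underpowered test.
TRUE_LIFT = 0.02
# A true effect of about 1.436 standard errors gives roughly 30% power
# at the usual 5% two-sided threshold.
STD_ERROR = TRUE_LIFT / 1.436

win_rate, mean_winner = simulate_winners(TRUE_LIFT, STD_ERROR)
print(f"share of runs that 'win': {win_rate:.2f}")
print(f"true lift: {TRUE_LIFT:.3f}, mean lift among winners: {mean_winner:.3f}")
```

Under these assumptions most runs come back null, and the runs that do cross the threshold report a lift well above the true 2%, because only the draws with favourable noise can clear the bar.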
Recent posts

Best AEO Tools in 2026: Top 5 Answer Engine Optimization Platforms Compared

If your brand isn't showing up in ChatGPT, Perplexity, or Google AI Overviews, you're missing a fast-growing slice of product discovery. 37% of product discovery queries now start inside AI interfaces, not search engines. Answer Engine Optimization (AEO) is the practice of fixing that, and you need the right tools to track it, measure it, and improve it. I've gone through the options available in 2026 and narrowed it down to the five that actually deliver.

What to Look for in an AEO Tool

Before picking a tool, know what you actually need. The AEO tool market splits into two buyer types: teams extending an existing SEO platform (Ahrefs, Semrush, SE Ranking) and teams buying a dedicated AI visibility platform (Profound, Scrunch, Otterly.AI). The core capabilities to check:

Engine coverage: Which AI platforms does it monitor? ChatGPT, Perplexity, and Google AI Overviews are the minimum. Claude, Gemini, Copilot, and Grok are increasingly important.
Citation and mention t...

n8n MCP in 2026: Three Ways to Connect AI Agents to Your Workflows (Compared)

If you're building AI agent workflows, n8n is no longer just a "webhook plus HTTP node" automation tool. As of late 2025, it has native Model Context Protocol support on both ends: it can call external MCP servers and expose its own workflows as MCP tools. That changes how you think about connecting AI agents to automation. Here are the three distinct ways you can wire n8n and MCP together, and where each one actually fits.

Why MCP Matters for n8n Developers

MCP (Model Context Protocol), open-sourced by Anthropic in late 2024, became the de facto standard for AI-to-tool communication through 2025. The idea is simple: instead of hardcoding tool schemas into every AI app, you expose them through a standard JSON-RPC interface over SSE or streamable HTTP. Any MCP-compatible client (Claude, GPT-4o, Cursor, Windsurf) can discover and call those tools without custom integration code. n8n added two nodes that put it on both sides of this equation. The community announcement...

AEO Platform Breakdown: What Gets You Cited in ChatGPT vs Perplexity vs Google AI Overviews (2026)

Only 11% of domains cited by ChatGPT show up in Perplexity's answers too. That figure comes from an Averi analysis of 680 million AI citations published in March 2026. If you're running a single "AEO strategy" and calling it done, you're optimizing for one platform and leaving the other three on the table. I've been digging into this for client work at Publicis Sapient and the platform differences are bigger than most guides admit. Here's what each engine actually rewards.

Why Platform-Specific AEO Matters Now

Over 40% of search queries in 2026 go to AI assistants rather than traditional search engines. ChatGPT alone accounts for 87.4% of all AI referral traffic to brand websites. And 68% of consumers now start product research in ChatGPT or Perplexity before they visit a brand website at all. The problem is that these platforms don't pull from the same source pool. Each has a different retrieval architecture, different freshness requirements, and...

MCP Goes Stateless: Breaking Down the July 2026 Spec and What You Need to Change

The Model Context Protocol's next specification (release candidate locked May 21, 2026, final spec shipping July 28) is the most significant protocol revision since MCP launched. If you're running MCP servers in any production context, the headline change is architectural: the stateful session layer is gone. I've been tracking the SEPs (Specification Enhancement Proposals) that make up this release. Here's the breakdown of what's actually changing and what you need to do before July 28.

The Big Change: MCP Is Now Stateless

The current spec requires an initialize / initialized handshake and tracks sessions via Mcp-Session-Id. That means sticky routing: every request mid-session must hit the same server instance that handled the handshake. For anyone running more than one server instance behind a load balancer, this has meant either session affinity configs, shared session stores, or both. The July 2026 spec eliminates all of that. No session handshake. No s...

Sample Ratio Mismatch: Why One in Ten A/B Tests Is Lying to You

A few years back I was consulting for a retail brand running a product page test in Adobe Target. The variant had bolder CTAs and tighter copy. After two weeks, the numbers looked... fine. Flat. The control won by a hair and the team was ready to call it and move on.

Something felt off. The control had 52,000 sessions. The variant had 46,000. We'd set it to a 50/50 split. That 6,000-session gap shouldn't exist in a balanced allocation. We ran a chi-squared test. p-value: 0.0001. The test was broken. That was a sample ratio mismatch, and it had silently invalidated two weeks of data.

What SRM Actually Is

Sample ratio mismatch (SRM) happens when the observed visitor counts across variants don't match the ratio you configured. Set a 50/50 split and get 53/47 on 5,000 sessions? Might be noise. Get 53/47 on 100,000 sessions? Almost certainly not. Detection is a chi-squared goodness-of-fit test comparing observed counts against expected. Microsoft's ExP team uses a thr...
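The chi-squared goodness-of-fit check described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming two variants; the function name `srm_p_value` and its parameters are hypothetical, and the p-value uses the fact that a chi-squared statistic with one degree of freedom has survival function erfc(sqrt(x / 2)).

```python
import math


def srm_p_value(control, variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value for a two-variant split.

    Compares the observed session counts with the counts expected under the
    configured allocation. Valid for exactly two variants (1 degree of freedom).
    """
    total = control + variant
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (variant - expected_variant) ** 2 / expected_variant)
    # Survival function of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Counts from the anecdote above: configured 50/50, observed 52,000 vs 46,000.
p_broken = srm_p_value(52_000, 46_000)
# A hypothetical healthy split for comparison.
p_healthy = srm_p_value(50_100, 49_900)

print(f"52,000 vs 46,000: p = {p_broken:.3g}")
print(f"50,100 vs 49,900: p = {p_healthy:.3g}")
```

For the 52,000 vs 46,000 split the p-value falls far below any common alert threshold, while the small imbalance in the second call is consistent with a working 50/50 allocation. With more than two variants the degrees of freedom change, so a general chi-squared routine would be needed instead of this shortcut.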

Building Private AI: How to Keep Your Data Local with OpenClaw

Cloud AI means your data goes to cloud providers. What if it didn't have to? Last week, I watched a developer paste an entire customer database into ChatGPT to "analyze patterns." The data left their computer, went to OpenAI's servers, got processed, and theoretically got deleted. Theoretically. That's not acceptable for most businesses.

The Problem With Cloud AI

When you use ChatGPT, Claude, or any cloud API:

Your data leaves your control
It gets transmitted over the internet
A third-party company stores and processes it
They might train on it (check the terms)
It's subject to their privacy policies and government data requests
You lose all compliance guarantees

For casual use? Maybe fine. For healthcare, finance, legal, or sensitive business data? Absolutely not.

Why Private AI is Actually Better

Local AI isn't a step backward. It's a step forward.

Security

Your data never leaves your servers. Period. No internet tr...