Posts

Showing posts with the label LLM

n8n MCP in 2026: Three Ways to Connect AI Agents to Your Workflows (Compared)

If you're building AI agent workflows, n8n is no longer just a "webhook plus HTTP node" automation tool. As of late 2025, it has native Model Context Protocol support on both ends: it can call external MCP servers and expose its own workflows as MCP tools. That changes how you think about connecting AI agents to automation. Here are the three distinct ways you can wire n8n and MCP together, and where each one actually fits.

Why MCP Matters for n8n Developers

MCP (Model Context Protocol), open-sourced by Anthropic in late 2024, became the de facto standard for AI-to-tool communication through 2025. The idea is simple: instead of hardcoding tool schemas into every AI app, you expose them through a standard JSON-RPC interface over SSE or streamable HTTP. Any MCP-compatible client (Claude, GPT-4o, Cursor, Windsurf) can discover and call those tools without custom integration code. n8n added two nodes that put it on both sides of this equation. The community announcement...
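The discover-then-call flow described in this excerpt can be sketched as two JSON-RPC 2.0 messages. This is a minimal illustration, not n8n's implementation: `tools/list` and `tools/call` are the MCP method names, while the tool name `send_invoice` and its arguments are hypothetical, and no transport (SSE or streamable HTTP) is shown.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as a plain dict."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discovery: ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invocation: call one tool by name.
# "send_invoice" and its arguments are hypothetical, for illustration only.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "send_invoice", "arguments": {"customer_id": "c_123"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

Because the client learns tool names and input schemas from the `tools/list` response, nothing about `send_invoice` has to be hardcoded into the client ahead of time.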

MCP Goes Stateless: Breaking Down the July 2026 Spec and What You Need to Change

The Model Context Protocol's next specification (release candidate locked May 21, 2026, final spec shipping July 28) is the most significant protocol revision since MCP launched. If you're running MCP servers in any production context, the headline change is architectural: the stateful session layer is gone. I've been tracking the SEPs (Specification Enhancement Proposals) that make up this release. Here's the breakdown of what's actually changing and what you need to do before July 28.

The Big Change: MCP Is Now Stateless

The current spec requires an initialize / initialized handshake and tracks sessions via Mcp-Session-Id. That means sticky routing: every request mid-session must hit the same server instance that handled the handshake. For anyone running more than one server instance behind a load balancer, this has meant session affinity configs, shared session stores, or both. The July 2026 spec eliminates all of that. No session handshake. No s...
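To make the routing difference in this excerpt concrete, here is a small sketch of the two load-balancing strategies. It is an illustration under assumptions, not MCP code: the instance names and the session id value are hypothetical, and hashing the session id is just one common way to get affinity.

```python
import hashlib
from itertools import cycle

INSTANCES = ["mcp-a", "mcp-b", "mcp-c"]  # hypothetical server instances


def sticky_route(session_id: str) -> str:
    """Session affinity: the same session id always maps to the same instance."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]


_round_robin = cycle(INSTANCES)


def stateless_route() -> str:
    """No session state: any instance can serve the next request."""
    return next(_round_robin)


# Three requests in one session all land on a single instance...
sticky = [sticky_route("session-42") for _ in range(3)]
# ...while stateless requests can be spread across every instance.
spread = [stateless_route() for _ in range(3)]

print(sticky)
print(spread)
```

With affinity, losing the chosen instance loses the session unless state is also kept in a shared store; with stateless requests the balancer needs no knowledge of sessions at all.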

Building Private AI: How to Keep Your Data Local with OpenClaw

Cloud AI means your data goes to cloud providers. What if it didn't have to? Last week, I watched a developer paste an entire customer database into ChatGPT to "analyze patterns." The data left their computer, went to OpenAI's servers, got processed, and theoretically got deleted. Theoretically. That's not acceptable for most businesses.

The Problem With Cloud AI

When you use ChatGPT, Claude, or any cloud API:

Your data leaves your control
It gets transmitted over the internet
A third-party company stores and processes it
They might train on it (check the terms)
It's subject to their privacy policies and government data requests
You lose all compliance guarantees

For casual use? Maybe fine. For healthcare, finance, legal, or sensitive business data? Absolutely not.

Why Private AI is Actually Better

Local AI isn't a step backward. It's a step forward.

Security

Your data never leaves your servers. Period. No internet tr...

Building Trustworthy AI: Beyond Benchmarks

Last month I was evaluating three frontier models for a client workflow at Publicis Sapient. One of them scored highest on every benchmark we checked. It was also the one that fell apart in production within two weeks. That experience pushed me to write this down, because I think the industry has a benchmark problem it isn't talking about honestly enough.

Benchmarks Are Saturated and Getting Gamed

MMLU and MMLU-Pro, two of the most cited evaluation benchmarks, are now functionally saturated above 88% for frontier models. The score differences between the top models are statistically meaningless at that level. Meanwhile, data contamination and annotation error rates above 50% undermine what these scores even measure in the first place. It gets worse. Most teams building internal benchmarks overestimate how well their models perform by 30% or more, because they test on clean inputs, cooperative conditions, and scenarios where the model's known strengths are on display. Tha...

From Single API to Network Intelligence: How Request Scout Changed My Debugging Workflow

A story about building smarter dev tools with your own AI

The Moment It Clicked

Last week, I was debugging a performance issue on a client's site. I had Chrome DevTools open, watching the Network tab with hundreds of requests flying by. I had Gemini in another window, pasting individual API responses, asking "What's this endpoint doing?" and "Are these headers correct?" Then it hit me: Why am I talking to Gemini about one API at a time, when I should be talking to it about my entire network? I opened my notebook and sketched something radical: What if I could ask my network tab questions directly?

"Show me all failed API calls"
"Which domains took longest?"
"Find any requests with leaked auth tokens"
"What's the pattern here?"

Three days later, Request Scout was born.

The Problem Nobody Talks About

Network debugging is fragmented:

🕵️ You stare at the Network tab (human-scale = ~100 requests, max)
🔍...
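Two of the questions from this excerpt ("Show me all failed API calls" and "Which domains took longest?") reduce to simple queries once the captured requests are available as structured data. The sketch below is not Request Scout's implementation; the record fields (`url`, `status`, `time_ms`) and the sample entries are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical captured requests; field names are assumptions.
requests_log = [
    {"url": "https://api.example.com/users", "status": 200, "time_ms": 120},
    {"url": "https://api.example.com/orders", "status": 500, "time_ms": 950},
    {"url": "https://cdn.example.net/app.js", "status": 200, "time_ms": 40},
    {"url": "https://api.example.com/cart", "status": 404, "time_ms": 80},
]


def failed_calls(entries):
    """Return entries whose HTTP status is a client or server error."""
    return [e for e in entries if e["status"] >= 400]


def slowest_domains(entries):
    """Total response time per domain, slowest first."""
    totals = defaultdict(int)
    for e in entries:
        totals[urlparse(e["url"]).netloc] += e["time_ms"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


failed = failed_calls(requests_log)
domains = slowest_domains(requests_log)

print([e["url"] for e in failed])
print(domains)
```

Pre-computing summaries like these keeps the amount of raw traffic handed to a model small, which matters once a capture grows past the roughly 100 requests a person can scan by eye.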

Claude 3.7 vs GPT-5.2: Which LLM Wins for Production?

I ran every benchmark. Here are the results that surprised me.

Last month, I made it my mission to test both Claude 3.7 and GPT-5.2 across real-world production scenarios. Not just benchmarks but actual work: code generation, reasoning, document analysis, customer support automation. What I found was more nuanced than "one is better." Here's what actually matters.

The Benchmarks Everyone Quotes

Claude 3.7 scores higher on MMLU (87.2% vs 86.8%). GPT-5.2 wins on reasoning tasks by a narrow margin. On the surface, GPT-5.2 looks better. But benchmarks lie in interesting ways. MMLU tests multiple choice knowledge. It doesn't test what matters in production: streaming latency, cost per token, context window usage, and most importantly, reliability on your specific tasks.

Real-World Testing

Code Generation (JavaScript/Python)

I generated 100 functions across varying complexity levels. Claude 3.7: 87% passed tests on first try. Generated code was clean, ...
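One way to check whether a gap like 87.2% vs 86.8% means anything is an unpooled two-proportion z statistic. The sketch below is a rough check under assumptions: the question count `N` is taken to be 14,042 (the commonly cited size of the MMLU test split, not a figure from this post), and both models are treated as independent samples. Since both models answer the same questions, a paired test such as McNemar's would be the stricter choice.

```python
import math


def two_proportion_z(p1, p2, n1, n2):
    """Unpooled z statistic for the difference between two proportions."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


N = 14_042  # assumed number of benchmark questions (not stated in the post)

z = two_proportion_z(0.872, 0.868, N, N)
significant = abs(z) > 1.96  # two-sided test at the 5% level

print(f"z = {z:.2f}, significant at 5%: {significant}")
```

Under these assumptions the statistic comes out close to 1, well short of the 1.96 threshold, so a 0.4-point gap on its own does not separate the two models.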

The Death of Prompt Engineering: Why AI Agents Are Taking Over

Prompt engineering isn't completely dead. If you're summarizing emails or classifying support tickets, a well-written system prompt still gets you 90% of the way there. But if you're building agents, and most of us are building agents now, prompt engineering is the wrong mental model. It's not the bottleneck anymore. The system around the model is.

What Prompt Engineering Actually Was

From 2022 to 2024, most AI work was "how do I phrase this to get better output." Few-shot examples, chain-of-thought prompting, temperature tuning. The skill was about poking a stateless model and getting a useful single-turn response. That made sense when models were barely reliable and the main interface was a text completion box. The best prompt engineers I knew were essentially UX designers for language models. The craft was real. But it was always a workaround for a gap between what models could do and what they needed to do. As models improved and workflows got mor...