
MiniMax M3: The Open-Weight Model That Beat GPT-5.5 on Coding for 8x Less

[Image: a spotlight beam illuminating a few selected tiles in a vast dark grid, representing sparse attention]

MiniMax released M3 on June 1, 2026, and it's the first open-weight model to genuinely combine three things at once: frontier-level coding performance, a 1M-token context window, and native multimodal input. The interesting part isn't the feature list. It's the architectural trick that makes long-context inference practical at a fraction of what GPT-5.5 costs.

A New Way to Do Attention at Scale

Standard transformer attention scales quadratically with context length, which is why running a full 1M-token window at inference time is usually too expensive to be useful. MiniMax's answer is MSA (MiniMax Sparse Attention), and the mechanics are worth understanding.
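To put a number on "quadratically", here is a back-of-the-envelope sketch that counts query-key score computations for dense attention at a few context lengths. The function name and the choice of context sizes are mine, and the count is per head per layer, so treat it as an illustration of the scaling rather than a cost model for any real system.

```python
def dense_attention_pairs(n_tokens: int, causal: bool = True) -> int:
    """Number of query-key scores dense attention computes (per head, per layer)."""
    if causal:
        # Token i attends to itself and every earlier token: 1 + 2 + ... + n.
        return n_tokens * (n_tokens + 1) // 2
    return n_tokens * n_tokens


for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {dense_attention_pairs(n):.3e} score pairs")
```

Going from 8K to 1M tokens is a 125x longer context, but roughly 125 squared (about 15,600x) more score pairs. That is the gap a sparse scheme has to close.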

Instead of computing attention over every token in the context, MSA uses a two-stage process. A lightweight index branch first scans incoming tokens and selects which blocks of the KV cache are actually relevant to the query. The main attention layer then processes only those selected blocks. MiniMax's reported numbers at 1M-token context: 9.7x faster prefill, 15.6x faster decode, and roughly 1/20th the per-token compute compared with its previous-generation M2.
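The two-stage idea is easier to see in code. What follows is a minimal NumPy sketch of block-selective attention for a single query, not MiniMax's implementation: the scoring rule (query dotted with each block's mean key), the `block_size` and `top_k` parameters, and the function name are all assumptions I made for illustration. The real index branch is presumably learned.

```python
import numpy as np


def block_sparse_attention(q, K, V, block_size=4, top_k=2):
    """Single-query attention over only the top_k highest-scoring KV blocks.

    q: (d,) query; K, V: (n, d) uncompressed key/value cache.
    Assumes n is a multiple of block_size; any remainder is dropped.
    """
    n, d = K.shape
    n_blocks = n // block_size
    Kb = K[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    Vb = V[: n_blocks * block_size].reshape(n_blocks, block_size, d)

    # Stage 1 (index branch, illustrative): score each block cheaply
    # by the dot product of the query with the block's mean key.
    block_scores = Kb.mean(axis=1) @ q
    selected = np.sort(np.argsort(block_scores)[-top_k:])

    # Stage 2: ordinary softmax attention, restricted to the selected
    # blocks, using the original (uncompressed) keys and values.
    Ks = Kb[selected].reshape(-1, d)
    Vs = Vb[selected].reshape(-1, d)
    logits = Ks @ q / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ Vs, selected
```

Per query, stage 2 touches `top_k * block_size` keys instead of all `n`, which is where the savings come from. Setting `top_k` to the total number of blocks recovers dense attention exactly, a handy sanity check.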

One design choice that stands out: MSA operates on uncompressed KV data, not compressed approximations. That preserves long-context retrieval accuracy better than methods that squash context down first. The tradeoff is higher memory use, since the cache is kept at full size. If you're running this on constrained hardware or in a memory-tight deployment, factor that in before committing.
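To get a feel for the memory side, here is the standard KV-cache sizing arithmetic as a short sketch. The layer count, KV head count, and head dimension below are hypothetical placeholders, not M3's configuration; plug in the real values once the technical report is out.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to hold uncompressed keys and values for n_tokens."""
    # Factor of 2 covers keys plus values; bytes_per_value=2 means fp16/bf16.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_tokens * per_token


# Hypothetical dimensions for illustration only, NOT M3's published config.
size = kv_cache_bytes(1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
print(f"{size / 1e9:.1f} GB for one 1M-token sequence")
```

With these made-up dimensions, a single 1M-token sequence needs about 246 GB of cache before counting the weights, which is why the memory ceiling deserves attention in self-hosted setups.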

What the Benchmark Numbers Actually Say

M3 scores 59.0% on SWE-Bench Pro, beating both GPT-5.5 and Gemini 3.1 Pro on that benchmark. That raised eyebrows at launch. TechTimes's June 1 headline read "Frontier Claims, Unverified Benchmarks." Seventeen days later, after the community ran independent evaluations, they updated to "Sparse Attention Architecture Now Verified."

That arc matters. The numbers held under scrutiny. But SWE-Bench Pro is a specific benchmark: it tests the ability to resolve real GitHub issues from software repos. It's meaningful if you're building agentic coding pipelines. It tells you little about broad reasoning, instruction following, or creative tasks. Claiming M3 beats GPT-5.5 across the board based on this one number would be wrong.

The Pricing Math

This is the part that's genuinely hard to dismiss. MiniMax M3 lists at $0.60 per million input tokens and $2.40 per million output tokens. GPT-5.5 is $5 per million input and $30 per million output. Claude Opus 4.7 is $5 input and $25 output.

On output tokens, M3 is about 12.5x cheaper than GPT-5.5 and about 10.4x cheaper than Claude Opus 4.7; on input tokens it's about 8.3x cheaper than either. For workloads with high output volume (agentic loops, multi-step code generation, long-document summarization), that gap compounds fast. VentureBeat called it "5-10% of the cost", and the math holds on output tokens against GPT-5.5: $2.40 is 8% of $30.
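If you want to check the arithmetic against your own traffic mix, here is a small sketch using the list prices quoted above. The workload size (200M input tokens, 50M output tokens) is a made-up example, and the function names are mine.

```python
# USD per million tokens as quoted in this post: (input, output).
PRICES = {
    "MiniMax M3": (0.60, 2.40),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload at the listed per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def output_price_ratio(model: str, baseline: str = "MiniMax M3") -> float:
    """How many times more expensive `model` is than `baseline` on output tokens."""
    return PRICES[model][1] / PRICES[baseline][1]


# Hypothetical agentic workload: 200M input tokens, 50M output tokens.
for name in PRICES:
    cost = workload_cost(name, 200_000_000, 50_000_000)
    print(f"{name:<16} ${cost:>9,.2f}  output ratio {output_price_ratio(name):.1f}x")
```

For that example mix the totals come to $240 for M3, $2,500 for GPT-5.5, and $2,250 for Claude Opus 4.7. The blended ratio depends on your input-to-output split, so it lands between the 8.3x input gap and the 12.5x output gap.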

A catch: the launch window included a 50% discount for new accounts in the first week. The prices I'm quoting are the standard rates that apply once that discount ends. The gap is still large at standard rates, just not as extreme as what some early comparisons, run at the discounted price, suggested.

What It Actually Supports

M3 handles image and video input natively, not via a separate vision module. It also includes built-in desktop computer operation. For agentic workflows that need visual context alongside long text, or that need to interact with desktop apps, having both in a single open-weight model is new.

I haven't stress-tested the 1M-token retrieval on adversarial inputs or measured the multimodal quality against dedicated vision models. The claim about retrieval accuracy from uncompressed KV data is theoretically sound, but theory and practice diverge on long-context tasks often enough that I'd want more community benchmarks before trusting it on critical retrieval paths.

Where This Actually Makes Sense

The cases where M3 earns consideration:

  • Inference-heavy agentic pipelines with long context and high output volume. At 10x cheaper output, you can run 10x more eval iterations, parallel agents, or retry loops for the same cost. The budget math changes meaningfully.
  • Teams that need open weights, whether for fine-tuning, air-gapped deployments, or control over their own stack. M3 is currently the strongest open-weight option that doesn't force a tradeoff between context length, multimodal input, and coding capability.

Where I'd slow down: don't route general-purpose reasoning to it based on the coding benchmark alone. Run evals on your actual task distribution first. Watch the memory ceiling if you're chasing the full 1M-context window in self-hosted setups.

MiniMax committed to releasing the full model weights and a technical report within 10 days of launch. If the report is as transparent as the architecture overview suggests, it'll be worth reading closely before you route production traffic.

The fact that community evaluation backed up the launch benchmarks is what pushes this from a press release to something that deserves a test run. That clears a higher bar than most model launches manage.
