
OpenAI's Jalapeño Chip: Nine Months to Custom Silicon and What the 50% Cost Claim Really Means


OpenAI just announced Jalapeño, its first custom inference processor, built in partnership with Broadcom and taped out in just nine months. If the cost numbers hold, this is a structural shift in how OpenAI runs its models, and it could eventually affect what builders pay to call the API.

What Jalapeño Actually Is

Jalapeño is an inference-only ASIC (application-specific integrated circuit). Not a training chip. Inference is what runs every time you call gpt-4o or o3. That's where the compute costs actually land at scale.

The chip is built on TSMC's 3nm process node, the same manufacturing tier Apple uses for its A18 Pro. It's a reticle-sized die, meaning it's close to the largest area a lithography scanner can expose in a single shot (roughly 26 mm × 33 mm, or about 858 mm²). Dies that large are also harder to yield, because any single defect scraps more silicon. The package includes one large compute chiplet surrounded by eight HBM (high-bandwidth memory) stacks. HBM is what LLM inference needs: huge memory bandwidth, physically close to the compute. GPUs do this too, but a purpose-built ASIC strips out everything a GPU carries for graphics and general-purpose compute, and puts that die area and power budget toward memory bandwidth and matrix-multiply throughput.
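Why bandwidth matters so much: generating each token (the decode step) typically has to stream the model's weights out of memory, so for a single sequence the ceiling is roughly memory bandwidth divided by the bytes of weights read per step. The sketch below is a back-of-the-envelope estimate, not a Jalapeño spec. The per-stack bandwidth (1 TB/s) and the model (a hypothetical 70B-parameter dense model with 8-bit weights) are assumptions, and it ignores KV-cache traffic, batching, and compute limits.

```python
# Back-of-the-envelope: why HBM bandwidth caps LLM decode speed.
# Every number below is a hypothetical placeholder, not a published Jalapeño figure.


def decode_steps_per_second(bandwidth_bytes_per_s: float,
                            params: float,
                            bytes_per_param: float) -> float:
    """Upper bound on decode steps/s when each step must read all weights from HBM.

    Ignores KV-cache reads, compute limits, and batching. With batching, one
    weight read is shared by many sequences, so aggregate tokens/s can be higher.
    """
    bytes_per_step = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_step


HBM_STACKS = 8          # from the article: eight HBM stacks per package
PER_STACK_BW = 1.0e12   # assumed 1 TB/s per stack (illustrative only)
total_bw = HBM_STACKS * PER_STACK_BW

# Hypothetical dense model: 70B parameters stored at 1 byte each (8-bit weights).
steps = decode_steps_per_second(total_bw, params=70e9, bytes_per_param=1.0)

print(f"aggregate bandwidth: {total_bw / 1e12:.1f} TB/s")
print(f"decode steps/s upper bound (batch of 1): {steps:.1f}")
```

At these assumed numbers the single-sequence ceiling is about 114 decode steps per second. Serving systems batch requests so that each weight read is amortized across many sequences, which is one reason bandwidth, not raw FLOPs, tends to be the design target for inference silicon.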

OpenAI says engineering samples are already running at target clock speed and handling ML workloads including GPT-5.3-Codex-Spark. The Broadcom announcement confirms prototype deployments are planned for late 2026, scaling alongside Microsoft for gigawatt-scale data centers.

The 50% Cost Claim, and Why to Read It Carefully

OpenAI claims Jalapeño delivers roughly 50% lower cost per inference token versus current GPU alternatives, and "substantially better" performance per watt. A few things to note before taking that at face value.

First, this is OpenAI's own benchmark, run against workloads of their choosing, with no disclosed comparison baseline. Is it vs H100s? H200s? GB200 NVL72 racks? The framing matters. A purpose-built inference ASIC can absolutely outperform general-purpose GPUs by eliminating the overhead that GPUs carry for graphics and general compute. But 50% is a specific number that needs external validation.

Second, this is an inference chip only. Training still runs on GPUs, and large training runs remain some of the biggest compute bills a lab pays. OpenAI is not escaping NVIDIA dependence. They're carving out the inference workload, where they have the most control over the workload shape.

Third, the chip is internal only and won't be sold to external customers. It reduces OpenAI's own cost structure, which could eventually flow into API pricing, but there's no direct mechanism at launch. Your API calls won't run on Jalapeño this year.
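To see why the missing baseline in the first point matters, here is a minimal cost-per-token model: amortized hardware cost plus electricity per hour, divided by tokens served per hour. Performance per watt enters through the energy term. Every price, wattage, and throughput below is a made-up placeholder, not a Jalapeño or NVIDIA figure; the accelerator names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """One accelerator serving a fixed workload. All figures are hypothetical."""
    name: str
    capex_usd_per_hour: float   # purchase price amortized over service life, per hour
    watts: float                # average power draw while serving
    tokens_per_second: float    # sustained throughput on the chosen workload


def cost_per_million_tokens(acc: Accelerator, usd_per_kwh: float) -> float:
    """Hourly cost (amortized hardware + electricity) divided by hourly token output."""
    energy_usd_per_hour = acc.watts / 1000 * usd_per_kwh   # kW * USD/kWh
    hourly_usd = acc.capex_usd_per_hour + energy_usd_per_hour
    tokens_per_hour = acc.tokens_per_second * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


def percent_cheaper(candidate: Accelerator, baseline: Accelerator,
                    usd_per_kwh: float) -> float:
    """How much cheaper per token `candidate` is than `baseline`, in percent."""
    c = cost_per_million_tokens(candidate, usd_per_kwh)
    b = cost_per_million_tokens(baseline, usd_per_kwh)
    return (1 - c / b) * 100


PRICE = 0.10  # assumed electricity price, USD per kWh

# Same hypothetical ASIC, compared against two hypothetical baselines.
asic = Accelerator("hypothetical inference ASIC", 1.90, 1000, 1000)
gpu_older = Accelerator("hypothetical older GPU", 1.90, 1000, 400)
gpu_newer = Accelerator("hypothetical newer GPU", 3.80, 2000, 1600)

for baseline in (gpu_older, gpu_newer):
    saving = percent_cheaper(asic, baseline, PRICE)
    print(f"vs {baseline.name}: {saving:.0f}% cheaper per token")
```

Holding the ASIC fixed, swapping the baseline moves the headline from 60% to 20%. A fair comparison would also pin down the model, batch size, latency target, and numeric precision on both sides.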

Why Every Large Lab Is Building Custom Silicon

This is not new. Google has been running Tensor Processing Units since 2015 and now controls roughly a quarter of global AI compute outside NVIDIA's supply chain. Amazon has shipped over a million Trainium chips. Meta has MTIA. Microsoft has Maia. Every large lab at scale eventually builds custom silicon, and none of them replace NVIDIA outright. They run custom chips for the workloads they can tightly control and still buy NVIDIA for everything else.

What's notable about OpenAI is the timeline. Nine months from blank-slate design to tape-out is fast for a chip program. Broadcom has done this kind of work before (Google's early TPU development was also a Broadcom collaboration), so they know the process. But a nine-month ASIC cycle still requires that you know your workload extremely well upfront, because you cannot change hardware mid-build. OpenAI has been running GPT-scale inference for three years. They know what their matmul shapes look like.

The partnership structure is also worth noting. Broadcom designs the chip, TSMC fabricates it. OpenAI funds it and owns the resulting silicon. The money flows to Broadcom and TSMC, not NVIDIA. That's intentional.

What Actually Changes for Builders

Honestly, not much in the short term. Production scale doesn't arrive until 2027 at the earliest. Your API calls run on NVIDIA hardware for now.

The longer arc is more interesting. If Jalapeño works and OpenAI's cost structure improves, they have more room to price inference competitively. Cost per token has already dropped dramatically over the past two years, through model efficiency gains and infrastructure work. Custom silicon is the next lever, and it's one OpenAI now controls rather than waiting on NVIDIA supply allocation.
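How much "OpenAI's cost structure improves" depends on two things the announcement doesn't quantify: what share of total compute spend is inference, and how much of that inference actually moves to Jalapeño. A minimal sketch of that arithmetic follows; the spend split and rollout fractions are hypothetical scenarios, and only the 50% per-token figure comes from OpenAI's claim.

```python
def blended_saving(inference_share: float,
                   migrated_fraction: float,
                   per_token_reduction: float) -> float:
    """Fraction of total compute spend saved when only part of inference gets cheaper.

    inference_share: share of total compute spend that is inference (0..1).
    migrated_fraction: share of that inference actually served on the new chip (0..1).
    per_token_reduction: cost cut on migrated traffic, e.g. 0.5 for "50% lower".
    """
    for value in (inference_share, migrated_fraction, per_token_reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must be fractions between 0 and 1")
    return inference_share * migrated_fraction * per_token_reduction


# Hypothetical scenarios; the article gives no spend split or rollout fraction.
for share, migrated in [(0.6, 0.25), (0.6, 0.5), (0.6, 1.0)]:
    saved = blended_saving(share, migrated, 0.5)
    print(f"inference {share:.0%} of spend, {migrated:.0%} migrated -> {saved:.1%} of total saved")
```

With these placeholders, a 50% per-token cut shows up as anywhere from 7.5% to 30% of total compute spend, which is why the rollout pace matters as much as the headline number.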

There's also a reliability angle. OpenAI's inference capacity today is partly constrained by how many chips NVIDIA can deliver, and when. Custom silicon lets OpenAI plan its own production roadmap. That's better for capacity predictability in the long term, though I haven't seen specific SLA commitments come out of this announcement.

I haven't run any workloads on Jalapeño. No one outside OpenAI has. The 50% figure is a marketing number until the chip is in production and someone runs independent benchmarks. But building inference ASICs to reduce per-token cost is the right structural move for any lab running at this scale. The only real question is whether nine months was enough runway to get the workload assumptions right before tape-out locked them in.
