OKF: Why Your Agent's Context Layer Is the Problem, Not Your Retrieval Strategy

Every agent project I've built that touches internal data hits the same wall. The agent needs context: what is this BigQuery table, what do the columns mean, how does it join to the orders table, and what does "monthly active users" mean in your org, as opposed to the textbook definition. You end up dumping SQL schemas into the system prompt, pointing at Confluence pages, and writing a bespoke context builder that assembles fragments before each request. It works, barely, and it doesn't travel. Move to a different team's data or start a new project, and you're rebuilding it from scratch.

Google Cloud published a spec on June 12, 2026 that addresses exactly this: the Open Knowledge Format (OKF), v0.1. It formalizes what Andrej Karpathy called the "LLM wiki" into a portable, interoperable format.

What OKF Is (and What It Isn't)

OKF is not a service or a platform. It's a file format. The spec fits on a single page.

Your knowledge base is a directory of markdown files, one per concept. Each file has YAML frontmatter with a few structured fields and a markdown body with whatever prose, tables, or cross-links make sense for that concept.

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema
| Column | Type | Description |
|--------|------|-------------|
| `order_id` | STRING | Globally unique order identifier. |
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |

# Joins
Joined with [customers](/tables/customers.md) on `customer_id`.

That's the whole thing. Normal markdown links create the graph of relationships. An agent traverses them the same way a human would click through a wiki. No SDK, no proprietary API, no vendor account. Ship it as a tarball, host it in a git repo, mount it on a filesystem. The only mandatory field is type. Everything else is convention.
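To make "no SDK" concrete, here is a minimal sketch of a reader using only the Python standard library. The function names `parse_okf` and `extract_links` are my own, not part of the spec, and the frontmatter parser is a deliberate simplification that only handles flat `key: value` lines and inline `[a, b]` lists like the ones in the example above. A real consumer would presumably use a proper YAML parser.

```python
import re

# Matches markdown links of the form [label](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def parse_okf(text):
    """Split an OKF-style document into (frontmatter dict, markdown body).

    Assumption: frontmatter is flat `key: value` pairs, as in the example.
    This is not a general YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a frontmatter block")

    # Find the line that closes the frontmatter block.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("frontmatter block is never closed")

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value

    # The post describes `type` as the only mandatory field.
    if "type" not in meta:
        raise ValueError("missing mandatory field: type")

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def extract_links(body):
    """Return the link targets found in a markdown body, in document order."""
    return [target for _label, target in LINK_RE.findall(body)]


DOC = """---
type: BigQuery Table
title: Orders
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
---

# Joins
Joined with [customers](/tables/customers.md) on `customer_id`.
"""

meta, body = parse_okf(DOC)
print(meta["type"], meta["tags"], extract_links(body))
```

The frontmatter gives you fields to filter on and the body gives you the outgoing edges. That is all a consumer needs to get started.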

This Is Not a Retrieval Strategy

The "OKF vs RAG" framing is tempting but imprecise. RAG is a retrieval strategy. OKF is a knowledge representation format. They address different parts of the problem.

Here's the real issue with standard RAG over internal data: you chunk documents, embed them, and retrieve by vector similarity. The chunker doesn't know that customer_id on the orders table is a foreign key to the customers table. That relationship gets buried in embedding space: it might come back with the right query, and it might not. A query like "how do orders relate to customers" retrieves prose about the relationship, but not necessarily the structured fact that customer_id is the join key and that it has nulls.

OKF makes that relationship explicit. The orders.md file links directly to customers.md. An agent can follow that link deterministically. You can still run vector search over OKF documents if you want, and you probably should for large bundles. But the underlying representation is structured and explicit rather than a bag of chunks.
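Following links deterministically can be sketched as a plain breadth-first walk. This is an illustration, not anything the spec prescribes: the in-memory `BUNDLE` dict, the `reachable_concepts` helper, and the `regions.md` document are all hypothetical, and I'm assuming links are bundle-root-relative paths like the ones in the example document.

```python
import re
from collections import deque

# Captures the target of markdown links that point at other .md files.
LINK_RE = re.compile(r"\]\(([^)\s]+\.md)\)")


def reachable_concepts(bundle, start, max_hops=2):
    """Breadth-first walk over markdown links.

    `bundle` maps a document path to its markdown text.
    Returns {path: number of hops from `start`}.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if seen[path] == max_hops:
            continue  # do not expand past the hop limit
        for target in LINK_RE.findall(bundle.get(path, "")):
            # Skip dangling links and documents already visited.
            if target in bundle and target not in seen:
                seen[target] = seen[path] + 1
                queue.append(target)
    return seen


# Hypothetical three-document bundle held in memory for illustration.
BUNDLE = {
    "/tables/orders.md": "Joined with [customers](/tables/customers.md) on `customer_id`.",
    "/tables/customers.md": "Each customer belongs to a [region](/tables/regions.md).",
    "/tables/regions.md": "One row per sales region.",
}

print(reachable_concepts(BUNDLE, "/tables/orders.md", max_hops=1))
```

With `max_hops=1` the walk returns orders and customers but stops before regions. Given the same bundle and starting point, the result is the same every time, which is the property a similarity search can't promise.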

The real comparisons are OKF against metadata catalogs (DataHub, Alation, Collibra) and against the bespoke context files most teams are already building. Against catalogs: no proprietary API, no vendor lock-in, lives in git, readable in any editor. Against bespoke context files: portable across projects and teams, interoperable across tools, versioned by default.

The Karpathy Connection

The LLM wiki idea came from a gist Karpathy published: represent your organizational knowledge as a wiki of markdown files that LLMs can read, update, and cross-reference. The key observation was that LLMs don't get bored, don't forget to update cross-references, and can touch 15 files in one pass.

The insight was right. What was missing was a standard format so the wiki you build for your BigQuery tables doesn't stay locked inside your organization's particular conventions. OKF is that format. Two teams at different companies can now produce OKF bundles that a third-party tool can consume without custom integration.

Google Cloud has already updated its Knowledge Catalog to ingest OKF and serve it to agents. There are two reference implementations: an enrichment agent that walks BigQuery datasets and drafts OKF concept documents automatically, and a static HTML visualizer that converts an OKF bundle into an interactive graph view with no backend required.

Should You Adopt It Now?

A few honest caveats.

OKF is v0.1. The spec is minimal by design, but that also means tooling is thin. The reference implementations cover BigQuery and static visualization. If your data lives in Snowflake, Postgres, or an internal API catalog, you're writing your own producer.
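Writing your own producer is less work than it sounds, because the output is just text. Below is a rough sketch that renders one concept document from column metadata you've already pulled out of your own catalog. The `render_okf` function, the "Postgres Table" type value, and the column tuples are all hypothetical. How you fetch the metadata from Snowflake or Postgres is up to you and isn't shown.

```python
def render_okf(doc_type, title, description, columns, tags=()):
    """Render one OKF-style concept document as a string.

    `columns` is an iterable of (name, type, description) tuples.
    Assumption: values are simple strings. Anything containing YAML
    special characters would need quoting, which this sketch skips.
    """
    front = [
        "---",
        f"type: {doc_type}",
        f"title: {title}",
        f"description: {description}",
    ]
    if tags:
        front.append(f"tags: [{', '.join(tags)}]")
    front.append("---")

    rows = [
        "| Column | Type | Description |",
        "|--------|------|-------------|",
    ]
    for name, col_type, desc in columns:
        rows.append(f"| `{name}` | {col_type} | {desc} |")

    return "\n".join(front + ["", "# Schema"] + rows) + "\n"


# Hypothetical metadata, e.g. rows already read from a Postgres catalog.
doc = render_okf(
    doc_type="Postgres Table",
    title="Orders",
    description="One row per completed customer order.",
    columns=[
        ("order_id", "text", "Globally unique order identifier."),
        ("customer_id", "text", "FK to [customers](/tables/customers.md)."),
    ],
    tags=("sales", "revenue"),
)
print(doc)
```

Write the returned string to something like `tables/orders.md` and commit it. The descriptions are the part a template can't produce for you.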

It's open but Google-originated. The governance question is real: who controls how the spec evolves? Right now the repo lives under GoogleCloudPlatform. That doesn't make it a trap, but it's worth watching whether other vendors like dbt or DataHub participate or build competing formats instead.

That said, the format is simple enough that even if OKF as a branded spec doesn't reach broad adoption, the underlying pattern is worth using today: a directory of markdown files with YAML frontmatter, hosted in git, describing your data concepts and linking them explicitly. That's low-risk to implement and immediately useful even without any OKF-aware tooling.

I'd start by picking one data domain, one BigQuery dataset or one set of business metrics, and writing OKF documents for it manually. See how agents perform with that structured context versus raw schema dumps in the system prompt. My guess is the difference is noticeable, and the format itself imposes useful discipline on the people writing the documentation, which is half the value.
