OKF: Why Your Agent's Context Layer Is the Problem, Not Your Retrieval Strategy


Every agent project I've built that touches internal data hits the same wall. The agent needs context: what this BigQuery table is, what its columns mean, how it joins to the orders table, and what "monthly active users" means in your org as opposed to the textbook definition. You end up dumping SQL schemas into the system prompt, pointing at Confluence pages, and writing a bespoke context builder that assembles fragments before each request. It works, barely, and it doesn't travel. Move to a different team's data or start a new project, and you're rebuilding it from scratch.

Google Cloud published a spec on June 12, 2026 that addresses exactly this: the Open Knowledge Format (OKF), v0.1. It formalizes what Andrej Karpathy called the "LLM wiki" into a portable, interoperable format.

What OKF Is (and What It Isn't)

OKF is not a service or a platform. It's a file format. The spec fits on a single page.

Your knowledge base is a directory of markdown files, one per concept. Each file has YAML frontmatter with a few structured fields and a markdown body with whatever prose, tables, or cross-links make sense for that concept.

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

# Schema
| Column | Type | Description |
|--------|------|-------------|
| `order_id` | STRING | Globally unique order identifier. |
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |

# Joins
Joined with [customers](/tables/customers.md) on `customer_id`.

That's the whole thing. Normal markdown links create the graph of relationships. An agent traverses them the same way a human would click through a wiki. No SDK, no proprietary API, no vendor account. Ship it as a tarball, host it in a git repo, mount it on a filesystem. The only mandatory field is type. Everything else is convention.
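To make "no SDK" concrete, here is a minimal sketch of reading one concept document with nothing but the Python standard library. The function names (parse_concept, extract_links) are my own, not part of the spec, and the frontmatter handling is a deliberate simplification: it only understands flat `key: value` lines and keeps every value as a string, so a real consumer would want a proper YAML parser.

```python
import re

# Matches markdown links whose target ends in .md, e.g. [customers](/tables/customers.md)
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+\.md)\)")


def parse_concept(text):
    """Split a concept document into (frontmatter dict, markdown body).

    Simplification: only flat `key: value` lines are understood, and
    values stay strings (so `tags` comes back as the text "[a, b]").
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with frontmatter")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("frontmatter is not closed") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so URLs and timestamps survive intact.
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    if not meta.get("type"):
        raise ValueError("missing mandatory field: type")
    return meta, "\n".join(lines[end + 1:])


def extract_links(body):
    """Return the distinct .md link targets in the order they first appear."""
    seen = []
    for _label, target in LINK_RE.findall(body):
        if target not in seen:
            seen.append(target)
    return seen


ORDERS_MD = """---
type: BigQuery Table
title: Orders
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
---

# Schema
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |

# Joins
Joined with [customers](/tables/customers.md) on `customer_id`.
"""

meta, body = parse_concept(ORDERS_MD)
print(meta["type"])         # BigQuery Table
print(extract_links(body))  # ['/tables/customers.md']
```

The type check is the only validation here because type is the only field the format requires; anything stricter would be my convention, not OKF's.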

This Is Not a Retrieval Strategy

The "OKF vs RAG" framing is tempting but imprecise. RAG is a retrieval strategy. OKF is a knowledge representation format. They address different parts of the problem.

Here's the real issue with standard RAG over internal data: you chunk documents, embed them, and retrieve by vector similarity. The chunker doesn't know that customer_id on the orders table is a foreign key to the customers table. That relationship gets buried in embedding space: the right query might surface it, or it might not. A similarity search for "how do orders relate to customers" can retrieve prose about the relationship while missing the structured fact that customer_id is the join key and that it contains nulls.

OKF makes that relationship explicit. The orders.md file links directly to customers.md. An agent can follow that link deterministically. You can still run vector search over OKF documents if you want, and you probably should for large bundles. But the underlying representation is structured and explicit rather than a bag of chunks.
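As a rough illustration of what "follow that link deterministically" can mean in practice, the sketch below does a breadth-first walk over markdown links in an in-memory bundle. Everything here is an assumption for the sake of the example: the bundle is a plain dict of path to text, the regions and MAU documents are made up, and reachable_concepts is a hypothetical helper rather than anything the spec defines.

```python
import re
from collections import deque

# Captures the target of any markdown link that points at a .md file.
LINK_RE = re.compile(r"\]\(([^)\s]+\.md)\)")


def reachable_concepts(bundle, start, max_hops=2):
    """Breadth-first walk over markdown links.

    Returns {path: hop count} for every document reachable from `start`
    within `max_hops` links. Links to paths missing from the bundle are
    ignored rather than treated as errors.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if hops[path] == max_hops:
            continue  # do not expand beyond the hop budget
        for target in LINK_RE.findall(bundle.get(path, "")):
            if target in bundle and target not in hops:
                hops[target] = hops[path] + 1
                queue.append(target)
    return hops


# Hypothetical bundle contents, keyed by bundle-relative path.
BUNDLE = {
    "/tables/orders.md": "Joined with [customers](/tables/customers.md) on `customer_id`.",
    "/tables/customers.md": "Each customer belongs to a [region](/tables/regions.md).",
    "/tables/regions.md": "Sales regions. No outgoing links.",
    "/metrics/mau.md": "Unrelated to orders.",
}

print(reachable_concepts(BUNDLE, "/tables/orders.md", max_hops=1))
# {'/tables/orders.md': 0, '/tables/customers.md': 1}
```

The same start document always yields the same neighborhood, which is the property vector similarity alone does not give you. A hop limit keeps the assembled context small; vector search can still pick the starting document.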

The real comparisons are OKF against metadata catalogs (DataHub, Alation, Collibra) and against the bespoke context files most teams are already building. Against catalogs: no proprietary API, no vendor lock-in, lives in git, readable in any editor. Against bespoke context files: portable across projects and teams, interoperable across tools, versioned by default.

The Karpathy Connection

The LLM wiki idea came from a gist Karpathy published: represent your organizational knowledge as a wiki of markdown files that LLMs can read, update, and cross-reference. The key observation was that LLMs don't get bored, don't forget to update cross-references, and can touch 15 files in one pass.

The insight was right. What was missing was a standard format so the wiki you build for your BigQuery tables doesn't stay locked inside your organization's particular conventions. OKF is that format. Two teams at different companies can now produce OKF bundles that a third-party tool can consume without custom integration.

Google Cloud has already updated its Knowledge Catalog to ingest OKF and serve it to agents. There are two reference implementations: an enrichment agent that walks BigQuery datasets and drafts OKF concept documents automatically, and a static HTML visualizer that converts an OKF bundle into an interactive graph view with no backend required.

Should You Adopt It Now?

A few honest caveats.

OKF is v0.1. The spec is minimal by design, but that also means tooling is thin. The reference implementations cover BigQuery and static visualization. If your data lives in Snowflake, Postgres, or an internal API catalog, you're writing your own producer.
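Writing your own producer is less work than it sounds, because the output is just text. Below is a minimal sketch that renders a table description into the same layout as the Orders example above. The function name, its parameters, and the "Postgres Table" type value are all my assumptions; I haven't checked whether the spec constrains type values, and the code does no YAML escaping, so treat it as a starting point only.

```python
def render_table_concept(title, description, columns, joins=(),
                         type_="Postgres Table", tags=()):
    """Render one table as a concept document (frontmatter + markdown body).

    columns: iterable of (name, column_type, description) tuples.
    joins:   iterable of (label, path, key) tuples for linked tables.
    Caveat: values are written verbatim, with no YAML escaping.
    """
    lines = [
        "---",
        f"type: {type_}",
        f"title: {title}",
        f"description: {description}",
    ]
    if tags:
        lines.append(f"tags: [{', '.join(tags)}]")
    lines += [
        "---",
        "",
        "# Schema",
        "| Column | Type | Description |",
        "|--------|------|-------------|",
    ]
    for name, column_type, column_desc in columns:
        lines.append(f"| `{name}` | {column_type} | {column_desc} |")
    if joins:
        lines += ["", "# Joins"]
        for label, path, key in joins:
            # A plain markdown link is what makes the relationship traversable.
            lines.append(f"Joined with [{label}]({path}) on `{key}`.")
    return "\n".join(lines) + "\n"


doc = render_table_concept(
    title="Orders",
    description="One row per completed customer order.",
    columns=[
        ("order_id", "text", "Globally unique order identifier."),
        ("customer_id", "text", "FK to [customers](/tables/customers.md)."),
    ],
    joins=[("customers", "/tables/customers.md", "customer_id")],
    tags=("sales", "revenue"),
)
print(doc)
```

In a real producer the columns list would come from the source system's own metadata (for Postgres, something like information_schema), and the descriptions are the part a human or an enrichment agent still has to supply.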

It's open but Google-originated. The governance question is real: who controls how the spec evolves? Right now the repo lives under GoogleCloudPlatform. That doesn't make it a trap, but it's worth watching whether other vendors like dbt or DataHub participate or build competing formats instead.

That said, the format is simple enough that even if OKF as a branded spec doesn't reach broad adoption, the underlying pattern is worth using today: a directory of markdown files with YAML frontmatter, hosted in git, describing your data concepts and linking them explicitly. That's low-risk to implement and immediately useful even without any OKF-aware tooling.

I'd start by picking one data domain (a single BigQuery dataset, say, or one set of business metrics) and writing OKF documents for it by hand. See how agents perform with that structured context versus raw schema dumps in the system prompt. My guess is the difference is noticeable, and the format itself imposes useful discipline on the people writing the documentation, which is half the value.

ngLover
