Open Knowledge Format (OKF): A knowledge layer for production-grade RAG systems

The Open Knowledge Format, OKF for short, is at its core nothing more than a directory of Markdown files, each with a YAML header. No tooling, no central authority, no SDK. And yet it hits a sore spot in almost every RAG system.

The real problem with large knowledge bases is rarely the model. It is the missing context about what a document actually is, what it belongs to, and what it refers to. This is exactly where OKF comes in, and exactly where a sober look pays off before anyone declares it a cure-all.

You will learn:

What OKF is, and what it explicitly is not
Where it sits in a RAG pipeline, as a layer, not a replacement
Through which mechanisms it reduces hallucination
Which limits and risks you have to plan for from the start

1. What is the Open Knowledge Format?

OKF is a deliberately minimal format for representing knowledge: the metadata, the context, and the curated framing around data and systems. It is built so that humans can read it without tools, agents can parse it without a special SDK, and organizations can exchange it.

Technically, a Knowledge Bundle is a directory tree of Markdown files. Each file is a Concept, a single unit of knowledge, introduced by a YAML header (frontmatter) and followed by free Markdown text.

Exactly one field is mandatory: `type`. It names the kind of Concept, for example BigQuery Table, API Endpoint, Metric or Playbook. The following fields are recommended but optional:

Field	Purpose
`title`	Readable display name
`description`	One-liner that summarizes the Concept
`resource`	URI of the underlying resource (if any)
`tags`	Cross-cutting categorization
`timestamp`	Time of the last relevant change

On top of that come three conventions that make the bundle self-describing: index.md lists the contents of a directory (for step-by-step exploration), log.md records the change history, and plain Markdown links between Concepts express relationships.

The central design decision: OKF standardizes only the absolute minimum. Everything beyond that is left to the producer.
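To make this structure concrete, here is a minimal Python sketch that reads a single Concept. It splits off the frontmatter, enforces the one mandatory field `type`, and collects the outgoing Markdown links. It is deliberately simplified and assumes flat `key: value` frontmatter with inline lists only; a real bundle reader should use a full YAML parser. The names `Concept` and `parse_concept` are illustrative and not part of any OKF tooling.

```python
import re
from dataclasses import dataclass, field

# Inline Markdown links [text](target); ignores images, reference-style links and nested brackets.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


@dataclass
class Concept:
    meta: dict
    body: str
    links: list = field(default_factory=list)


def _parse_value(raw: str):
    """Parse an inline list like [a, b] or a plain scalar; nested YAML is not supported."""
    raw = raw.strip()
    if raw.startswith("[") and raw.endswith("]"):
        return [item.strip().strip("'\"") for item in raw[1:-1].split(",") if item.strip()]
    return raw.strip("'\"")


def parse_concept(text: str) -> Concept:
    """Split a Concept file into frontmatter fields, Markdown body and outgoing links."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("a Concept must start with a '---' frontmatter block")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter block is not closed with '---'")

    meta = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"unsupported frontmatter line: {line!r}")
        meta[key.strip()] = _parse_value(value)

    # The only rule OKF makes mandatory: a non-empty `type`.
    if not meta.get("type"):
        raise ValueError("missing mandatory field 'type'")

    body = "\n".join(lines[end + 1:])
    links = [target for _text, target in LINK_RE.findall(body)]
    return Concept(meta=meta, body=body, links=links)


example = """---
type: BigQuery Table
title: Orders
description: One row per customer order.
tags: [sales, core]
---
Joined with [Customers](customers.md) via `customer_id`.
"""
concept = parse_concept(example)
print(concept.meta["type"], concept.meta["tags"], concept.links)
# BigQuery Table ['sales', 'core'] ['customers.md']
```

Note how little the validation has to check: everything except `type` is optional, which mirrors the minimalism of the format.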

2. The crucial distinction: description, not content

This is the most common misunderstanding, and the point where many people form expectations that are too high.

An OKF Concept describes an asset: a table, an API, a process, a metric. It is metadata and curated context about things. It is not a transformation of the source document itself.

That has an uncomfortable consequence:

A 200-page contract does not become hallucination-proof just because an OKF card sits next to it describing it.

You still have to split (chunk), vectorize, and retrieve the contract content. OKF is a curated knowledge or catalog layer, not a document format and not a chunking format. It sits one level above the content, not in its place.

Whoever internalizes this asks the right question: not "does OKF replace my retrieval?", but "how does this layer make my existing retrieval better?".

3. Where OKF sits in the RAG pipeline

OKF belongs on top, as an enrichment and routing layer, not as a replacement for ingestion or retrieval. Concretely in two places:

In the ingestion phase. The OKF cards themselves become high-signal chunks: they are short, dense, and curated. In addition, their descriptions enrich the chunks of the underlying resource; the title, the one-liner, and the path within the bundle are prepended to each derived chunk.
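A minimal sketch of this ingestion-side enrichment, assuming the Concept's frontmatter has already been parsed into a dict and the underlying resource has already been chunked by your existing pipeline. The header layout (`Path:`, `Title:`, `Summary:`), the example values, and the names `Chunk` and `enrich_chunks` are illustrative choices, not something OKF prescribes.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict


def enrich_chunks(concept_path: str, meta: dict, chunks: list[str]) -> list[Chunk]:
    """Prepend the Concept's bundle path, title and one-liner to each content chunk."""
    header_lines = [f"Path: {concept_path}"]
    if meta.get("title"):
        header_lines.append(f"Title: {meta['title']}")
    if meta.get("description"):
        header_lines.append(f"Summary: {meta['description']}")
    header = "\n".join(header_lines)

    return [
        Chunk(
            text=f"{header}\n\n{chunk}",
            # Carry the structured fields along so the index can filter on them later.
            metadata={
                "type": meta["type"],
                "tags": list(meta.get("tags", [])),
                "path": concept_path,
            },
        )
        for chunk in chunks
    ]


meta = {
    "type": "Contract",  # hypothetical type value
    "title": "Supplier agreement ACME",
    "description": "Framework contract with ACME for 2023 to 2026.",
    "tags": ["legal"],
}
enriched = enrich_chunks("legal/contracts/acme.md", meta, ["Section 4: payment within 30 days net."])
print(enriched[0].text)
```

The raw contract paragraph stays unchanged after the blank line; only the text that gets embedded gains the context the paragraph lacks on its own.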

In the retrieval phase. type and tags serve as structured metadata filters. The directory hierarchy together with index.md becomes a routing layer for agentic retrieval: first read the overview, then drill into the right Concept, then into the resource. And the cross-links form a graph that a system can navigate across several steps.
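On the retrieval side, `type` and `tags` act as a pre-filter before similarity search. A minimal in-memory sketch, assuming each indexed chunk carries these fields as metadata; most vector stores offer an equivalent metadata filter, so this only illustrates the logic. `IndexedChunk` and `prefilter` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class IndexedChunk:
    text: str
    concept_type: str
    tags: set = field(default_factory=set)


def prefilter(chunks: Iterable[IndexedChunk],
              types: Optional[set] = None,
              required_tags: Optional[set] = None) -> list[IndexedChunk]:
    """Keep chunks whose type is allowed and that carry every required tag.

    Runs before similarity scoring, so the vector search only ranks
    candidates from the relevant part of the bundle.
    """
    required = set(required_tags or ())
    return [
        c for c in chunks
        if (types is None or c.concept_type in types) and required <= c.tags
    ]


index = [
    IndexedChunk("Revenue = sum of order totals", "Metric", {"finance"}),
    IndexedChunk("GET /orders returns paginated orders", "API Endpoint", {"sales"}),
    IndexedChunk("Orders table, one row per order", "BigQuery Table", {"sales", "core"}),
]
hits = prefilter(index, types={"BigQuery Table", "API Endpoint"}, required_tags={"sales"})
print([c.concept_type for c in hits])  # ['API Endpoint', 'BigQuery Table']
```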

Pipeline stage	OKF contribution
Ingestion	Cards as chunks + enrichment of content chunks
Indexing	`type`/`tags` as filterable metadata
Retrieval	Hierarchy as routing, links as graph traversal
Generation	Sourced answers via the Citations convention

The rule of thumb: navigate, then retrieve, instead of "embed everything and hope".

4. Why this helps against hallucination

The benefit is real, but indirect. OKF reduces hallucination through four mechanisms that anyone who has seriously evaluated a pipeline will recognize:

4.1 Better grounding per chunk

The frontmatter is effectively a hand-curated variant of Contextual Retrieval, enriching chunks with context before indexing. It directly addresses the classic failure mode where chunking loses the overarching context.

4.2 Routing instead of guessing

index.md and the hierarchy give the system a map. Instead of blindly searching the entire vector space, an agent can navigate directly to the right area: less scatter, less wrongly retrieved context.
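The navigation step can be sketched as a walk from the root index.md down the hierarchy. OKF only says that index.md lists a directory's contents; the entry shape assumed here (`- [Title](target): description`), the keyword-overlap scoring, and the name `route` are assumptions for illustration. In practice an LLM or an embedding comparison would usually make the choice at each level.

```python
import re

# Assumed index.md entry shape: "- [Title](relative/target.md): one-line description"
ENTRY_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)\s*:?\s*(.*)$")


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route(bundle: dict[str, str], query: str,
          start: str = "index.md", max_depth: int = 5) -> list[str]:
    """Descend from the root index.md towards the entry that best matches the query.

    `bundle` maps bundle-relative paths to file contents. Returns the visited
    paths so the routing decision stays explainable.
    """
    query_tokens = _tokens(query)
    current, trail = start, [start]
    for _ in range(max_depth):
        entries = [m.groups() for line in bundle.get(current, "").splitlines()
                   if (m := ENTRY_RE.match(line))]
        if not entries:
            break  # a Concept file or an empty index: stop navigating
        score, target = max(
            (len(query_tokens & _tokens(f"{title} {desc}")), target)
            for title, target, desc in entries
        )
        if score == 0:
            break  # nothing matches at this level; retrieve from here instead
        # Targets are resolved relative to the directory of the current index file.
        base = current.rsplit("/", 1)[0] + "/" if "/" in current else ""
        current = base + target
        trail.append(current)
    return trail


bundle = {
    "index.md": "- [Sales](sales/index.md): orders, customers and revenue\n"
                "- [Legal](legal/index.md): contracts and policies",
    "sales/index.md": "- [Orders](orders.md): one row per customer order\n"
                      "- [Customers](customers.md): customer master data",
    "sales/orders.md": "---\ntype: BigQuery Table\n---\nOrders table.",
}
print(route(bundle, "Which table holds customer orders?"))
# ['index.md', 'sales/index.md', 'sales/orders.md']
```

Returning the whole trail rather than just the final path makes each routing decision inspectable, which helps when debugging wrongly retrieved context.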

4.3 Relationships as multi-hop help

The cross-links are an (untyped) knowledge graph, human-readable and versionable, a kind of GraphRAG-light. This helps with questions whose answer is spread across several documents, the typical multi-hop problem.
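Treating the cross-links as a graph, multi-hop expansion becomes a bounded breadth-first search that starts from the Concepts a first retrieval pass returned. A minimal sketch, assuming link targets are written as bundle-root-relative paths (real bundles may use directory-relative links that need resolving first); `expand` is a hypothetical name. Because the links are untyped, the sketch can only follow them, not tell what kind of relationship each one expresses.

```python
import re
from collections import deque

# Inline Markdown link targets, without any "#section" anchor.
LINK_RE = re.compile(r"\[[^\]]+\]\(([^)\s#]+)(?:#[^)]*)?\)")


def expand(bundle: dict[str, str], seeds: list[str], max_hops: int = 2) -> dict[str, int]:
    """Breadth-first expansion over Markdown links; maps each reached Concept to its hop distance."""
    dist = {s: 0 for s in seeds if s in bundle}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for target in LINK_RE.findall(bundle[node]):
            if target in bundle and target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    return dist


bundle = {
    "metrics/revenue.md": "Computed from [Orders](tables/orders.md).",
    "tables/orders.md": "Described in the [Sales playbook](playbooks/sales.md#refunds).",
    "playbooks/sales.md": "Escalate to [Finance](teams/finance.md).",
    "teams/finance.md": "Finance team.",
}
print(expand(bundle, ["metrics/revenue.md"]))
# {'metrics/revenue.md': 0, 'tables/orders.md': 1, 'playbooks/sales.md': 2}
```

The hop budget keeps the added context small; the distances can also be used to down-weight Concepts that are further away from the original hits.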

4.4 Citations as forced attribution

The Citations convention anchors statements to sources. Mandatory sourcing is one of the most effective anti-hallucination patterns there is.
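Forced attribution can also be checked after generation. The sketch below does not implement OKF's Citations convention, whose exact form is not spelled out here. Instead, it assumes a hypothetical scheme in which the prompt numbers the retrieved Concepts `[1]` to `[n]` and every sentence of the answer must cite at least one of them. Sentence splitting is naive and only meant for illustration; `check_citations` is a hypothetical name.

```python
import re

CITE_RE = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, sources: list[str]) -> list[str]:
    """Return attribution problems; an empty list means every sentence is sourced.

    `sources` holds the retrieved Concept paths in the order they were numbered
    [1]..[n] in the prompt (a hypothetical numbering scheme).
    """
    problems = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s.strip()]
    for sentence in sentences:
        ids = [int(n) for n in CITE_RE.findall(sentence)]
        if not ids:
            problems.append(f"uncited: {sentence!r}")
        for n in ids:
            if not 1 <= n <= len(sources):
                problems.append(f"unknown source [{n}] in: {sentence!r}")
    return problems


sources = ["metrics/revenue.md", "tables/orders.md"]
answer = ("Revenue is the sum of order totals [1]. "
          "Orders are stored per customer [2]. "
          "It is refreshed hourly.")
print(check_citations(answer, sources))
# ["uncited: 'It is refreshed hourly.'"]
```

Flagged answers can then be regenerated, rejected, or shown with a warning; which policy fits depends on the use case.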

5. The limits, stated honestly

Anyone using OKF in production has to plan realistically for four things.

It replaces neither chunking nor retrieval. OKF does nothing against hallucinations caused by missing or poorly segmented content. The quality of your pipeline is still decided by chunking, reranking, and evaluation.

It drifts. This is the most important caveat. A layer that describes assets falls out of step with them as soon as they change. The timestamp field and log.md are manual hints, not a synchronization mechanism. Stale metadata produces confidently wrong answers, which means potentially more hallucination, not less. This is the classic data catalog problem, and for a layer maintained by hand or by agents it is very real.
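Because timestamp and log.md are only hints, drift has to be detected by comparing against the source system. A minimal sketch, assuming Concept frontmatter has been parsed into dicts with ISO 8601 timestamps that include a UTC offset, and that you can query when each `resource` last changed. `source_modified` stands in for that hypothetical hook (a catalog API, table metadata, or a Git log); the `warehouse://` URIs and dates are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def find_stale(concepts: dict[str, dict],
               source_modified: Callable[[str], Optional[datetime]],
               tolerance: timedelta = timedelta(0)) -> list[str]:
    """List Concept paths whose resource changed after the Concept's own timestamp."""
    stale = []
    for path, meta in concepts.items():
        resource, stamp = meta.get("resource"), meta.get("timestamp")
        if not resource:
            continue  # purely descriptive Concept: nothing to compare against
        if not stamp:
            stale.append(path)  # no timestamp: cannot show it is current, so flag it
            continue
        changed = source_modified(resource)  # None means "unknown", so it is not flagged
        if changed is not None and changed - datetime.fromisoformat(stamp) > tolerance:
            stale.append(path)
    return stale


concepts = {
    "tables/orders.md": {"type": "BigQuery Table", "resource": "warehouse://sales/orders",
                         "timestamp": "2024-03-01T00:00:00+00:00"},
    "tables/customers.md": {"type": "BigQuery Table", "resource": "warehouse://sales/customers",
                            "timestamp": "2024-06-01T00:00:00+00:00"},
    "playbooks/onboarding.md": {"type": "Playbook"},
}
last_change = {
    "warehouse://sales/orders": datetime(2024, 5, 1, tzinfo=timezone.utc),
    "warehouse://sales/customers": datetime(2024, 5, 1, tzinfo=timezone.utc),
}
print(find_stale(concepts, last_change.get))  # ['tables/orders.md']
```

Run as a scheduled job, a check like this turns a vague maintenance worry into a concrete review list.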

The interop claim has a gap. OKF wants to enable exchange across organizational boundaries, but deliberately omits a central registry for type values. Within a company this is unproblematic. Across organizational boundaries, routing and filtering based on type becomes brittle.

It commits to very little. OKF deliberately standardizes only a minimum: a single mandatory field, with everything else left to convention. That keeps the format flexible, but it also means there are hardly any hard guarantees that tools could rely on, and later versions may change the conventions.

6. Framing: a pattern that is converging

For the strategic assessment, one point matters: OKF is less an invention than the specification of a pattern converging from several directions, namely Markdown plus frontmatter as an agent-readable knowledge base.

You know the same principle from personal knowledge tools like Obsidian, from "metadata as code" approaches, from documentation generators, and from the SKILL.md files with which AI agents encapsulate their approach. This convergence is real, and it is the reason getting started is worthwhile.

The consequence for practice: adopt the pattern, not the brand. Use the convention internally without building a hard dependency on a single, minimally maintained specification.

7. Decision guidelines for leaders

If you want to put a knowledge layer over your RAG system:

Bet on the pattern, not the specific spec. Markdown plus frontmatter in Git is stable and platform-agnostic; a specific version of the specification is not.
Treat the layer as enrichment, not a replacement. It improves grounding and routing; it is not a retrieval miracle.
Plan against drift from day one. Clarify who or which process keeps the knowledge layer current, and how deviations from the source system are detected. A knowledge layer without a maintenance concept becomes a source of hallucination.
Communicate the benefit honestly. "Better grounding and clean routing" is a strong, defensible promise. "No more hallucinations" is not.
Measure it. Whether the layer helps is shown only by a quantitative evaluation against a baseline without it, not by a good feeling in the prototype.
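A minimal sketch of such a comparison at the retrieval level, assuming a hand-labelled set of questions, each mapped to the Concepts or documents that should be retrieved for it. Recall@k is only one retrieval metric; whether answers actually hallucinate less needs an answer-level evaluation on top. The run data below is a toy example, not a measurement, and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(runs: dict[str, list[str]], gold: dict[str, set[str]], k: int = 5) -> float:
    """Mean share of relevant documents found in the top-k results, over all labelled questions."""
    scores = []
    for qid, relevant in gold.items():
        if not relevant:
            continue  # unlabelled question: nothing to score
        top_k = set(runs.get(qid, [])[:k])
        scores.append(len(top_k & relevant) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: gold labels and ranked results of two pipeline variants.
gold = {"q1": {"orders.md"}, "q2": {"revenue.md", "orders.md"}}
baseline = {"q1": ["customers.md", "orders.md"], "q2": ["revenue.md", "policy.md"]}
with_layer = {"q1": ["orders.md", "customers.md"], "q2": ["revenue.md", "orders.md"]}

for name, run in [("baseline", baseline), ("with knowledge layer", with_layer)]:
    print(name, recall_at_k(run, gold, k=1), recall_at_k(run, gold, k=2))
```

Keep the question set and the gold labels fixed across both variants, so that any difference can be attributed to the layer rather than to the test data.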

8. Conclusion: curation is the value, not magic

The pattern behind OKF is spot on: anyone who wants to make large knowledge bases productively usable needs a self-describing, versionable layer over the raw data. OKF gives this pattern a name and a few sensible minimum rules.

But the value comes from curation, routing, and grounding, not from the format itself. And the biggest risk is not the technology, but the maintenance:

A knowledge layer is only as good as it is up to date.

Whoever takes this to heart builds exactly what makes the difference between a convincing demo and a productive system: an AI that does not just search documents, but knows what it is dealing with.

Would you rather build it yourself than just read about it? In the 3-day RAG workshop you design, implement, and evaluate a production-ready RAG pipeline yourself, from chunking and retrieval through Contextual Retrieval to quantitative evaluation: exactly the building blocks a knowledge layer like OKF sensibly sits on.