This is the fourth in a series of essays about what is changing and what to do about it. The Interregnum asked what happens to people when AI displaces cognitive work. The Middle asked who controls the architecture between the model and the work. Minimum Viable Company asked what the irreducible core of an organisation is when the layers around it can be automated. This essay is about the thing those questions have been circling: what does that core actually look like, and why aren’t the tools we have today adequate for building it?
The examples here start with software because that is where so much of the bleeding-edge change in AI is happening and where the problems are most visible. But the ideas apply to any domain where knowledge matters more than information. By the end of the essay, it will be clear why.
The documentation nobody reads
Here is a situation that will be familiar to anyone who has built something complex and tried to explain it.
You spend months building a design system. Not just the components but the thinking behind them: the spacing scale, the composition model, the accessibility requirements, the API conventions, the migration paths from the old system. You write it all down. You write it well. Component documentation, usage guidelines, decision records, the reasoning behind the choices. It is thorough and considered and almost nobody reads it.
This is not because the documentation is bad. It is because documentation is a lossy compression of the understanding that produced it, and reading it is not the same experience as having built the thing. The people who built the system understand it in a way that the docs can only gesture toward. The people who need to use the system want answers, not context. The documentation sits in the middle, too detailed for the casual question and too shallow for the deep one.
So you do the obvious thing. You put the docs into NotebookLM, or a RAG pipeline, or whatever AI-powered knowledge tool your company has adopted. Now people can ask questions in natural language and get answers grounded in your documentation. And it works. “What’s the spacing scale?” gets a correct answer. “Which component should I use for a modal?” gets a reasonable one.
Then someone asks a harder question. “Why did we choose this composition model instead of the one React uses?” The tool produces a fluent, confident answer. It pulls relevant passages from your decision records. It assembles them into something that reads like an explanation. And the answer might even be correct, or close to correct. But it arrived at the answer through statistical proximity of words in your documents, not through any understanding of the design decision it’s describing. It found passages that were near the question in semantic space. It did not reason about the problem.
The difference is invisible when the question is simple and the answer is in the docs. The difference becomes visible the moment someone follows that answer into a real decision, one that depends on context the documents don’t contain. Why this API surface has the shape it does. Which constraint in the consuming applications will become obvious in six months. Why a particular naming convention exists and what agreement it represents between two teams that never wrote the agreement down. The real knowledge of the system lives in these relationships, and none of them survive being chunked into vectors and retrieved by cosine similarity.
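The phrase "statistical proximity of words" is doing precise work here. A minimal sketch, with hypothetical toy vectors standing in for real embeddings, shows what retrieval by cosine similarity actually is: a sort by geometric nearness. Nothing in it examines why a decision was made.

```python
import math

def cosine(a, b):
    # Cosine similarity: the angle between two vectors, nothing more.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three documentation chunks and a question.
chunks = {
    "decision-record: composition model": [0.9, 0.1, 0.3],
    "spacing scale reference":            [0.1, 0.8, 0.2],
    "migration guide":                    [0.4, 0.3, 0.7],
}
question = [0.85, 0.15, 0.35]  # "Why did we choose this composition model?"

# Retrieval is a sort by proximity. No step in this loop reasons about
# the design decision; it only measures which text sits nearest.
ranked = sorted(chunks, key=lambda k: cosine(chunks[k], question), reverse=True)
print(ranked[0])
```

The top-ranked chunk here happens to be the right one, which is exactly the point: when the answer is near the question in embedding space, the mechanism looks like understanding, and when it is not, the same mechanism fails silently.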
The map and the territory
This is not a problem with any specific tool. It is a problem with the entire category.
Every AI knowledge tool on the market today, from enterprise RAG systems to curated workspaces like NotebookLM, operates on the same fundamental assumption: that text is the unit of knowledge. You feed it documents. It processes the documents. It answers questions by retrieving and synthesising text. The quality varies. The retrieval can be better or worse. The synthesis can be more or less faithful. But the substrate is always text, and text is not knowledge. Text is a representation of knowledge, in the same way that a map is a representation of territory.
A map can be useful. A map can be accurate. A map can answer questions about the territory it describes. But a map does not understand geography. It does not know that this river floods in spring. It does not know that this road was built to bypass a town that no longer exists. It does not know that the bridge marked on this page was condemned last year and the only way across is a detour through a farm road that nobody has mapped. The map is a snapshot of someone’s understanding at the time of drawing, and the territory has moved on.
RAG is a sophisticated map reader. It can find the right page, identify the relevant features, and describe what it sees with remarkable fluency. What it cannot do is look out the window and notice that the bridge is gone. It operates on the representation, not the reality, and the gap between them is exactly where the important questions live.
This matters more than it might seem, because the fluency hides the gap. A bad answer to a hard question is easy to identify. A fluent, plausible, almost-right answer to a hard question is much more dangerous, because it feels like understanding. The user stops checking. The user stops building their own understanding of the system, because the tool provides answers that are good enough, and the slow erosion that The Interregnum described in the context of careers, the atrophy of judgment that comes from not exercising it, proceeds quietly underneath the productivity gains.
Knowledge is not information
The distinction that matters, the one that the current tooling collapses almost entirely, is between information and knowledge.
Information is a document. A spreadsheet. A codebase. A Slack thread. A collection of artifacts that describe some aspect of the work. Information can be stored, retrieved, copied, and fed into a model.
Knowledge is the structured understanding of how the pieces relate to each other. Why they are the way they are. What depends on what. Where the contradictions live. What the consequences are of changing one thing rather than another. Knowledge is what a senior engineer has after ten years on a codebase: not the ability to find a file, but the understanding of why that file exists, what it replaced, what breaks if you change it, and what the original author was trying to solve when they wrote it the way they did.
You cannot retrieve knowledge from a document store. You can only build it, over time, through contact with the problem. The tools we have today are very good at retrieving information and presenting it as though it were knowledge. The confidence of the presentation obscures the absence of understanding, and the user, who is busy and whose attention is finite, accepts the presentation at face value because it is easier than building the understanding themselves.
This is the error that everything else follows from. Not that the tools are bad. They are often good. Not that the answers are wrong. They are often right. But that the relationship between the user and the tool is one of deference rather than use, and the thing being deferred to does not understand the thing it is being asked about.
What the substrate needs to be
If knowledge is not information, and documents are not knowledge, then a knowledge system cannot be built on documents. It needs a different substrate.
Go back to the design system. The real knowledge, the knowledge that the documentation couldn’t capture and the RAG system couldn’t retrieve, was structural. It was about relationships: this component exists because that pattern failed. This API is shaped by a constraint in the consuming applications. This naming convention is a compromise between two teams. These are not facts that live in a paragraph. They are edges in a graph. They connect things to other things, and the connections are the knowledge.
For code, this means parsing at the level of structure rather than text. Not embedding files as vectors but parsing them into their actual components: functions, types, modules, dependencies, call graphs, data flows. Storing those components and their relationships in a way that preserves the topology of the system rather than flattening it into a searchable index. Understanding that a function calls another function, that a type is used across three services, that this module was introduced to replace that one, that this dependency is load-bearing and that one is vestigial.
For organisations more broadly, the principle is the same. The knowledge of how decisions get made, what processes actually are as opposed to what the handbook says they are, where expertise lives, what the real constraints are, all of this is structural. It is a graph of relationships, not a collection of texts. The tools that treat it as a collection of texts will always produce answers that are fluent and plausible and wrong in exactly the places where being wrong is expensive.
The properties of a real knowledge substrate follow from this:
It is structural, not textual. It represents the relationships between things, not descriptions of things.
It is active, not passive. This is the property that no current tool provides and that matters most. Every knowledge system on the market today is a library with guests. The library holds the information. The model comes in, browses the shelves, pulls down what looks relevant, and assembles an answer. The guest might be fast and fluent but it has no understanding of the collection. It has never read everything. It has no formed views. It cannot tell you that two books on adjacent shelves contradict each other in a way that matters, because it has never thought about either of them until you asked.
What you actually want is a librarian. Not someone who can find a book, but someone who has read them all, thought about them, formed views, identified the gaps, and can answer not just from recall but from understanding. The librarian’s knowledge is not a copy of the library. It is both smaller, condensed, because you don’t need every detail, and larger, expanded, because understanding exists in the connections between sources that no single source contains. The relationship between two books, the contradiction, the evolution of an idea across a decade of papers, the gap that no one has written about yet, that is knowledge, and it lives in the librarian, not in the shelves.
Models are still the guests in this picture. They still ask questions. But instead of browsing the shelves themselves and assembling an answer from fragments, they talk to the librarian, who understands. The substrate is the librarian’s mind: the structured, evolved, judgment-informed understanding of everything in the collection and everything that lies between its parts.
This is not a new idea. It is how every organisation that has ever worked well has operated. The person at the lowest level knows the intimate details of their domain. The person above them sees across multiple domains and understands something different, something that exists only at that level of synthesis. The person above them sees further still. Each layer is not just passing information upward. It is transforming information into knowledge through the act of seeing across and forming judgment. That is not a workflow. That is cognition. And it is exactly what Minimum Viable Company described as the reservoir that gets lost when you flatten the hierarchy and replace it with a model that can summarise but cannot understand.
When new information arrives, the substrate evaluates it. Is this consistent with what is already known? Does it contradict something? Does it fill a gap or create a new one? What are the implications for decisions that have already been made? The output of that process is not a stored document or an updated index. It is an updated understanding, and the difference between those things is the difference between a filing cabinet and a mind.
This means the substrate knows what it knows. It knows where its understanding is strong and where it is thin. It can identify that two sources contradict each other and surface the contradiction with a view about what the resolution might be, rather than simply retrieving both and leaving the user to notice. It can recognise that a new piece of information invalidates a decision that was made six months ago and flag it before anyone asks. This is not retrieval. This is the structural scaffolding that makes real reasoning possible, and it is what people actually want when they say they want AI that understands their work. Not a faster search engine. Not a more fluent summariser. Real intelligence. Real expertise. The thing that has always lived in people and that no tool has yet attempted to build properly.
It is maintained, not static. It evolves as the system it represents evolves, because systems always drift from their descriptions and the drift is where the real knowledge lives.
It is queryable by models but not dependent on them. The model is a lens for looking at the substrate, not the substrate itself. If the model changes or degrades, the knowledge persists.
And it is yours. Not rented from a provider. Not stored in someone else’s infrastructure. Not subject to someone else’s deprecation schedule. The knowledge of your system is the most valuable thing you have, and as The Middle argued, handing it to a platform you don’t control is handing over the thing that makes you you.
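The evaluation step described above, checking new information against what is already believed rather than simply filing it, can be sketched in miniature. Everything here is hypothetical (the `Claim` shape, the sources, the bridge example borrowed from the map metaphor); the point is that ingestion returns a judgment, not a receipt.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    subject: str
    predicate: str
    value: str
    source: str

@dataclass
class Substrate:
    claims: list = field(default_factory=list)

    def ingest(self, new: Claim):
        # Evaluation, not filing: before the new claim joins the
        # understanding, check it against what is already believed.
        conflicts = [c for c in self.claims
                     if c.subject == new.subject
                     and c.predicate == new.predicate
                     and c.value != new.value]
        self.claims.append(new)
        # Contradictions are surfaced for resolution, not silently stored.
        return conflicts

s = Substrate()
s.ingest(Claim("bridge", "status", "open", "map-2019"))
flagged = s.ingest(Claim("bridge", "status", "condemned", "survey-2024"))
print([c.source for c in flagged])
```

A filing cabinet would hold both claims side by side and leave the reader to notice. A substrate, even this toy one, knows that something it believed is now in dispute.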
Better instructions are still instructions
There is a growing community of practitioners who have correctly identified this gap and are trying to close it. They are converting their documentation into decision records, distilling institutional knowledge into skill files and behavioural guidelines, building pipelines that transform unstructured information into structured instructions and store them in repositories where agents can access them. The work is often thoughtful and the instinct behind it is exactly right: agents need knowledge, not just information, and the knowledge needs to be curated rather than dumped.
But the result is still text. Better text. More carefully structured text. Text that has been designed to inform decisions rather than merely describe things. And it is fed into the same context window as everything else, where it competes for attention with the prompt and the code and whatever else the model is holding. The model treats a decision record the same way it treats any other token sequence: as something to attend to statistically, not as a structure to reason about. Sometimes the agent picks up the right instruction and applies it well. Sometimes it doesn’t. And the path from instruction to behaviour runs through a statistical process that you cannot inspect, which means you cannot trace why the agent did what it did, which means you cannot reliably improve it.
This is the measurability problem that almost nobody talks about. When your knowledge is structural, when it lives in a graph of relationships with explicit edges and documented dependencies, you can trace a decision backward. Why did the agent choose this approach? Because this component depends on that service, which has this constraint, which was introduced for this reason. The path is inspectable. You can debug it. You can improve it. You can know, rather than hope, that the right knowledge informed the decision.
When your knowledge is a set of natural language instructions in a context window, the path is opaque. The agent attended to certain tokens more than others for reasons that are a function of the model’s weights, not your intentions. You can tune the instructions. You can rewrite them. You can add more of them. But you are optimising a system you cannot see into, and the feedback loop between what you intended and what the agent did is mediated by a process that neither you nor the model can explain.
The people building these systems are doing important work and they are closer to the answer than the people who are still just feeding raw documents into a chat interface. But they are building knowledge out of the wrong material. Natural language is a communication medium. It is how humans transfer understanding to other humans. It is not a substrate for machine reasoning, and using it as one means accepting a fundamental opacity in the system you are building.
The model’s role
But models are not irrelevant to the substrate. They are essential to it, just not in the way the current tooling assumes.
In a RAG system, the model does everything: retrieval, reasoning, and response, all in one pass, all fighting for the same context window. In a knowledge substrate, models are embedded throughout as thinking processes. A model participates in ingestion: when new information arrives, it helps assess what changed, what the implications are, how it relates to what is already known. A model participates in organisation: how should this new understanding be structured, what relationships does it create or invalidate, where does it fit in the existing graph. And a model participates in retrieval: not just fetching relevant nodes but reasoning about what the question actually requires and what the substrate can and can’t answer.
The models are not guests browsing a library. They are the librarian’s thought process. They are how the substrate reads, evaluates, synthesises, and forms views. The graph is the understanding. The models are how that understanding gets built, maintained, and queried. And because the thinking is embedded at every layer, the quality of the understanding compounds over time. The substrate gets better at assessing new information because it has more context for evaluating it. It gets better at organising because the graph is richer. It gets better at retrieval because there is more structure to reason about.
This has a further implication that is worth stating plainly. If the substrate can evaluate the quality of its own understanding, if it can identify where it is thin or contradictory or outdated, then it can direct its own attention. It can notice that a part of the system has changed significantly but its understanding of the downstream effects hasn’t been updated, and initiate that update without being asked. That is not a scheduled re-indexing job. That is a system that notices gaps in its own knowledge and works to close them. That is, in a meaningful sense, self-learning.
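The gap-noticing behaviour described above has a simple structural core. As a sketch under invented assumptions (hypothetical node names, integer timestamps standing in for real clocks): if each node records when it last changed and when its downstream understanding was last re-examined, stale understanding is a query, not a scheduled job.

```python
# Hypothetical bookkeeping: when each node last changed, and when the
# substrate last re-examined the nodes that depend on it.
changed_at  = {"auth-module": 10, "billing": 3}
verified_at = {"session-handling": 5, "invoices": 7}
depends_on  = {"session-handling": "auth-module", "invoices": "billing"}

def stale(deps, changed, verified):
    # A gap the substrate can notice on its own: an upstream node
    # changed after the downstream understanding was last updated.
    return [down for down, up in deps.items()
            if changed[up] > verified[down]]

print(stale(depends_on, changed_at, verified_at))
```

Here `session-handling` surfaces because `auth-module` changed after its downstream effects were last examined, while `invoices` does not. The substrate can then initiate the update itself, which is what separates self-learning from re-indexing.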
The next age of computing
It is worth stepping back from the specifics to see what all of this points toward, because it is larger than any single tool or framework.
The first age of computing was about information. We learned to capture it, store it, transform it, and share it. Spreadsheets, databases, email, the web. The revolution was that information, which had previously been trapped in filing cabinets and phone calls and human memory, could be digitalised, searched, and moved at the speed of light. This was genuinely transformative and we are still living in the infrastructure it built.
The second age was about process. Software like Salesforce didn’t just store information. It encoded the steps a company goes through to do its work. What happens when a lead comes in? What data do we capture at each stage? How do we track progress? How do we report on status? The revolution was that institutional process, which had previously lived in handbooks and habits and the heads of experienced employees, could be captured in software and made repeatable. This was the era of enterprise SaaS, and it produced some of the most valuable companies in the world.
The third age, the one we are entering now, is about thought. Not information. Not process. Judgment. Given X, what do we think about it? What does it tell us relative to everything else we know? How does it fit into our understanding of the world? When our understanding changes, what actions do we need to take, which humans need to be alerted, which decisions need to be revisited?
This is what the AI industry calls a world model, though the term is currently used mostly in the context of physical simulation and robotics. The concept is broader than that. A world model is a structured, dynamic representation of how some part of reality works, maintained by cognitive processes, capable of reasoning about consequences, and able to update itself when new information arrives. A child builds a world model of gravity before anyone explains Newton’s laws. An experienced engineer has a world model of their codebase. A good executive has a world model of their market. These are not static representations. They are living understandings, continuously refined through contact with reality.
What we are describing with the knowledge substrate is the infrastructure for building world models in software. Not world models of physics. World models of organisations, of codebases, of domains, of any complex system where understanding the relationships between things matters more than retrieving descriptions of them.
Libraries, on their own, with their organisation and structure, make us better, but only in proportion to the effort we are willing to put into them. Librarians, the gardeners and keepers of these world models, are the differentiator. Knowledge with inherent intelligence, rather than information with search. Without that distinction, we are building a slightly better Google.
If models give us the means of expressing near-infinite language, and language is the means of communicating information, and if we can understand the judgments and processes of reflection that people perform in an organisation, then we can encapsulate that within software. Not as static documentation. Not as better prompts. As living, thinking representations that grow more knowledgeable over time, that can reason about what they know and what they don’t, that can generate entirely new understanding from the connections between existing knowledge. The substrate stores understanding the way the neocortex stores experience: not as a recording but as a structure that, when activated, produces thought.
What this means
The essays in this series have been converging on the same point from different directions. The Interregnum asked what happens to people when the work changes. The Middle asked who controls the architecture. Minimum Viable Company asked what the irreducible core of an organisation is. The answer, in every case, was the same: the understanding. The structured, hard-won, judgment-informed understanding of the problem.
The knowledge substrate is not a product. It is a concept: the idea that understanding should live in a structure you own, maintained by cognitive processes you control, queryable by any model but dependent on none. It applies to code first because that is where the bleeding edge of AI capability is sharpest and the need is most immediate. But it applies to any domain where the relationships between things matter more than the things themselves.
Building these structures is the work that matters most right now. Not because the current tools are bad, but because they are so good at simulating understanding that the absence of real understanding becomes invisible. The organisation that plugs in a RAG pipeline and gets fluent answers to its questions has solved a convenience problem. The organisation that builds a world model of its own work has solved the problem that all the other essays in this series have been circling: how to remain in control of your own intelligence in an era when the tools are designed, by default, to hold it for you.
The question is not whether AI is useful. It is. The question is what you are building underneath it, and whether it thinks.