There is a pattern quietly spreading across AI teams, research labs, and solo developers that does not have one canonical name or one official inventor. Some call it a "living knowledge base." Others call it a "compiled wiki." Andrej Karpathy called his version an LLM Wiki in a widely shared GitHub Gist, and it resonated with practitioners not because the idea was new, but because it named something they had already been bumping into independently.
The core problem is simple: AI agents are stateless by default, but knowledge work is not. Every time a session ends, the synthesis disappears. The documents remain, but the understanding built from them does not. This article explains the general architectural pattern — what it is, why it works, and where it genuinely complements the alternatives.
The Problem With Treating AI Like a Search Engine
When teams first need to give an AI agent access to their documents, the natural instinct is Retrieval-Augmented Generation. RAG chunks documents into fragments, embeds them as vectors, and retrieves the most similar ones at query time. It is well-understood, well-tooled, and scales to millions of documents.
But RAG is primarily a search system, not a knowledge system. It finds relevant fragments — it does not synthesize or accumulate understanding across them. The model re-reads, re-reasons, and re-derives answers from scratch on every single query. Connections that span multiple documents can be missed. Conflicting claims get imported together without resolution. And the synthesis produced at query time is immediately discarded.
For point-in-time retrieval across large corpora, RAG is often the better choice. It stays strong when you need broad coverage, frequent updates, and a lighter setup. For building an agent that develops more coherent understanding of a curated, evolving domain over time, a different architectural pattern is worth considering.
Figure 1. Standard RAG Pipeline. Documents are chunked and embedded at index time. At query time, relevant chunks are retrieved and passed to the LLM. No synthesis is retained between sessions.
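To make the contrast concrete, here is a minimal sketch of the query-time flow Figure 1 describes. It is illustrative only: the bag-of-words `embed` function stands in for a real embedding model, and the two chunks are made up. The point is that ranking and synthesis happen fresh on every query, and nothing is retained afterward.

```python
# Minimal query-time RAG sketch. embed() is a bag-of-words stand-in for a
# real embedding model; the chunks are illustrative.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Index time: chunk documents and embed each chunk once.
chunks = [
    "RAG retrieves similar fragments at query time.",
    "The LLM Wiki compiles understanding at ingestion time.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query time: embed the query, rank chunks, hand the top hits to the LLM.
query = "when does retrieval happen in RAG"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))
print(best_chunk)  # Whatever the LLM synthesizes from this context is then discarded.
```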
What the LLM Wiki Pattern Actually Is
The pattern inverts when the heavy work happens. Instead of synthesizing at query time, the LLM synthesizes at ingestion time — reading source documents once, extracting structure, resolving conflicts, and writing the result into a persistent, interlinked set of Markdown files. Queries then run against that pre-compiled understanding rather than raw fragments.
Think of it the way a compiler works. You do not re-parse source code every time you run a program. You compile it once into an optimized binary. RAG retrieves and re-synthesizes at query time. The LLM Wiki compiles once and can grow more useful with each new source.
The resulting knowledge base is not a database or an index — it is a living document the agent actively maintains. New sources get ingested and integrated. Contradictions get flagged. Cross-links get updated. The structure grows more accurate and more interconnected over time.
This is not one specific tool or implementation. It is a pattern that has emerged independently in multiple write-ups and implementations around AI-maintained knowledge bases.
The Architecture in Three Layers
Most implementations converge on the same three-layer structure regardless of tooling (a minimal sketch follows the list):
Immutable sources (raw/) — original documents the LLM reads but never modifies. PDFs, papers, web clips, meeting notes. The ground truth layer. Because nothing here ever changes, you can always re-derive the wiki from scratch if needed.
LLM-owned synthesis (wiki/) — Markdown files the agent writes and maintains. Concept pages, entity pages, source summaries, comparisons, a navigable index, and a log of every operation. This is the compiled output.
Schema and rules (CLAUDE.md, AGENTS.md, or equivalent) — a configuration file that defines page structure, taxonomy, formatting conventions, and how the agent should handle contradictions. Without this layer, different ingestion runs produce inconsistent output. With it, the wiki stays coherent as it grows.
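As a rough illustration of how the three layers fit together, here is a hedged sketch. The raw/ and wiki/ directory names follow the convention above, but `synthesize` is only a placeholder for the real LLM call, and the frontmatter fields are illustrative rather than a fixed schema.

```python
# Sketch of one compile pass over the immutable layer. synthesize() is a
# placeholder; a real run would call the LLM under the schema in CLAUDE.md.
from datetime import date
from pathlib import Path

RAW = Path("raw")       # immutable sources: read, never modified
WIKI = Path("wiki")     # LLM-owned synthesis: Markdown the agent maintains
LOG = WIKI / "log.md"   # append-only record of every ingestion operation

def synthesize(source_text: str) -> tuple[str, str]:
    # Placeholder for the LLM step: a real implementation would extract
    # structure, resolve conflicts with existing pages, and update cross-links.
    lines = source_text.strip().splitlines()
    title = lines[0][:60] if lines else "untitled"
    slug = "-".join(title.lower().split()) or "untitled"
    return slug, f"# {title}\n\nSummary and cross-links would go here.\n"

def ingest(source: Path) -> None:
    slug, body = synthesize(source.read_text())
    WIKI.mkdir(exist_ok=True)
    frontmatter = (
        "---\n"
        f"sources: [{source.name}]\n"   # provenance back to the raw/ layer
        f"updated: {date.today()}\n"
        "---\n"
    )
    (WIKI / f"{slug}.md").write_text(frontmatter + body)
    with LOG.open("a") as log:
        log.write(f"- {date.today()}: ingested {source.name} -> {slug}.md\n")

for path in sorted(RAW.glob("*.md")):
    ingest(path)
```

Because raw/ never changes, rerunning the pass re-derives the wiki from scratch, which is exactly what makes the immutable layer the ground truth.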
Four Reasons This Pattern Works
1. Knowledge compounds instead of resetting. Insights generated during ingestion and later queries do not disappear at session end; they become part of the wiki. The next answer starts from a better baseline than the last one.
2. Provenance and contradiction handling make it trustworthy. Every claim can point back to a source, and new sources can challenge old ones instead of quietly coexisting. That makes review and correction far easier.
3. Markdown keeps it portable and maintainable. Plain text, Git history, frontmatter, and schema rules keep the system auditable and easy to lint. The knowledge base stays human-readable and can be rebuilt from the raw sources if needed.
4. It can connect memory to action. The Model Context Protocol (MCP) lets the wiki talk to real tools and workflows, while research on persistent memory systems like MemGPT and Zep supports the broader principle that retaining context across sessions outperforms one-shot recall, even though those systems use architectures distinct from the wiki pattern. The result is not just a library of notes; it is an operating layer for the agent.
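On the action side, one common shape is a small tool server that exposes the compiled wiki to whatever agent is running. The sketch below assumes the FastMCP helper from the official MCP Python SDK; the server name, tool, and wiki/ path are illustrative, not part of any standard.

```python
# Hedged sketch: exposing compiled wiki pages to agents over MCP, assuming
# the FastMCP helper from the official Python SDK. Names are illustrative.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("llm-wiki")

@mcp.tool()
def read_wiki_page(slug: str) -> str:
    """Return the compiled Markdown for one wiki page, e.g. 'index'."""
    page = Path("wiki") / f"{slug}.md"
    return page.read_text() if page.exists() else f"No page named {slug}."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```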
Figure 2. LLM Memory Hierarchy. Context window acts as RAM; external storage acts as hard disk. The agent autonomously manages what lives in active memory.
A Concrete Example: Professional Services Onboarding
Consider a mid-sized consulting firm whose analysts spend two to three hours per engagement reading prior project reports, client briefings, and methodology documents before they can contribute meaningfully. The knowledge exists — it is just fragmented across hundreds of PDFs, stored in folders no one systematically maintains.
With an LLM Wiki, those source documents are ingested once. The agent compiles a structured knowledge base: client-specific entity pages, cross-referenced methodology summaries, and a conflict log where older recommendations contradict newer ones. When a new analyst joins an engagement, they query the wiki rather than the raw folder. When a new report is filed, the agent integrates it — updating existing pages, flagging any contradictions with prior work, and strengthening cross-links.
That means a new analyst can get to useful context in minutes instead of rebuilding the briefing trail from scratch.
The wiki fits comfortably within the pattern's sweet spot: a few hundred documents per client matter, a defined domain, and knowledge that evolves gradually over months rather than exploding daily. The setup cost is paid once; the compounding benefit accrues on every subsequent query.
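To make the conflict log concrete, here is one hedged shape such an entry might take. The helper, field names, and file layout are hypothetical, and the page and source names are invented for illustration.

```python
# Hypothetical conflict-log helper; the schema and example values are
# illustrative, not a fixed convention.
from datetime import date
from pathlib import Path

def flag_conflict(page: str, old_source: str, new_source: str, summary: str) -> None:
    Path("wiki").mkdir(exist_ok=True)
    entry = f"- {date.today()} | {page}: {new_source} contradicts {old_source}; {summary}\n"
    with (Path("wiki") / "conflicts.md").open("a") as log:
        log.write(entry)

flag_conflict(
    page="client-acme-pricing",
    old_source="2022-pricing-review.pdf",
    new_source="2024-pricing-update.pdf",
    summary="newer report recommends value-based over cost-plus pricing",
)
```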
Honest Tradeoffs: Where RAG Still Wins
This pattern is best suited to curated, evolving knowledge domains with hundreds to a few thousand documents — not large-scale enterprise search. RAG is the better option when the corpus is much larger, changes frequently, or you want minimal maintenance overhead. The tradeoffs are real.
| Dimension | Standard RAG | LLM Wiki Pattern |
|---|---|---|
| Best scale | Millions of documents | Hundreds to low thousands |
| Computation cost | At query time | At ingestion time |
| Knowledge state | Rediscovered each query | Compounds over time |
| Setup complexity | Lower — index and query | Higher — schema, ingestion rules, maintenance |
| Transparency | Embeddings are opaque | Fully human-readable |
| Privacy | Depends on deployment | Depends on deployment |
Table 1. Where RAG and the LLM Wiki Pattern Fit Best.
If your problem is searching a massive enterprise corpus for point-in-time facts, RAG is a strong option. If your problem is building an agent that develops persistent domain context, resolves contradictions, and gets smarter over time — the wiki pattern is worth the additional setup cost.
The Bottom Line
The LLM Wiki is not one product or one person's invention — it is a general answer to a real problem. AI agents need a way to build understanding that persists across sessions, not just retrieve fragments that evaporate. The pattern works because it compounds knowledge, keeps source material immutable for traceability, makes maintenance tractable, and lets the LLM maintain the knowledge base rather than relying on human discipline that fails at scale.
The knowledge your agents work with should get better every time they use it. That is the promise — and the reason this pattern keeps being rediscovered independently by teams building AI systems that actually need to learn.
References
- Boudro, D. How to Build Karpathy's LLM Wiki: The Complete Guide to AI-Maintained Knowledge Bases. Starmorph Blog, April 9, 2026. https://blog.starmorph.com/blog/karpathy-llm-wiki-knowledge-base-guide
- Data Science Dojo Staff. The LLM Wiki Pattern by Andrej Karpathy: A Step-by-Step Tutorial to Building a Compounding Knowledge Base. Data Science Dojo, April 16, 2026. https://datasciencedojo.com/blog/llm-wiki-tutorial/
- Karpathy, A. LLM Wiki. GitHub Gist, April 2026. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
- Model Context Protocol. What is the Model Context Protocol (MCP)? https://modelcontextprotocol.io/docs/getting-started/intro
- Packer, C., Wooders, S., Lin, K., Fang, V., Patil, S. G., Stoica, I., & Gonzalez, J. E. MemGPT: Towards LLMs as Operating Systems. arXiv, 2024. https://arxiv.org/abs/2310.08560
- Rasmussen, P., Paliychuk, P., Beauvais, T., Ryan, J., & Chalef, D. Zep: A Temporal Knowledge Graph Architecture for Agent Memory. arXiv, 2025. https://arxiv.org/abs/2501.13956
- Xu, X., Weytjens, H., Zhang, D., Lu, Q., Weber, I., & Zhu, L. RAGOps: Operating and Managing Retrieval-Augmented Generation Pipelines. arXiv, 2025. https://arxiv.org/abs/2506.03401