Namsang LABS
Radar · #ai #knowledge-management #obsidian #llm #karpathy

Compilation, Not Retrieval — Karpathy's LLM Wiki

Sangkyoon Nam

On April 3, 2026, Andrej Karpathy published an X post that climbed to the top of Hacker News within days. Major tech outlets and communities poured out analyses, and “how should LLMs manage knowledge” became the hottest topic in the developer community during the first week of April. His point: LLM token consumption is shifting from code to knowledge.

“Something I’m finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge.”

Andrej Karpathy, X

#raw → wiki → output: A 3-Layer Architecture

Karpathy’s system is split into three directories: raw, wiki, and output. Just as a compiler turns source code into a binary, an LLM turns raw material into structured knowledge.

#Raw sources — raw/

This is where unmodified original documents live. Web articles, PDFs, images, datasets, and papers are stored as-is. Once added, they are not edited and not deleted. They are the foundation of every compilation; lose the originals and regeneration becomes impossible.

#Compiled knowledge — wiki/

Structured knowledge (concept pages, entity profiles, summaries, cross-references) accumulates here. It is the compiled form of raw/: knowledge in executable shape.

Karpathy published the structure of wiki/ as a Gist. The core is the schema. A CLAUDE.md (or AGENTS.md) file instructs the LLM about the wiki’s structure and workflow. Writing this schema is essentially deciding “how should I structure my domain.”

At the heart of the wiki sit two files.

  • index.md: a catalog of every page. A routing table made of one-line summaries and metadata. The LLM’s entry point for navigating the entire wiki.
  • log.md: an append-only chronicle. A record of when things were ingested and which queries were run.
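The three-directory layout and the two hub files can be scaffolded in a few lines. The directory and file names come from the article; the entry format inside index.md is my own illustrative assumption, not Karpathy's published schema:

```python
# Sketch of the raw/ -> wiki/ -> output/ layout described in the article.
# The index.md entry format below is an assumption, not Karpathy's schema.
from pathlib import Path

def scaffold(root: str) -> Path:
    root = Path(root)
    for d in ("raw", "wiki", "output"):
        (root / d).mkdir(parents=True, exist_ok=True)
    # index.md: a routing table of one-line summaries, the LLM's entry point.
    (root / "wiki" / "index.md").write_text(
        "# Index\n\n"
        "- [attention.md] One-line summary of the attention page\n"
    )
    # log.md: append-only chronicle of ingests and queries.
    (root / "wiki" / "log.md").write_text("# Log\n")
    return root
```

The point of the sketch is the shape, not the code: everything is plain directories and markdown, so any file-manipulating agent can operate on it.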

#Query results — output/

Ask the LLM a question and the answers, slides, and charts pile up in output/. Most are one-shot, but the useful ones are filed back into wiki/. Every query becomes a contribution to the wiki.

#Compilation, Not Retrieval

For the past two years, RAG has been the standard answer to “let an LLM work with my knowledge”: drop the documents into a vector DB and, when a question comes in, retrieve similar chunks and stuff them into the LLM’s context. Karpathy made his counterintuitive claim because he thinks this standard answer has a fundamental limit.

RAG retrieves and assembles anew on every question. All a vector DB sees is semantic similarity. It does not know that chunk #4271 contradicts chunk #8903, or that both are special cases of chunk #112, or that newer research has already overturned #8903. The moment you slice into chunks, context is severed and the relationships between documents disappear.

Compilation is the approach of recording those relationships up front. Instead of searching when a question arrives, the LLM reads the existing wiki when a new source comes in and explicitly updates the relationships. What the vector DB “doesn’t know” is written down in the wiki: contradictions as contradictions, special cases as special cases, superseded research as superseded. Search results are reassembled every time, but compiled knowledge accumulates and evolves.
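The mechanical part of ingest-time compilation is just file bookkeeping. A minimal sketch, using the article's file names but with entry formats I have assumed; the actual relationship updates across pages are done by the LLM, not by code like this:

```python
# Illustrative sketch of compile-at-ingest bookkeeping: file a new source
# under raw/ and record it in wiki/index.md and the append-only wiki/log.md.
# Directory and file names follow the article; entry formats are assumptions.
from datetime import date
from pathlib import Path

def ingest(root: Path, name: str, content: str, summary: str) -> None:
    (root / "raw").mkdir(parents=True, exist_ok=True)
    (root / "wiki").mkdir(parents=True, exist_ok=True)
    # raw/ is append-only: store the original as-is, never edit it later.
    (root / "raw" / name).write_text(content)
    # index.md gains a one-line routing entry for the new source.
    with (root / "wiki" / "index.md").open("a") as f:
        f.write(f"- [{name}] {summary}\n")
    # log.md records when the source was ingested.
    with (root / "wiki" / "log.md").open("a") as f:
        f.write(f"{date.today().isoformat()} ingested {name}\n")
```

What this sketch deliberately omits is the interesting half: the LLM reading the new source against the existing wiki and rippling updates through the related pages.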

| Aspect | RAG | Compilation |
| --- | --- | --- |
| When work happens | Query time | Ingest time |
| Storage format | Vector embeddings (chunks) | Markdown (pages) |
| Relationships | Semantic similarity | Explicit cross-references |
| Contradiction tracking | Impossible | Explicit |
| Cumulative effect | None | Accumulates and evolves |
| Infrastructure | Vector DB required | Filesystem only |

“I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents.”

The actual scale is not small either. Karpathy’s wiki holds about 100 articles and 400,000 words per topic. That fits comfortably inside the context windows of major models, and index.md alone is enough to navigate it. The threshold where you actually need a vector DB is much further away than people assume.
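A back-of-envelope check makes the claim concrete. The ~1.3 tokens-per-English-word ratio below is a common rule of thumb, not a figure from the article:

```python
# Rough token estimate for one topic's wiki: ~400,000 words, using the
# common rule of thumb of ~1.3 tokens per English word (an assumption).
words = 400_000
tokens = int(words * 1.3)
print(tokens)  # 520000
# Roughly half a million tokens: inside a 1M-token context window in one
# shot, and navigable via index.md alone on smaller windows.
```

At that scale, the index file acts as the router and the pages as on-demand context; a vector DB only becomes necessary far beyond this point.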

#Three Operations That Evolve the Wiki

If a wiki were just a set of documents created once and left alone, it would be a static archive. What makes Karpathy’s system interesting is that three operations repeat, and the wiki keeps evolving.

  • Ingest: Add a new source to raw/, and the LLM reads it, summarizes it, and updates the relevant pages in the existing wiki. A single source can ripple through 10–15 pages. Use Obsidian Web Clipper for the web and Docling or Marker to convert PDFs into markdown.
  • Query: Pose a complex question against the wiki. The LLM finds relevant pages from index.md, reads them, cross-references them, and synthesizes an answer. Results pile up in output/, and the useful ones flow back into wiki/.
  • Lint: The LLM runs a health check against the entire wiki, looking for contradictions between pages, missing cross-references, stale information, and candidates for new articles. Just as code has linters, knowledge has linters too.
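One slice of the lint step, checking for broken cross-references, can be written directly against the filesystem. The `[[wikilink]]` syntax here is an assumption borrowed from Obsidian-style wikis; the semantic checks the article describes (contradictions, staleness) are done by the LLM, not by code like this:

```python
# Toy knowledge linter: report [[wikilinks]] that point at pages missing
# from the wiki directory. One mechanical check only; contradiction and
# staleness detection in the article is the LLM's job.
import re
from pathlib import Path

def lint_links(wiki: Path) -> list[tuple[str, str]]:
    pages = {p.stem for p in wiki.glob("*.md")}
    broken = []
    for page in wiki.glob("*.md"):
        # Capture the target name inside [[...]], stopping at ], |, or #.
        for target in re.findall(r"\[\[([^\]|#]+)", page.read_text()):
            if target.strip() not in pages:
                broken.append((page.name, target.strip()))
    return broken
```

Even this trivial check illustrates the analogy: the wiki is data a program (or an agent) can validate, the way a linter validates code.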

Ingest adds knowledge, query validates it, lint corrects it. The more this cycle repeats, the more the wiki evolves from a static archive into a living system.

#Different Starting Points, Same Conclusion

Karpathy is not the only one moving in this direction.

Shopify CEO Tobi Lütke released qmd, a markdown search engine that runs locally. It combines BM25 full-text search, vector semantic search, and LLM reranking, and it runs entirely on the local machine. Integrated as an MCP server for Claude Code, it also doubles as a memory backend for AI agents.

qmd and Karpathy’s compilation look like opposites. One is a search engine, the other declared “compilation, not retrieval.” But the two approaches are solving the same problem, just at different scales. If the wiki fits in the context window, compilation is enough; once it overflows, you need search.
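The full-text half of that hybrid can be illustrated with a bare-bones BM25 scorer. This is the standard Okapi BM25 formula, not qmd's actual implementation, which layers vector search and LLM reranking on top:

```python
# Minimal BM25 ranking over in-memory documents: the classic full-text
# scoring that hybrid tools like qmd combine with vector search.
import math
from collections import Counter

def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    toks = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    n = len(docs)
    df = Counter()                      # document frequency per term
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf = Counter(t)                 # term frequency in this doc
        s = 0.0
        for q in query.lower().split():
            idf = math.log((n - df[q] + 0.5) / (df[q] + 0.5) + 1)
            f = tf[q]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    # Return document indices, highest score first.
    return sorted(range(n), key=lambda i: -scores[i])
```

The `k1` and `b` defaults are the conventional BM25 parameters for term-frequency saturation and length normalization.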

Tiago Forte, the author of Building a Second Brain, also pivoted to AI-first in 2026. The original BASB was about “personal knowledge management”; the AI version redefines it as “personal context management”: feeding the right information to AI at the right moment. Karpathy’s raw → wiki → output is exactly that structure.

#The Tech That Makes the Structure Work

Looking at raw → wiki → output as a structure, there is nothing particularly new. What is new is that LLM technology can now maintain it. Three prerequisites fell into place in 2026.

Context windows expanded, and a personal wiki at the scale of hundreds of thousands of words could fit in one shot. Markdown became the LLM’s native format, letting it read and write directly without conversion cost. Then agents like Claude Code and Codex started manipulating files, and the LLM could read, edit, and save on its own. Reading, reading and writing, reading and writing and maintaining. The three prerequisites stacked in that order.

The real inflection point is not the context window. Gemini 1.5 Pro already offered a million-token context window back in February 2024, but that alone did not start the shift. The decisive change was agents being able to manipulate files directly. Until Claude Code and Codex, LLMs were only reachable through copy-paste, and maintenance was always a human job. Once the filesystem API opened, the wiki finally moved into the LLM’s jurisdiction.

Flip the framing of “compilation, not retrieval” and its real premise surfaces: the LLM itself reads, writes, and maintains files. No vector DB, no proprietary API. That is why an engineer (Karpathy), an executive (Lütke), and a theorist (Forte) reached the same conclusion from their own contexts.

For an LLM, a filesystem is enough.

To try it yourself, you can start by copying Karpathy’s Gist into your own LLM agent.
