I Accidentally Built Karpathy's LLM Wiki
I set out to track how AI companies price and accidentally rebuilt Karpathy's LLM wiki — a model-maintained, evidence-grounded knowledge base. Here's what converged.
An LLM wiki is a knowledge base an AI model builds and maintains itself. Instead of re-reading a pile of raw documents every time you ask a question, the model compiles what it learns into structured, cross-linked pages once and keeps them current. Andrej Karpathy sketched this pattern in a widely shared gist. I had already built one. I just hadn’t called it that. Updated June 2026.
What is Karpathy’s “LLM wiki”?
The gist makes one core move: stop retrieving raw documents at query time, and have the model maintain a persistent wiki instead. It rests on three layers — immutable raw sources the model never edits, a wiki of model-written pages (entities, concepts, comparisons), and a small schema document that fixes the conventions. A human curates the sources and asks good questions; the model does the summarizing, cross-referencing, and filing. A periodic “lint pass” hunts for contradictions, stale claims, and orphaned pages.
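To make the three layers concrete, here is a minimal Python sketch. It is an illustration, not the gist's code or mine. The class names, the `[[Page Title]]` link convention, and the `lint` function are all assumptions, and this lint covers only the mechanical checks (broken links, unknown citations, orphaned pages). Spotting contradictions and stale claims would need the model itself.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawSource:
    """Layer 1: an immutable source. frozen=True blocks field reassignment."""
    source_id: str
    text: str


@dataclass
class WikiPage:
    """Layer 2: a model-written page that cites sources and links to other pages."""
    title: str
    body: str
    cites: list[str] = field(default_factory=list)


# Layer 3: the schema, reduced here to one assumed convention:
# links between pages are written as [[Page Title]].
LINK_PATTERN = re.compile(r"\[\[([^\]]+)\]\]")


def lint(pages: dict[str, WikiPage], sources: dict[str, RawSource]) -> list[str]:
    """Return mechanical problems: broken links, unknown citations, orphaned pages."""
    problems = []
    linked_to = set()
    for page in pages.values():
        for target in LINK_PATTERN.findall(page.body):
            if target not in pages:
                problems.append(f"{page.title}: broken link to {target}")
            elif target != page.title:  # a self-link does not count as inbound
                linked_to.add(target)
        for source_id in page.cites:
            if source_id not in sources:
                problems.append(f"{page.title}: cites unknown source {source_id}")
    for title in pages:
        if title not in linked_to:
            problems.append(f"{title}: orphaned (no inbound links)")
    return problems


# Placeholder data, invented for the example.
sources = {"s1": RawSource("s1", "raw text of a captured document")}
pages = {
    "Acme": WikiPage("Acme", "Compared in [[Billing units]].", ["s1"]),
    "Billing units": WikiPage("Billing units", "See [[Acme]] and [[Globex]].", ["s2"]),
    "Index": WikiPage("Index", "Start at [[Acme]].", []),
}
for problem in lint(pages, sources):
    print(problem)
```

On this sample data the lint reports one broken link, one unknown citation, and one orphaned page.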
The argument underneath it is the part worth keeping. The hard part of a knowledge base was never the reading or the thinking. It’s the bookkeeping — keeping summaries current, updating cross-references, noting conflicts across dozens of pages. That’s the work that makes knowledge bases rot, and it’s exactly the work a model doesn’t get bored of.
How did I end up building the same thing?
I didn’t read a design doc and implement it. The domain forced the shape.
Tracking how AI companies price is a moving-target problem. Pricing pages change weekly. A flat database of scraped prices rots quietly: a number that was right in March is wrong in June, and nothing in the row tells you which. So three decisions made themselves.
First, capture evidence and never touch it — a screenshot and the literal prices on the page, frozen with a date. Second, have the model write a page per company from that evidence, in the plain language a buyer would use. Third, periodically read across every page to find the patterns no single company reveals — which billing units are winning, where the price ceilings sit, what just changed across the market.
That is an immutable source layer, a model-written page layer, and a synthesis layer. I had built most of what the gist describes before I’d read it. You can see the output: the per-company pricing blueprints and the live AI token pricing tracker.
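The first decision, evidence frozen with a date, might look something like the sketch below. The `Observation` record, its field names, and the sample company and price are hypothetical placeholders, not the system's actual schema. The point is that the record carries its own capture date and cannot be edited after the fact.

```python
import hashlib
from dataclasses import FrozenInstanceError, dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One frozen capture: what the page said, and when it said it."""
    company: str
    captured_on: date
    prices: tuple[tuple[str, str], ...]  # (label, literal price text), kept verbatim
    screenshot_sha256: str               # fingerprint of the stored screenshot


def capture(company: str, captured_on: date, prices: dict[str, str],
            screenshot: bytes) -> Observation:
    """Freeze the literal prices and a hash of the screenshot bytes."""
    digest = hashlib.sha256(screenshot).hexdigest()
    return Observation(company, captured_on, tuple(prices.items()), digest)


def age_in_days(obs: Observation, today: date) -> int:
    """The row itself can now say how old it is."""
    return (today - obs.captured_on).days


# Invented example values.
obs = capture("ExampleCo", date(2026, 3, 1),
              {"input tokens": "$3.00 / 1M"}, b"placeholder screenshot bytes")
print(age_in_days(obs, date(2026, 6, 1)))  # 92

# Attempting to edit the evidence raises instead of silently succeeding.
try:
    obs.captured_on = date(2026, 6, 1)
    mutated = True
except FrozenInstanceError:
    mutated = False
print(mutated)  # False
```

Keeping the price as literal text rather than a parsed number is a deliberate choice in this sketch: parsing is already a form of interpretation, and it can happen in a later pass.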
Where the two designs line up
| Karpathy's LLM wiki | My pricing system |
|---|---|
| Raw sources, never edited by the model | Captured screenshots + dated prices, never edited |
| One markdown page per entity, model-written | One living page per company, model-written |
| A content index + a chronological log | A generated catalog + an append-only change feed |
| A periodic lint pass for contradictions and stale claims | A consistency lint pass across the whole corpus |
| Single-writer: human curates, model writes everything | Same: humans pick companies and ask questions; the model writes |
Where I went further
Convergence is the easy half of the story. The more useful half is the three places a production system had to harden the idea.
1. Facts are walled off from analysis. The part of the system that records prices is not allowed to interpret them, and the part that interprets is not allowed to invent prices. Observation and opinion live in separate passes. When a price moves, the analysis gets re-derived without ever disturbing the recorded fact.
2. Prices are re-validated against an independent second source. Confirming that a number appeared on the page it was captured from proves nothing if that page was itself stale. A separate check re-fetches the vendor’s help center or docs and compares. It’s the difference between “I saw this” and “this is true.”
3. Synthesis reads facts first — never its own past conclusions. This is the failure mode a single lint pass is most exposed to. A model that reads its own prior synthesis tends to confirm it, and a knowledge base that does this slowly launders yesterday’s guess into today’s fact. So the cross-company pass starts from the evidence with its earlier conclusions out of view, forms its read fresh, and only then compares against what it said last time. The loop that would otherwise reinforce itself is cut. A lint pass and a generated catalog sit on top to catch claims that drift out of sync as the underlying facts move.
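The three guardrails above can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the production code. `PriceFact`, `validate`, `synthesize`, and `compare_to_prior` are hypothetical names, the second-source price is passed in rather than fetched, and a simple count of billing units stands in for the model's cross-company read.

```python
from collections import Counter
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PriceFact:
    """Guardrail 1: a recorded fact. Frozen, so analysis cannot rewrite it."""
    company: str
    billing_unit: str
    price: Decimal  # as captured from the pricing page


def validate(fact: PriceFact, second_source_price: Optional[Decimal]) -> str:
    """Guardrail 2: compare the capture with an independently fetched value."""
    if second_source_price is None:
        return "unverified"
    return "confirmed" if second_source_price == fact.price else "conflict"


def synthesize(facts: list[PriceFact]) -> dict[str, int]:
    """Guardrail 3a: derive a read from facts only. No prior conclusions are
    accepted as input, so the pass cannot simply confirm itself."""
    return dict(Counter(f.billing_unit for f in facts))


def compare_to_prior(fresh: dict[str, int],
                     prior: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Guardrail 3b: only after the fresh read exists, report (prior, fresh)
    for every key where the two disagree."""
    keys = sorted(set(fresh) | set(prior))
    return {k: (prior.get(k, 0), fresh.get(k, 0))
            for k in keys if prior.get(k, 0) != fresh.get(k, 0)}


# Invented example values.
facts = [
    PriceFact("A", "per token", Decimal("3.00")),
    PriceFact("B", "per token", Decimal("2.50")),
    PriceFact("C", "per seat", Decimal("20")),
]
statuses = [
    validate(facts[0], Decimal("3.0")),   # docs agree with the pricing page
    validate(facts[1], Decimal("2.75")),  # docs disagree
    validate(facts[2], None),             # no second source found
]
fresh = synthesize(facts)
changes = compare_to_prior(fresh, prior={"per seat": 2, "per token": 1})
print(statuses)
print(changes)
```

The ordering is the design choice that matters here: `synthesize` has no parameter for earlier conclusions, so the only place the old read can enter is the comparison step, after the new one is already fixed.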
If you’re building one
A few lessons that transfer to any model-maintained knowledge base, pricing or not:
- The bottleneck is bookkeeping, not reading. That’s the work a model erases, and the reason the pattern holds together at all.
- Be evidence-first or it rots. Freeze what you observed, with a date, and never let interpretation edit it.
- Separate observation from opinion. They are different jobs with different failure modes; don’t let one pass do both.
- Validate against a second source. “I saw it” is not “it’s true.”
- Don’t let synthesis read its own past. That’s how a knowledge base hallucinates itself into confidence.
If you’re applying this to pricing specifically, two starting points: the guide on usage-based pricing models, and the pricing calculators that turn a company’s published prices into a cost you can actually forecast. For the tracking layer underneath, see metering usage events.
Bottom line
I didn’t set out to implement someone’s architecture. I set out to keep a hard thing accurate, and the constraints walked me to the same three layers — then made me add the guardrails that keep a model-maintained wiki from quietly lying. The convergence is the validation. The guardrails are the lesson.