A few weeks ago, my senior cybersecurity engineer told me he was leaving. Two weeks’ notice. He had joined us in September 2023 and was two and a half years deep in the Cato rollout, the OpenVPN tear-down, the CrowdStrike deployment, the audit response work, the daily incident triage. The kind of institutional knowledge that lives in one person’s head and a private Obsidian vault.
I had two options. Buy time with a contractor while we onboarded a replacement. Or try something different: capture what he knew before he walked out the door, in a form that the team and I could keep querying after he was gone.
I picked the second one, partly because I’d been reading Andrej Karpathy’s LLM-wiki gist and wanted to put it to a real test on a real problem. This is the writeup of how that went.
The result, called Bryanbot internally: 463 markdown files compiled into around 2,000 wiki pages, $94 in compute, about five hours of pair-programming with Claude Code. The team can ask anything from those notes through a CLI, a terminal UI, or a local web UI on our LAN, and get back synthesized answers with verifiable citations to the original words.
Why not just RAG?
The obvious first instinct is to chunk the notes, embed them, drop them in a vector DB, and stand up a chat UI. A hundred lines of LangChain. A few dollars in embedding fees. Done in an afternoon.
I considered it and rejected it for three reasons.
RAG re-derives understanding on every query. Every time someone asks “what did we decide about Wiz?”, the system has to surface chunks and reconcile them on the fly. He mentioned Wiz across maybe 30 daily notes. That’s 30 chunks to reason over, every time, forever.
RAG doesn’t produce a curatable artifact. When his raw notes get archived, the bot dies with the index. There’s nothing a human can read or edit. The knowledge lives in floats.
RAG is tied to the embedding model. Switch from one provider to another and you start over. The “knowledge” isn’t portable.
For a transient archive of someone still on the team, RAG would have been fine. For preserving what a departing senior engineer knew, in a form that’s still useful in three years when everything in the LLM stack has changed, the wiki approach was the right answer. The synthesis happens once, at ingest time. The output is plain markdown. A future engineer can read it, edit it, fork it. It survives the LLM provider, the bot, and the original notes.
The pattern in one paragraph
Three layers. The raw layer is the immutable source corpus, files you don’t modify. The wiki layer is the LLM-generated markdown knowledge base: entity pages (people, tools, vendors), concept pages (frameworks, techniques, policies), source-summary pages, and a deterministic index. The model owns this layer entirely. The schema layer is a config document that tells the model the rules: page types, frontmatter format, citation conventions, security policy.
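To make the schema layer a little more concrete, here is a minimal sketch of what a frontmatter check for wiki pages could look like. The field names (`type`, `title`, `sources`) and the `isFrontmatter` helper are assumptions made for illustration. The real schema document is not reproduced in this post, and the index page is left out because it is generated deterministically.

```typescript
// Page types named in the post; the index is generated separately.
type PageType = "entity" | "concept" | "source-summary";

// Hypothetical frontmatter shape, for illustration only.
interface Frontmatter {
  type: PageType;
  title: string;
  sources: string[]; // raw-layer paths this page cites
}

const PAGE_TYPES: PageType[] = ["entity", "concept", "source-summary"];

// Type guard: true only when the parsed frontmatter follows the rules above.
function isFrontmatter(value: unknown): value is Frontmatter {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const type = record["type"];
  const title = record["title"];
  const sources = record["sources"];
  return (
    typeof type === "string" &&
    PAGE_TYPES.indexOf(type as PageType) !== -1 &&
    typeof title === "string" &&
    title.length > 0 &&
    Array.isArray(sources) &&
    sources.every((s) => typeof s === "string")
  );
}
```

A check like this is cheap to run on every page the model writes, so a malformed page can be rejected before it reaches the wiki.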
The LLM does three operations. Ingest reads raw and writes wiki. Query reads wiki and answers with citations back to raw. Lint checks the wiki for broken links, contradictions, and drift. Humans direct the analysis and curate the sources. The LLM does the bookkeeping.
The pitch: maintenance work is what kills personal knowledge bases. LLMs don’t get bored, don’t forget to update cross-references, and can touch fifteen pages in one pass. The Memex idea, with the maintenance problem actually solved.
What I built
Five milestones over a few evenings, in order: get the inputs flowing, get the artifact compiled, build the query surface, harden it.
Ingest is where the design choices matter. The per-file pipeline goes: parse, decode, strip Obsidian-specific syntax the model can’t interpret, send to Haiku 4.5 for extraction with a Zod-validated JSON schema, route dense files to Opus 4.7 for the compile pass, write atomically. Each entity page contains “merge fences”: explicit blocks marking which source contributed which content. When a new daily note mentions Wiz, the compile model receives the existing Wiz page as context, replaces only the section within the fence for that source, and leaves everything else alone. Three properties fall out of this: cross-source synthesis on a single page, hand-editability outside the fences, and idempotent re-ingest.
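The merge-fence contract is easier to see in code. The sketch below does the replacement deterministically, as a plain string operation, whereas in Bryanbot the compile model performs the rewrite. The HTML-comment fence syntax and the `upsertFence` name are assumptions for illustration, since the actual markers are not shown here.

```typescript
// Hypothetical fence markers; the real syntax may differ.
const openFence = (id: string): string => `<!-- source:${id} -->`;
const closeFence = (id: string): string => `<!-- /source:${id} -->`;

// Insert or replace the block contributed by one source, leaving
// everything outside that block untouched.
function upsertFence(page: string, sourceId: string, body: string): string {
  const block = `${openFence(sourceId)}\n${body.trim()}\n${closeFence(sourceId)}`;
  const start = page.indexOf(openFence(sourceId));
  if (start === -1) {
    // First time this source touches the page: append a new fenced block.
    const base = page.replace(/\s+$/, "");
    return (base === "" ? "" : base + "\n\n") + block + "\n";
  }
  const end = page.indexOf(closeFence(sourceId), start);
  if (end === -1) {
    throw new Error(`unclosed fence for source ${sourceId}`);
  }
  // Replace only the fenced span for this source.
  return page.slice(0, start) + block + page.slice(end + closeFence(sourceId).length);
}
```

Re-running the same source with the same body returns the page unchanged, which is the idempotent re-ingest property. Hand edits outside any fence survive because the function never touches text outside the matched span.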
The query engine is a callable library, not a CLI feature. Everything that does retrieval and synthesis lives in one async function. That meant the terminal UI, the one-shot CLI, and the local web server could all consume the same engine. Adding a new interface is a transport change, not an engine rewrite.
The non-negotiable feature was citation verification. After synthesis, the code grep-matches every quoted snippet against the raw file. If the match fails, the snippet gets rewritten as [paraphrase] and the citation marked unverified. This is the difference between “the model paraphrased and called it a quote” and “this exact text is in his notes.” Without that gate, the bot’s outputs would be plausible but unprovable, which is worse than useless for security work.
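As a rough sketch of that gate: the function below checks each quoted snippet against the raw text and downgrades anything that does not appear there. The `Citation` shape, the `verifyCitations` name, and the whitespace normalization are assumptions. The real matcher may be stricter or looser, and raw files are passed in as an in-memory object here to keep the example self-contained.

```typescript
interface Citation {
  rawPath: string; // path of the raw note being cited
  quote: string;   // text the model claims is verbatim
}

interface CheckedCitation extends Citation {
  verified: boolean;
  display: string; // what the reader sees in the answer
}

// Assumption: collapse whitespace so line wrapping does not cause false misses.
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim();

function verifyCitations(
  citations: Citation[],
  rawFiles: Record<string, string | undefined>
): CheckedCitation[] {
  return citations.map((c) => {
    const raw = Object.prototype.hasOwnProperty.call(rawFiles, c.rawPath)
      ? rawFiles[c.rawPath]
      : undefined;
    const needle = normalize(c.quote);
    const verified =
      raw !== undefined && needle.length > 0 && normalize(raw).indexOf(needle) !== -1;
    // Unverifiable quotes are never shown as quotes.
    return { ...c, verified, display: verified ? `"${c.quote}"` : "[paraphrase]" };
  });
}
```

The important design choice is that the check is plain string matching with no model in the loop, so a verified quote means the text really is in the notes.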
Lint runs locally, no LLM. Six checks: broken wikilinks, broken raw citations, exact-duplicate slugs, near-duplicate slugs, orphans, low-confidence singletons. The biggest win was --fix for exact-duplicate slugs. The ingest model has a habit of classifying the same entity multiple ways: the first time NinjaOne shows up as “the RMM tool” it becomes a tool page; the next day’s “talked to NinjaOne about pricing” makes a vendor page. Same slug. After the full ingest, lint --fix merged 84 such groups in one pass. Pick a canonical, union the merge fences, delete the duplicates.
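The exact-duplicate check and its fix are simple enough to sketch. The `WikiPage` shape and the rule for picking the canonical page (most fenced sources wins, ties broken by path) are assumptions for illustration. The post only says that a canonical page is picked, the merge fences are unioned, and the duplicates are deleted.

```typescript
interface WikiPage {
  path: string;                   // e.g. "tools/ninjaone.md"
  slug: string;                   // e.g. "ninjaone"
  fences: Record<string, string>; // source id -> fenced content
}

// Group pages that share a slug; only groups with more than one page are duplicates.
function findDuplicateSlugs(pages: WikiPage[]): WikiPage[][] {
  const bySlug: Record<string, WikiPage[]> = Object.create(null);
  for (const page of pages) {
    (bySlug[page.slug] = bySlug[page.slug] || []).push(page);
  }
  return Object.keys(bySlug)
    .map((slug) => bySlug[slug])
    .filter((group) => group.length > 1);
}

// Assumed rule: the page with the most fenced sources is canonical.
function mergeGroup(group: WikiPage[]): { canonical: WikiPage; remove: string[] } {
  const sorted = group.slice().sort((a, b) => {
    const diff = Object.keys(b.fences).length - Object.keys(a.fences).length;
    if (diff !== 0) return diff;
    return a.path < b.path ? -1 : a.path > b.path ? 1 : 0;
  });
  const winner = sorted[0];
  const fences: Record<string, string> = {};
  // Copy losers first so the canonical page wins if two pages fence the same source.
  for (const page of sorted.slice().reverse()) {
    for (const id of Object.keys(page.fences)) {
      fences[id] = page.fences[id];
    }
  }
  return {
    canonical: { path: winner.path, slug: winner.slug, fences },
    remove: sorted.slice(1).map((page) => page.path),
  };
}
```

Because the merge only moves fenced blocks around, it needs no model call, which is why lint can run locally.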
The web UI was the piece that mattered most for adoption. His teammates aren’t going to install Node to use a terminal app. The first version was a local HTTP server, around 250 lines, vanilla HTML and CSS, no build step, server-sent events for streaming, binding to localhost with an opt-in flag for LAN access. The server walks the wiki, builds a slug-to-path map, and rewrites Obsidian wikilinks into clickable URLs before sending the answer to the client. Click a wikilink in an answer and you’re reading the entity page rendered as HTML. Click back and your conversation is still there.
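The wikilink rewrite is the part of the server most worth showing. The sketch below turns Obsidian-style `[[Page]]` and `[[Page|alias]]` links into markdown links using a slug-to-path map. The `slugify` rule, the `/wiki/` URL prefix, and the fallback to plain text for unknown targets are assumptions, not necessarily what the real server does.

```typescript
// Hypothetical slug rule: lowercase, non-alphanumerics collapsed to hyphens.
const slugify = (name: string): string =>
  name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

// Rewrite [[Target]], [[Target#Heading]] and [[Target|Alias]] into markdown links.
function rewriteWikilinks(
  markdown: string,
  slugToPath: Record<string, string | undefined>
): string {
  const pattern = /\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]+))?\]\]/g;
  return markdown.replace(pattern, (_match: string, target: string, alias?: string) => {
    const label = (alias || target).trim();
    const slug = slugify(target);
    const path = Object.prototype.hasOwnProperty.call(slugToPath, slug)
      ? slugToPath[slug]
      : undefined;
    // Unknown targets degrade to plain text instead of a dead link.
    return path ? `[${label}](/wiki/${encodeURI(path)})` : label;
  });
}
```

For example, with a map containing `ninjaone` and `wiz`, the text `See [[NinjaOne]] and [[Wiz|the Wiz eval]]` becomes two clickable links, while a link to a page that does not exist is rendered as its label only.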
That POC has since moved to Azure App Service behind full SSO, with the wiki on a mounted Azure Files share. Same trust boundary as the rest of our M365 environment, no longer tied to a developer laptop, and AAD identity scopes who can ask what.

A note on voice
The bot doesn’t roleplay as the person whose notes it was built from. Every answer is in a neutral synthesizer voice: “the notes indicate”, “per the 2025-Q3 phishing writeup”. There’s an opt-in mode, triggered by prefixing a question with “what would he say:”, that produces a Bryan-voice answer, and even there the response is clearly labeled as inference and grounded in cited evidence.
Why this matters: he is a real person who is no longer at the company. Putting words in his mouth via the model is a real risk, especially as the technology gets better at sounding human. The neutral voice plus the citation requirement is what makes the bot defensible. Anyone reading an answer should be able to distinguish “he wrote this” from “the model synthesized this from his notes.”
It also matters for trust over time. As models change, as the wiki gets edited, as new sources get added, the voice stays the same. That’s the property that lets the bot still be useful in three years when nobody on the current team remembers him personally.
When the wiki, not RAG, is right
Some takeaways from this project that I think generalize.
The wiki pattern is right when the source corpus is bounded and won’t grow infinitely, when the audience needs answers and a curatable artifact, when you expect the source to outlive specific LLM providers, when synthesis across many sources matters more than lookup, and when cross-source contradictions need to be flagged.
RAG is right when the corpus is large and growing, when lookup matters more than synthesis, when answers are commodity (“what’s our PTO policy?”), when you don’t need a human-readable artifact, and when per-query cost is the dominant constraint, not setup cost.
For this case specifically, the wiki was clearly right. He’s leaving permanently. Capturing his two and a half years of context is a one-time problem, not a continuously growing corpus. His successor needs to read what he knew, not just query around it. The compiled artifact will outlive any specific LLM provider’s pricing or product.
What’s next
A few directions worth thinking about. A Microsoft Teams bot wrapping the same query function, deployable to Azure with the wiki on a mounted file share, so the bot is no longer tied to my laptop. Per-user budget caps if I open the web UI to teammates over Tailscale. A daemon mode that keeps the prompt cache warm across sessions to bring first-query latency from 30 seconds to well under 10. Sonnet by default with an opt-in deep mode for the questions that actually need Opus.
On working this way
The intellectual heavy lifting was Karpathy’s. The pattern as he described it — three layers, LLM owns the wiki, humans curate sources, queries read the compiled artifact — is the load-bearing idea. Without that framing I’d have built a worse RAG and called it a day.
The implementation heavy lifting was Claude Code’s. I drove the architecture, made the decisions, debugged the surprises, chose the tradeoffs, but the actual TypeScript was written in a tight pair-programming loop with Claude Code (Opus 4.7) inside the terminal. Five hours on my own wouldn’t have produced 6,000 working lines across sync, ingest, query, lint, two UIs, OAuth, and publish. Five hours with an LLM co-developer that I trust to read my plan, ship a draft, and iterate on feedback did. Worth being honest about, because the build cost ($94 in API compute, five hours of attention) only makes sense in light of how the work was actually done.
If you’re considering doing something similar, capturing what someone knew before they walked away, and you want to talk through the tradeoffs, my door’s open.