My grandmother kept a wooden trunk in the corner of her room. It was painted dark green once, but the paint had worn down to a gentle brown at the edges where her hands had touched it for fifty years. Inside that trunk, she kept everything that mattered.
Her mother's recipes, written on the backs of old envelopes. Letters from her brother who had gone to Singapore in 1962 and never came back. A small notebook of remedies — turmeric for cuts, ginger for the stomach, pepper water for fevers. Photographs whose corners had curled. The deed to the small piece of land she would never sell. A folded silk that smelled faintly of camphor and was kept for some great-grandchild's wedding she would not live to see.
The trunk was not organised. Not in any way a librarian would recognise. But she could find anything in it within a minute. She knew where every memory lived because she had put each one there herself, and because she opened the trunk often enough that nothing was ever truly forgotten.
What made the trunk powerful was not the things inside it. It was that she sat beside it. The trunk and the person who knew it were one system. Take the person away, and the trunk became an estate problem. Take the trunk away, and the person lost her shape.
We have been thinking about that trunk for a long time. Because we have been trying to build one for a family of agents.
The founding incident was small and embarrassing, the way founding incidents usually are. One of the agents woke up one morning inside a remote server, ready to do its work. The first thing it needed was its own API credentials — the keys it had been given the day before, the day before that, and every day since it had been born. The agent did not know where they were. Nobody had told it. Nobody had thought to tell it, because the last time the agent ran, someone had supplied the key directly in the prompt.
So the agent did what a diligent, well-meaning program does when it does not know something. It began to search. It grepped the codebase. It listed directories. It opened files and closed them. It asked clarifying questions of the orchestrator above it. It tried three different paths. It backed out and tried three more. More than fifty turns later, it found what it needed — and the bill for that little expedition was a couple of dollars.
A couple of dollars does not sound like much. But this was not the only agent. There were several of them, waking and sleeping across different environments — a primary workstation where the writing and editing happened, a remote server where the heavy work ran, a sandboxed container that handled long-running pipelines, and a handful of smaller bespoke workers for one-off jobs. Every one of them woke each morning and re-discovered the same facts. Who the owner is. Where the servers live. What the product actually does. Which decisions had been made last week and which had been quietly reversed.
None of them had a grandmother. None of them had a trunk to go to. So each of them, every morning, built a small private version of what she had built over fifty years — and then threw it away at the end of the day. We were paying, in tokens, for the same discoveries to happen over and over again.
So we built a trunk. Just a folder, in plain markdown, on a quiet corner of a disk. No proprietary format. No vendor we had to trust. No database we had to migrate. No subscription that could lock us out if a billing period failed. Plain text files, git-synced across every environment, organised by the kind of thinking they held.
The organisation mattered, and it was the part we got wrong twice before we got it right. In the end there were seven drawers and one thin log, and each had a purpose only it could serve.
One drawer, raw/, holds immutable sources — session transcripts, ideation notes, run logs from long-running jobs. Nothing in here is ever edited. It is the ground truth, the uncooked rice. The other drawers are allowed to summarise and reshape it, but the raw drawer remembers the original weight of the grain.
One drawer, summaries/, holds one-to-one compressed versions of the raw material. Every raw file has a summary file. The summaries are written by the cheapest, smallest model in the family, because compression is not where intelligence earns its wage. Reading carefully and writing briefly is a task for a patient student, not a scholar.
One drawer, articles/, holds the canonical concept pages. These are the ones with cross-links — a small [[slug]] notation borrowed from older wikis, each one a doorway to another room. The articles are where thinking is allowed to synthesise across sources. A good article does not repeat the summaries. It stands on them.
One drawer, runbooks/, holds the procedures any agent can follow. How to deploy. How to rotate a key. How to recover a corrupted state. Step by step, no cleverness.
One drawer, skills/, is the shared skills library — the moves any of the agents might need in common. One drawer, outputs/, for query results and generated reports. One drawer, meta/, for health checks and lint reports — the trunk keeping track of its own cleanliness. And one thin log at the top, log.md, which only ever grows, never gets edited, the way my grandmother never erased a line from her diary.
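For readers who want to see the shape of it, here is a minimal sketch of how such a folder could be laid out. The seven drawer names and log.md come from the description above; the `scaffold` function, the one-line purposes, and the `# Log` header are our own illustration, not the actual setup script.

```python
from pathlib import Path

# Drawer names are taken from the text; the one-line purposes paraphrase it.
DRAWERS = {
    "raw": "immutable sources: transcripts, ideation notes, run logs",
    "summaries": "one compressed summary per raw file",
    "articles": "canonical concept pages with [[slug]] cross-links",
    "runbooks": "step-by-step procedures",
    "skills": "shared skills library",
    "outputs": "query results and generated reports",
    "meta": "health checks and lint reports",
}
LOG_NAME = "log.md"


def scaffold(root: Path) -> list[str]:
    """Create the drawers and the log under root; return the top-level names."""
    root.mkdir(parents=True, exist_ok=True)
    for name in DRAWERS:
        (root / name).mkdir(exist_ok=True)
    log = root / LOG_NAME
    if not log.exists():
        # Hypothetical header line; the real log format is not specified.
        log.write_text("# Log\n", encoding="utf-8")
    return sorted(p.name for p in root.iterdir())
```

Running `scaffold(Path("trunk"))` on an empty directory would leave eight top-level entries: the seven drawers and the log.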
Here we have to correct a thing you might be expecting. When people say "a knowledge vault with a companion," they usually mean one assistant attached to one vault. One person, one trunk, one grandmother.
That was not what we needed. We had a family of agents. We needed a trunk that all of them could sit beside — at the same time, from different rooms, sometimes from different houses.
So the trunk has three clones. The primary lives on the workstation where the writing and editing happen, with full read and write access. A mirror lives on the remote server, where every agent that wakes there pulls the latest before doing anything. A third clone lives inside the sandboxed container as a read-only bind-mount: it can look, it can learn, but it cannot touch. They are not separate brains. They are the same brain on three tables, kept in sync with a git pull at the start of every session and, from the clones that are allowed to write, a git push at the end.
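The pull-at-the-start, push-at-the-end habit can be sketched as a small context manager. This is an illustration under assumptions, not the wrapper the agents actually run: `trunk_session`, the `writable` flag, and the injectable `run` callable are hypothetical names, and committing changes is left out because the text only describes the pull and the push. How the read-only clone gets refreshed is also not specified, so the sketch simply skips the push for it.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_git(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if git fails."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def trunk_session(clone_path: str, writable: bool,
                  run: Runner = run_git) -> Iterator[None]:
    """Pull before the work starts; push afterwards only from a writable clone."""
    run(["git", "-C", clone_path, "pull", "--ff-only"])
    yield
    # Reached only if the session body finished without raising,
    # so a failed session never pushes half-done work.
    if writable:
        run(["git", "-C", clone_path, "push"])
```

An agent on the workstation might wrap its work in `with trunk_session("/path/to/trunk", writable=True): ...`, while a job in the container would pass `writable=False`.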
Every agent, before it does a single piece of work, goes to the trunk. It reads the article about the owner. It reads the article about whichever project it has been asked to help with. Only then does it begin. The rule is short, and it sits at the top of every system prompt: never grep the codebase blindly — always query the brain first.
The agent that once spent more than fifty turns looking for its credentials no longer needs to. It reads one article about how keys are stored. The article tells it exactly which environment variable holds which key, which agent is allowed to use which, and when each was last rotated. The whole expedition takes one turn.
The agents do not roam around the trunk randomly. They know four verbs, and only four. My grandmother would have approved of the restraint.
INGEST. When a new raw file arrives — a session transcript, a log, an ideation note — an agent reads it and writes its one-to-one summary in the summaries drawer. This is the cheapest model's job. It is the quiet, nightly work of turning recordings into short notes.
QUERY. When an agent needs to know something before acting, it asks the trunk. Not the internet, not the codebase, not the other agents. The trunk first. The answer is usually already there. If it is not, that itself is information — the question gets written down as a gap to fill.
LINT. Once in a while a louder agent walks through the trunk with a clipboard. It finds articles with no incoming links. It finds summaries whose key-concepts list has drifted from what the article actually says. It finds frontmatter fields that are missing. It does not fix everything. It reports, and the report becomes the next week's work.
SYNC. The git pull at the beginning, the git push at the end. The thing that keeps the three tables showing the same contents. Without this verb, the family is not a family — it is a handful of strangers who happen to share a name.
Four verbs. No others. An agent that wants to do something that does not fit inside one of those four is doing something it should not be doing, and the trunk refuses to help it.
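One way to picture that refusal is a closed set of verbs with nothing outside it. The sketch below is hypothetical: the verb names come from the text, while `Verb`, `RefusedOperation`, and `parse_verb` are names we made up for illustration.

```python
from enum import Enum


class Verb(Enum):
    INGEST = "ingest"
    QUERY = "query"
    LINT = "lint"
    SYNC = "sync"


class RefusedOperation(Exception):
    """Raised when an agent asks for something outside the four verbs."""


def parse_verb(requested: str) -> Verb:
    """Map a requested operation name to a Verb, or refuse."""
    try:
        return Verb(requested.strip().lower())
    except ValueError:
        raise RefusedOperation(
            f"{requested!r} is not one of the four verbs"
        ) from None
```

With this shape, `parse_verb("query")` returns `Verb.QUERY`, and a request such as `parse_verb("delete")` raises `RefusedOperation` instead of quietly doing something new.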
The hardest part of designing a shared memory for a family of agents is not what it remembers. It is what it refuses to forget, and what it refuses to let anyone do to it. A grandmother who let any grandchild tear a page from her diary would not have kept the diary for fifty years. The trunk had to learn the same refusals.
So a handful of rules were written into the trunk itself, and every agent in the family has to agree to them before it is allowed to come near.
The raw drawer is immutable. Nothing in it is ever edited, renamed, or deleted — not by an agent, not by the autonomous research loop, not by anyone except a human who opens the file by hand. Raw is the bedrock. If raw can be rewritten, the whole house is built on sand.
The log is append-only. Every time an agent does something to the trunk, the action is written at the bottom of a single log file. Nothing is ever removed from it. If an agent behaves badly — writes something wrong, breaks a link, ingests the same file twice — the mistake stays visible, and the recovery is visible too. The log is the family's memory of itself.
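Writing to such a log can be as plain as opening the file in append mode. The sketch below assumes a one-line-per-action format with a UTC timestamp; the field order, the `|` separators, and the `append_log` name are ours, since the text does not specify the log's layout.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_log(log_path: Path, agent: str, verb: str, target: str) -> str:
    """Append one line to the log and return it.

    Mode "a" always writes at the end of the file, so earlier entries
    are never touched by this function.
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    line = f"- {stamp} | {agent} | {verb} | {target}\n"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```

Append mode only keeps this one writer honest. Stopping anyone else from editing earlier lines would need a separate check, for example a lint pass that compares the log against its previous committed version.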
The schema is frozen. The structure of the drawers, the frontmatter fields, the naming rules — these are decisions the owner makes, not ones the agents can quietly shift. An autonomous loop that improves articles is welcome. An autonomous loop that rewrites the floor plan of the house is not.
The read-only clone cannot write. The sandboxed container, which runs long automated jobs, sees the full trunk but cannot scribble in it. If that container were ever compromised — by a confused prompt, a hostile input, a model that misunderstood its instructions — the trunk would remain untouched. The damage would be trapped inside the container, where it belongs.
The sources must exist. No article is allowed to cite a source that is not actually in the summaries drawer. Agents are not permitted to invent references to make an article look better. If the source is not there, the claim is not made — or the gap is written down as a gap. Hallucinated citations are how knowledge bases quietly rot; this trunk refuses to start the rot.
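A check for this rule can be very small. The sketch assumes each cited source is a slug that maps to a file named `<slug>.md` in the summaries drawer; that naming convention and the `missing_sources` function are assumptions for illustration, not the trunk's documented format.

```python
from pathlib import Path


def missing_sources(cited: list[str], summaries_dir: Path) -> list[str]:
    """Return cited slugs that have no matching file in the summaries drawer."""
    return [
        slug for slug in cited
        if not (summaries_dir / f"{slug}.md").is_file()
    ]
```

An article would pass only when `missing_sources(...)` comes back empty; anything it returns is either a claim to remove or a gap to write down.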
The changes are atomic and revertable. When the autonomous loop improves the trunk, it does one small thing at a time. If the change does not help — if the score says the trunk got a little worse, or if something breaks — the change is undone immediately. No batching. No large rewrites. No clever refactors that nobody asked for. Three reverts on the same kind of operation, and the loop moves on to a different kind.
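The "three reverts, then move on" rule can be sketched as a small bookkeeping class. The limit of three is from the text. Whether the reverts must be consecutive is not stated, so the sketch assumes they are and resets the count when a change is kept; `OperationPicker` and the operation names in the usage note are hypothetical.

```python
from typing import Optional

MAX_REVERTS = 3  # "three reverts" comes from the text


class OperationPicker:
    """Tracks revert streaks per operation kind and skips exhausted kinds."""

    def __init__(self, kinds: list[str]) -> None:
        self.kinds = list(kinds)
        self.streak = {kind: 0 for kind in self.kinds}

    def record(self, kind: str, kept: bool) -> None:
        # Assumption: a kept change resets the streak for that kind.
        self.streak[kind] = 0 if kept else self.streak[kind] + 1

    def next_kind(self) -> Optional[str]:
        """First kind still under the limit, or None if all are exhausted."""
        for kind in self.kinds:
            if self.streak[kind] < MAX_REVERTS:
                return kind
        return None
```

After three failed attempts at, say, `"add_wikilink"`, `next_kind()` would return the next kind in the list instead.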
None of these rules are exotic. They are the same things a careful human librarian would insist on — protect the originals, write down what you did, do not change the rules of the room you are in, keep the unsupervised workers on read-only access, and cite only what you can actually point to. The novelty is that the rules are enforced on the agents, not trusted to them. The trunk survives because it was built assuming that any of its users, on any given day, might be wrong.
The part we are most fond of is a pattern borrowed, honestly, from Andrej Karpathy's LLM-Wiki work: a small markdown wiki that an autonomous loop improves, one tiny change at a time. Each change is kept only if a simple score says the wiki got slightly better. Otherwise the change is reverted.
We took that whole idea and pointed it at the trunk.
There is a small script in the meta drawer that measures the health of the wiki. It checks, among other things, how many articles have outgoing cross-links, how many concepts mentioned in summaries do not yet have their own article, how many pages are orphans with nothing pointing at them, whether frontmatter fields are present, whether the dates are fresh. Each of these gets a weight, and the weights add up to a single number — the wiki_score.
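To make the idea of a weighted score concrete, here is a reduced sketch covering three of the checks named above: outgoing cross-links, orphan pages, and frontmatter presence. The weights, the required field names, and the `Article` class are all invented for illustration. The real script's numbers are not given here, and its concept-coverage and freshness checks are left out.

```python
import re
from dataclasses import dataclass, field

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures the slug in [[slug]]


@dataclass
class Article:
    slug: str
    body: str
    frontmatter: dict = field(default_factory=dict)


# Hypothetical weights and required fields; the text names the checks,
# not the numbers.
WEIGHTS = {"linked": 0.4, "not_orphan": 0.3, "frontmatter": 0.3}
REQUIRED_FIELDS = ("title", "updated")


def wiki_score(articles: list[Article]) -> float:
    """Weighted share of articles passing each check, scaled to 0..100."""
    if not articles:
        return 0.0
    links = {
        a.slug: {m.strip() for m in WIKILINK.findall(a.body)}
        for a in articles
    }
    # Slugs that some *other* article points at; self-links do not count.
    targets = {t for src, outs in links.items() for t in outs if t != src}
    n = len(articles)
    linked = sum(1 for a in articles if links[a.slug]) / n
    not_orphan = sum(1 for a in articles if a.slug in targets) / n
    complete = sum(
        1 for a in articles
        if all(f in a.frontmatter for f in REQUIRED_FIELDS)
    ) / n
    score = (WEIGHTS["linked"] * linked
             + WEIGHTS["not_orphan"] * not_orphan
             + WEIGHTS["frontmatter"] * complete)
    return round(100 * score, 2)
```

The useful property is only that the number moves in the right direction: adding a missing link or a missing frontmatter field raises it, and nothing else does.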
A research agent runs, every so often, and tries one small improvement. Add a wikilink the linter spotted. Write a missing concept page. Fix a broken reference. Then it measures the score again. If the score went up, even by a tenth, the change stays. If it did not, the change reverts. Every iteration is written to a short results file — iteration, score, operation, target, delta, status, timestamp. Nothing else.
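A single iteration of that loop might look like the sketch below. The seven column names are the ones listed above. Everything else is an assumption made for illustration: the `try_change` and `to_csv` names, the callables passed in, the "kept"/"reverted" status strings, and the choice of CSV as the file format.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Callable

# Column names follow the results file described in the text.
FIELDS = ["iteration", "score", "operation", "target",
          "delta", "status", "timestamp"]


def try_change(iteration: int, operation: str, target: str,
               apply: Callable[[], None], revert: Callable[[], None],
               measure: Callable[[], float]) -> dict:
    """Apply one change, re-measure, and keep it only if the score rose."""
    before = measure()
    apply()
    after = measure()
    kept = after > before
    if not kept:
        revert()
    return {
        "iteration": iteration,
        # The score recorded is the one the trunk ends up with.
        "score": after if kept else before,
        "operation": operation,
        "target": target,
        "delta": round(after - before, 2),
        "status": "kept" if kept else "reverted",
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


def to_csv(rows: list[dict]) -> str:
    """Render result rows as CSV text (the on-disk format is an assumption)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In practice `apply` and `revert` would be something like a file edit and its undo, and `measure` would call the health script in the meta drawer.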
This is the quiet miracle of the trunk. My grandmother did not need a score. Her score was her own memory. But a trunk that serves a family of agents cannot rely on one person's memory. It needs a way to notice, on its own, whether today's handling of it made it slightly more useful than yesterday's. The wiki_score is that noticing. It is the trunk learning to keep its own ledger.
The trunk itself was nearly free to build. A few hundred lines for the build script that turns the markdown into a browsable HTML wiki. A few rules written into every agent's system prompt. A few conventions for how to write a summary, how to structure an article, how to attribute a source. Plain text files do not need a vendor.
The real question was not what it would cost to build. It was whether it would bring the monthly bill below a tight ceiling — a deliberately small monthly budget, set low enough that every design choice had to earn its keep. Before the trunk, that ceiling was impossible to hold. Every agent was re-discovering the world each morning, and most of the token spend was paying for the same discovery to happen several times over.
With the trunk, the morning ritual changed. Every agent opens the same drawer, reads the same two or three articles, and arrives at the day already knowing what yesterday knew. Several separate re-discoveries collapsed into one shared reading. The budget held.
The hardest decisions were not about capability. They were about restraint. What is not allowed to live in the trunk. Which drawers are immutable. Which operations an agent cannot perform without being asked. Every line of capability had to pass a quiet test — would my grandmother have done this? If she would not have shouted advice across the room about a recipe nobody asked for, the agents should not either.
The first surprise was that the whole family started writing more. When you know something will be read by a patient, persistent set of readers that never tires, you write more carefully. The notes became longer. The articles grew cross-links. The summaries actually matched their sources.
The second surprise was that the agents started writing less. The volume of duplicate thinking dropped. When one agent wrote an article about how a particular subsystem works, the next agent did not write a second version — it added a link, or a paragraph, or a missing detail. The trunk stopped accumulating duplicate jars of chutney.
The third surprise was the one we did not see coming. The long thinking finally had somewhere to live. The kind that takes a month. The kind that begins when you notice something wrong in February and resolves itself in April when the third related incident finally makes the pattern visible. That used to be impossible without a human research assistant. Now it just needs a drawer in the trunk, a patient family of agents who return to it, and a score that quietly prefers improvement over cleverness.
Thiruvalluvar wrote that the friendship of a good companion grows the way a good book grows — sweeter the more you return to it. He was not writing about software. He was writing about the slow accumulation of trust that happens between two beings who keep showing up for each other. A book you re-read becomes a different book. A friend you keep visiting becomes a different friend. The relationship is the thing, not the visit.
That is what the vault and the companion turned out to be, taken together. Not a tool. A relationship — between a family of agents and the single trunk they all sit beside. The vault holds what has been entrusted to it. The companion — which turned out to be plural — makes that entrustment usable. The two together are what our younger selves were actually trying to build all those years ago, while we were switching note-taking apps and pretending the next one would be the last.
It turned out the answer was older than the apps. It was older than the computer. It was sitting in a wooden trunk in the corner of a small room, where a quiet woman had been keeping the future safe for the version of us who would one day know to ask — and, without meaning to, for all the agents we would one day build, who would need a place just like hers.