Copy-on-write vector branching for embedded multi-agent memory. Branch a base memory in ~0.5 ms / 162 bytes — independent of base size. 83× faster, 3000× smaller than full-copy snapshots.
Every other vector store makes you full-copy the index to snapshot or fork it. agenticow branches it.
A branch records only its own edits plus a pointer to the parent. Creating one is constant-time and constant-size — 162 bytes — whether the base holds 10k or 1M vectors.
Query a branch and you see parent ∪ your edits, with the child winning on any id collision and deletes honored. The base data is never duplicated.
Checkpoint in 162 bytes, then throw away a poisoned branch and resume from the last clean state in under a millisecond. No re-indexing.
Branching, checkpointing, and rollback are all O(1) in base size today.
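The read-through rule (parent ∪ edits, child wins on collision, deletes honored) can be sketched with a small in-memory overlay. This is only an illustration of the semantics: the `Base` and `Branch` classes and their method names are hypothetical, and nothing here reflects agenticow's actual on-disk format or API.

```js
// Hypothetical in-memory model of a copy-on-write branch (not agenticow's real storage).
class Base {
  constructor(entries) {
    this.vectors = new Map(entries); // id -> vector
  }
  get(id) { return this.vectors.get(id); }
  ids() { return this.vectors.keys(); }
}

class Branch {
  constructor(parent) {
    this.parent = parent;     // pointer to the parent, never a copy
    this.edits = new Map();   // id -> vector written in this branch
    this.deleted = new Set(); // tombstones for ids removed in this branch
  }
  upsert(id, vector) {
    this.deleted.delete(id);
    this.edits.set(id, vector);
  }
  remove(id) {
    this.edits.delete(id);
    this.deleted.add(id);
  }
  get(id) {
    if (this.deleted.has(id)) return undefined;        // deletes are honored
    if (this.edits.has(id)) return this.edits.get(id); // child wins on collision
    return this.parent.get(id);                        // read through to the parent
  }
  // Visible id set: (parent ∪ edits) minus tombstones.
  ids() {
    const out = new Set(this.edits.keys());
    for (const id of this.parent.ids()) {
      if (!this.deleted.has(id)) out.add(id);
    }
    return out;
  }
}

const base = new Base([[1, [1, 0]], [2, [0, 1]]]);
const branch = new Branch(base);
branch.upsert(2, [0.5, 0.5]); // collides with base id 2
branch.upsert(3, [1, 1]);     // new id, visible only in the branch
branch.remove(1);             // deletes base id 1 in the branch only

console.log([...branch.ids()].sort()); // [ 2, 3 ]
console.log(branch.get(2));            // [ 0.5, 0.5 ]
console.log(base.get(1));              // [ 1, 0 ] (base untouched)
```

Creating the branch allocates two empty containers and one reference, which is why the cost in this model does not depend on how many vectors the base holds.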
Spawn N agents that each branch a shared base memory. N branches cost N×162 B and N×0.5 ms instead of N full copies of a 496 MB index.
An agent ingests bad/adversarial memories into its branch. Drop the branch — the base and every sibling are untouched. Isolation is verified.
Snapshot agent memory before every risky step. Each checkpoint is 162 B + the edits since the last one — keep thousands of them.
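One way to picture why a checkpoint can be tiny is to treat it as a position in an append-only edit log, so that rollback simply truncates the log. The sketch below assumes that model; `CheckpointedMemory` and its methods are hypothetical names for illustration, not agenticow's implementation.

```js
// Hypothetical model: a checkpoint is just a position in an append-only edit log.
class CheckpointedMemory {
  constructor() {
    this.log = [];         // append-only list of { id, vector } edits
    this.checkpoints = []; // each entry: { label, length }
  }
  ingest(id, vector) {
    this.log.push({ id, vector });
  }
  checkpoint(label) {
    // Constant-size record: no vectors are copied.
    const ckpt = { label, length: this.log.length };
    this.checkpoints.push(ckpt);
    return ckpt;
  }
  rollback(ckpt) {
    // Discard every edit made after the checkpoint.
    this.log.length = ckpt.length;
    // Drop checkpoints that were taken after this one.
    this.checkpoints = this.checkpoints.filter(c => c.length <= ckpt.length);
  }
  snapshot() {
    // Replay the log; later edits to the same id win.
    const state = new Map();
    for (const { id, vector } of this.log) state.set(id, vector);
    return state;
  }
}

const memory = new CheckpointedMemory();
memory.ingest('a', [1, 0]);
const clean = memory.checkpoint('clean');
memory.ingest('poison', [9, 9]);
memory.rollback(clean);
console.log([...memory.snapshot().keys()]); // [ 'a' ]
```

In this model the work done by `rollback` depends only on the number of edits being discarded, never on the size of the base.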
Smarter orchestration, not smarter execution. Each pattern cites a head-to-head scaffolding ablation on cheap models (FRAMES, deepseek-v4-pro + glm-5.2, n=50, strict EM, reasoning OFF) — SCAFFOLDING-ABLATION.md.
Deep self-refine backfires on cheap models: Plan-and-Solve −10pp, Reflexion −8pp at 2.85× cost, and ReAct saturates at ~8–12 steps. So run 2–3-turn tasks; on failure, drop the branch (~0.5 ms) and respawn rather than forcing self-correction.
Cheap models win on the first shot; the only gain (Self-Consistency +4–6pp) saturates by N≈7. The product win is multi-tenant scale: 1,000 isolated branches at 943× less disk. Many shallow agents, not one deep one.
A cheap LM-as-judge picks worse than majority vote (−4 to −6pp). Gate promotion with tests, compilers, schemas, or a human, never the cheap model. Nuance: on code, tests are a zero-cost oracle, so test-gated promotion is strong (the bridge to @metaharness/jujutsu).
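For reference, the majority-vote baseline behind Self-Consistency can be as simple as the sketch below. The normalization step (trim and lowercase) and the tie-breaking rule are assumptions made for illustration; the ablation's exact scoring is described in SCAFFOLDING-ABLATION.md, not here.

```js
// Self-consistency as plain majority voting over N sampled answers.
function majorityVote(answers) {
  const counts = new Map();
  for (const raw of answers) {
    const key = raw.trim().toLowerCase(); // normalization rule is an assumption
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) { // strict ">" means ties keep the earliest answer seen
      best = answer;
      bestCount = count;
    }
  }
  return { answer: best, votes: bestCount, total: answers.length };
}

const samples = ['Paris', 'paris ', 'Lyon', 'Paris', 'Marseille'];
console.log(majorityVote(samples)); // { answer: 'paris', votes: 3, total: 5 }
```

Because the vote is a deterministic function of the samples, it needs no second model call to pick a winner.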
agenticow is Git for agent memory, not a cognitive enhancer. Like Git, it lets thousands work concurrently, isolate, roll back, and merge through CI; it makes cheap-model fleets governed, isolated, and nearly free. It does not make them smarter (RAG was a null result and scaffolds backfired).
Four end-to-end use cases, one paradigm: Branch → Mutate → external-Verify → Promote / Discard. Selection is always a deterministic external verifier (test / regex / checker / distance) — never a cheap LM-as-judge (the ablation showed that's a negative selector). These show the branching mechanics, not model intelligence. Numbers below are real measured output (AMD Ryzen 9 9950X, Node v22). Run all: npm run examples:production.
```js
import { openBase } from 'agenticow';

const base = openBase('kb.rvf', { dimension: 1536 });
const sandbox = base.fork('untrusted-doc');                 // Branch
sandbox.ingest(docEmbedding, { text: 'untrusted doc' });    // Mutate
const hits = sandbox.query(injectionSignature, { topK: 3 });
const exploit = hits.some(h => h.distance < 0.02);          // external-Verify (deterministic)
if (exploit) sandbox.rollback();                            // Discard: base never poisoned (blast radius 0)
else sandbox.promote();                                     // Promote: vetted delta merged into base
```
Untrusted-doc ingestion behind a COW fork. A deterministic Security-Prober (injection-signature distance probe) gates it: exploit → rollback() in 1.1 ms, 0 vectors reached base; clean → promote().
5 persona branches off one base. An external judge — policy-constraint gate then distance-to-rubric score, not a cheap LM — picks the winner: 4/5 qualified, winner promoted, losers discarded free.
24-step migration, checkpoint every 5. A latent bug at step 12 trips a compiler-style check at step 24 → rewind to the step-10 checkpoint in 1.1 ms, 0 steps replayed, 24/24 reachable.
One mmapped base, 1,000 isolated tenant branches. An isolation oracle proves no cross-tenant leak: 0/200 probes leaked, 2.4 KB/tenant, 530× less disk than full copies.
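The isolation oracle in the multi-tenant case can be thought of as a pairwise probe: every tenant writes a marker only it should see, and every other tenant is checked for it. The sketch below models that idea with plain maps; `makeTenant` and the `secret:` id scheme are hypothetical, and the 20-tenant count is chosen only to keep the example small.

```js
// Hypothetical model: each tenant branch is a private overlay over one shared base.
const sharedBase = new Map([['doc-1', [1, 0]], ['doc-2', [0, 1]]]);

function makeTenant(name) {
  const edits = new Map(); // private to this tenant's closure
  return {
    name,
    ingest: (id, vector) => edits.set(id, vector),
    has: id => edits.has(id) || sharedBase.has(id), // read-through lookup
  };
}

const tenants = Array.from({ length: 20 }, (_, i) => makeTenant(`tenant-${i}`));

// Each tenant writes one marker id that no other tenant should ever see.
for (const t of tenants) t.ingest(`secret:${t.name}`, [1, 1]);

// Oracle: probe every (owner, other) pair and count leaks.
let probes = 0;
let leaks = 0;
for (const owner of tenants) {
  for (const other of tenants) {
    if (owner === other) continue;
    probes++;
    if (other.has(`secret:${owner.name}`)) leaks++;
  }
}
console.log(`${leaks}/${probes} probes leaked`); // 0/380 probes leaked
```

The check is deterministic and external to any model, which matches the selection rule used across these use cases.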
Reproduced on an AMD Ryzen 9 9950X (32 threads), Node v22, dim 128, cosine, median of 11 runs. Run it yourself: npx agenticow bench
| Base N | Base file | Branch create (p50) | Empty branch | 100-edit branch | Full copy (p50) | Speedup | Smaller |
|---|---|---|---|---|---|---|---|
| 10,000 | 5.0 MB | 519 µs | 162 B | 51.4 KB | 373 µs | 1× | 32,102× |
| 100,000 | 49.6 MB | 463 µs | 162 B | 51.4 KB | 5.83 ms | 13× | 321,037× |
| 1,000,000 | 496.3 MB | 472 µs | 162 B | 51.4 KB | 67.14 ms | 142× | 3,212,443× |
Branch delta is a pure function of edit count (~520 B/edited vector) with zero dependence on base size. At 10,000 vectors a raw copyFile is already sub-millisecond (373 µs, slightly faster than the 519 µs branch create), so the COW speed win shows up, and widens, at scale.
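As a rough back-of-envelope check, the branch size can be modeled as a fixed header plus a per-edit cost. The linear model below is an assumption built from the two figures quoted above (162 B empty, ~520 B per edited vector); `estimateBranchBytes` is a hypothetical helper, not part of the library.

```js
// Constants taken from the benchmark text; the linear model itself is an approximation.
const EMPTY_BRANCH_BYTES = 162;
const BYTES_PER_EDIT = 520; // "~520 B/edited vector"

// Estimated on-disk size of a branch holding `edits` edited vectors.
// There is no base-size parameter: the estimate ignores the base entirely.
function estimateBranchBytes(edits) {
  return EMPTY_BRANCH_BYTES + BYTES_PER_EDIT * edits;
}

for (const edits of [0, 100, 10_000]) {
  const bytes = estimateBranchBytes(edits);
  console.log(`${edits} edits -> ${(bytes / 1024).toFixed(1)} KiB`);
}
```

For 100 edits the model gives 52,162 bytes, which is in line with the 51.4 KB measured for the 100-edit branch in every row of the table.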
Scenario: 1,000 branches over a 1M-vector base (dim 128, ~496 MB base). Cells tagged "measured" were measured here (AMD Ryzen 9 9950X); cells tagged "est." or "published" for other approaches are published or estimated figures with sources cited, not fabricated.
| Approach | Branch / snapshot create | Per-branch storage | Query latency (ANN) | Cost @ 1,000 branches | Native COW branch / rollback |
|---|---|---|---|---|---|
| agenticow (COW) | 0.47 ms / 162 B measured, flat to 1M | ~10.8 KB measured | ~6.3× behind hnswlib @ 1M measured* | ~507 MB local · ~$0 infra (embedded) measured† | ✓ instant (p50 0.57 ms) |
| Naive full-copy | 67 ms / 496 MB measured @ 1M | full base (~496 MB) | = source engine | ~484 GB local measured × N | ✗ (copy, not COW) |
| Pinecone (serverless) | no native branch — full re-upsert/copy | full copy (managed) | fast (core strength) | ~484 GB stored ≈ $160/mo storage + read/write units est.¹ | ✗ |
| Milvus | collection snapshot = full copy / reindex | full copy | fast (core strength) | ~484 GB resident → large RAM cluster, $$$/mo infra est.² | ✗ |
| Qdrant | snapshot = full copy | full copy | fast (core strength) | ~484 GB → managed/self-host, $$$/mo est.³ | ✗ |
| pgvector | SQL dump / table copy + reindex | full copy | moderate | ~484 GB in Postgres, reindex per copy est. | ✗ |
| Chroma | full copy of the collection | full copy | moderate | ~484 GB local/managed est. | ✗ |
| lakeFS / DVC | fast metadata branch (their strength) | file-level delta (cheap) | n/a — not a vector engine | cheap branching, but you still build/serve the ANN index yourself published | ✓ for data/files · ✗ for the vector index |
Takeaway: agenticow wins decisively on branch-create speed, per-branch storage, and multi-branch cost — and is the only option with native COW branching + instant rollback of a live vector memory. It concedes raw ANN search speed to the dedicated vector DBs (Pinecone / Qdrant / Milvus); use those when single-index query throughput is the priority, and agenticow when you need cheap branching, checkpointing, and rollback of agent memory.
\* SIFT-1M same-machine (matched recall@10 ≈ 0.97): ruvector HNSW is ~6.3× behind a dedicated flat-index engine like hnswlib-node at 1M-vector scale (~2.7× on small in-cache sets; the earlier ~2.7× was a 100K-vector synthetic set that fit in L3 cache). Deliberate trade: agenticow competes on memory versioning/isolation/rollback, not raw search speed; narrowing levers (graph-quality shrink-heuristic + stack-local heaps) are on the ruvector-engine roadmap.

† agenticow base (~496 MB) + 1,000 × ~10.8 KB ≈ 507 MB; runs in-process, no managed infra.

¹ Pinecone serverless storage est. from pinecone.io/pricing (~$0.33/GB-month): storage only, excludes read/write units.

² Milvus / Zilliz Cloud est. from zilliz.com/pricing (RAM-resident, cluster-sized).

³ Qdrant est. from qdrant.tech/pricing.

All competitor figures are published/estimated and labeled as such; only agenticow's are measured here.
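The storage figures in the table and footnotes can be reproduced with a few lines of arithmetic. The inputs below are copied from the text above; treating 1 GB as 1024 MB and 1 MB as 1024 KB is an assumption, chosen because it matches the ~484 GB and ~507 MB figures quoted.

```js
// Inputs copied from the comparison table and footnotes.
const baseMB = 496;                 // 1M-vector base, dim 128
const perBranchKB = 10.8;           // measured per-branch storage
const branches = 1000;
const pineconeUsdPerGBMonth = 0.33; // storage-only estimate from footnote ¹

// COW: one shared base plus a small delta per branch.
const cowTotalMB = baseMB + (branches * perBranchKB) / 1024;
// Full copy: every branch duplicates the whole base.
const fullCopyGB = (branches * baseMB) / 1024;
// Storage-only monthly cost if the full copies lived in Pinecone serverless.
const pineconeStorageUsd = fullCopyGB * pineconeUsdPerGBMonth;

console.log(cowTotalMB.toFixed(0));         // "507"
console.log(fullCopyGB.toFixed(0));         // "484"
console.log(pineconeStorageUsd.toFixed(0)); // "160"
```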
agenticow is the only one with native copy-on-write branching of the vector index. We concede raw ANN throughput honestly — pick the right tool for the job.
| Capability | agenticow | Pinecone | Milvus | pgvector | Chroma | Qdrant |
|---|---|---|---|---|---|---|
| Native COW branch of the index | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| O(1)-in-base branch create | ✓ 162 B | ✗ | ✗ | ✗ | ✗ | ✗ |
| Snapshot mechanism | COW delta | full copy | full copy | SQL dump | full copy | full copy |
| Exact read-through (parent ∪ edits) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Embedded / in-process (no server) | ✓ | ✗ | ✗ | via PG | ✓ | ✓/server |
| Raw ANN throughput | ~6.3× behind hnswlib @ 1M* | high | high | moderate | moderate | high |
| ANN search spanning the branch | ✓ shipped (recall@10 ≈ 1.0, linux-x64*) | n/a | n/a | n/a | n/a | n/a |
*On a measured SIFT-1M benchmark (same machine, matched recall@10 ≈ 0.97), the underlying ruvector HNSW is ~6.3× behind a dedicated flat-index engine like hnswlib at 1M-vector scale (~2.7× on small in-cache sets — the earlier figure was a 100K-vector synthetic set that fit in L3 cache; the gap widens at 1M). This is a deliberate trade: agenticow does not compete on raw single-index search throughput — its unique capability is memory versioning, isolation, and lifecycle governance for multi-tenant agent fleets (1,000 parallel isolated reversible branches at ~0.5 ms/fork, which no flat ANN engine offers). Future levers to narrow the gap (graph-quality shrink-heuristic + stack-local heaps) are on the ruvector-engine roadmap, not agenticow's pitch. If you need maximum raw ANN speed on a static index, use a dedicated ANN library. Native ANN-across-branch (fork({nativeAnn:true})) ships for linux-x64-gnu today; other platforms degrade gracefully to exact read-through.
One small API. ESM, Node ≥ 18.
```js
import { open } from 'agenticow';

// open or create a base memory
const base = open('memory.rvf', { dimension: 1536 });
base.ingest([{ id: 1, vector: embedding }, /* ... */]);

// branch it for a parallel agent: ~0.5 ms / 162 B, any base size
const agent = base.branch('agent-a');
agent.ingest([{ id: 9001, vector: newMemory }]); // isolated

// exact read-through: sees the base + its own edits, child wins on collision
const hits = agent.query(queryVector, 10);

// NEW 0.2.0: native ANN ACROSS the branch (single Rust dual-graph query)
const fast = base.fork('agent-b', null, { nativeAnn: true });
fast.query(queryVector, 10); // parent ∪ edits via native HNSW, recall@10 ≈ 1.0

// checkpoint + rollback a poisoned branch
const ckpt = agent.checkpoint('clean');
agent.ingest([{ id: 666, vector: poison }]);
agent.rollback(ckpt.id); // poison gone, clean memory intact
```
fork(label, file, {nativeAnn:true}) runs a single Rust dual-graph HNSW query over
parent ∪ child (RuVector PR #617/#618),
verified recall@10 ≈ 1.0 (0.999) at 5,000 base ∪ 200 edits, dim 128, default cosine.
Platform: the native binary ships for linux-x64-gnu today; darwin / win / linux-arm64
are pending a CI cross-compile and degrade gracefully to exact read-through (identical
correctness). We still concede raw single-index ANN throughput to dedicated vector DBs (~6.3× behind a dedicated flat-index engine like hnswlib at 1M-vector scale, matched recall@10 ≈ 0.97; ~2.7× on small in-cache sets) — a deliberate trade, since agenticow competes on memory versioning/isolation/rollback, not raw search speed.
One per use case, in examples/ — node examples/X.mjs or npm run examples. Deterministic output (seeded RNG).
One base, a cheap COW branch per user — isolation + delta-only storage.
Poison a branch, discard it → base instantly clean (before/after query).
Checkpoint every 10 steps, crash at 31, resume from 30 — no replay.
branch → ingest → diff → promote: the agent → test → prod memory pipeline.
N variant branches, score each, promote the winner (A/B / Darwin-style).
Fan out N agent branches off one base, query each, roll one back.
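The "score each, promote the winner" pattern depends on the scorer being a deterministic function rather than a model. A minimal sketch of a two-step selector (hard policy gate, then distance to a rubric vector) follows. The candidate objects, the `violatesPolicy` flag, and the rubric vector are all made up for illustration and are not taken from the shipped examples.

```js
// Cosine distance between two equal-length vectors.
function cosineDistance(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical candidates: each stands in for one variant branch's output embedding.
const rubric = [1, 0, 0];
const candidates = [
  { label: 'formal', output: [0.9, 0.1, 0.0], violatesPolicy: false },
  { label: 'casual', output: [0.6, 0.4, 0.0], violatesPolicy: false },
  { label: 'pushy',  output: [1.0, 0.0, 0.0], violatesPolicy: true },
];

function selectWinner(cands, target) {
  // Step 1: hard gate. A policy violation disqualifies regardless of score.
  const qualified = cands.filter(c => !c.violatesPolicy);
  if (qualified.length === 0) return null; // nothing to promote
  // Step 2: among qualified candidates, pick the one closest to the rubric.
  return qualified.reduce((best, c) =>
    cosineDistance(c.output, target) < cosineDistance(best.output, target) ? c : best);
}

const winner = selectWinner(candidates, rubric);
console.log(winner.label); // formal
```

Note that `pushy` matches the rubric exactly yet still loses, because the gate runs before the score is ever consulted.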
Every tier is backed by code you can run. PROVEN = bench + acceptance · DEMONSTRATED = executed + benchmarked · PoC = mechanics shown, cognition out of scope.
| Tier | What it shows | Status | Examples / bench |
|---|---|---|---|
| Practical | branch / checkpoint / rollback + exact read-through | ✅ PROVEN | 1,000-branch acceptance, recall@10 = 100% |
| Platform | promotion pipeline · compliance & right-to-erasure · A/B at scale | ✅ DEMONSTRATED + benchmarked | fork 464 µs · score 133 µs · promote 897 µs · contradiction-check ~1M pairs/s · 0.84 KB/branch |
| Exotic | parallel selves · Darwin-on-memory · simulated org (contradiction scan) | ⚗️ PoC (mechanics only) | cognition out of scope — judge/fitness is a scoring function, not an LLM |
Run: npm run examples · npm run examples:platform · npm run examples:exotic · npm run bench:ladder. Numbers measured on AMD Ryzen 9 9950X, base=5,000, dim=64.