npm i agenticow · MIT · $0

agenticow
Git for Agent Memory

Copy-on-write vector branching for embedded multi-agent memory. Branch a base memory in ~0.5 ms / 162 bytes — independent of base size. 83× faster, 3000× smaller than full-copy snapshots.

npm install agenticow
~0.5 ms
branch create (any base size)
162 B
empty branch on disk
142×
faster than full copy @ 1M*
O(1)
in base size

What is copy-on-write for vectors?

Every other vector store makes you full-copy the index to snapshot or fork it. agenticow branches it.

🌿

Branch, don't copy

A branch records only its own edits plus a pointer to the parent. Creating one is constant-time and constant-size — 162 bytes — whether the base holds 10k or 1M vectors.

🔎

Exact read-through

Query a branch and you see parent ∪ your edits, with the child winning on any id collision and deletes honored. The base data is never duplicated.

Instant rollback

Checkpoint in 162 bytes, then throw away a poisoned branch and resume from the last clean state in under a millisecond. No re-indexing.

Three things it makes cheap

All three are O(1) in base size today.

👥

Parallel agents, one base memory

Spawn N agents that each branch a shared base memory. N branches cost N×162 B and N×0.5 ms instead of N full copies of a 496 MB index.

🧪

Roll back a poisoned branch

An agent ingests bad/adversarial memories into its branch. Drop the branch — the base and every sibling are untouched. Isolation is verified.

📌

Zero-cost checkpointing

Snapshot agent memory before every risky step. Each checkpoint is 162 B + the edits since the last one — keep thousands of them.

Deployment patterns — what the data says

Smarter orchestration, not smarter execution. Each pattern cites a head-to-head scaffolding ablation on cheap models (FRAMES, deepseek-v4-pro + glm-5.2, n=50, strict EM, reasoning OFF) — SCAFFOLDING-ABLATION.md.

Fail-fast, shallow branches

Deep self-refine backfires on cheap models: Plan-and-Solve −10pp, Reflexion −8pp at 2.85× cost, and ReAct saturates ~8–12 steps. So run 2–3-turn tasks; on failure drop the branch (~0.5 ms) and respawn — don't force self-correction.

↔️

Scale horizontally

Cheap models win on the first shot; the only gain (Self-Consistency +4–6pp) saturates by N≈7. The product win is multi-tenant scale: 1,000 isolated branches at 943× less disk. Many shallow agents, not one deep one.

🔒

External, deterministic gates

A cheap LM-as-judge picks worse than majority vote (−4 to −6pp). Gate promote with tests / compilers / schemas / a human — never the cheap model. Nuance: on code, tests are a zero-cost oracle, so test-gated promotion is strong (the bridge to @metaharness/jujutsu).

🧩

Infra layer, not a brain

agenticow is Git for agent memory, not a cognitive enhancer. Like Git, it lets thousands work concurrently, isolate, roll back, merge through CI — it makes cheap-model fleets governed, isolated, ~free. It does not make them smarter (RAG null + scaffolds backfired).

Honest framing. The data says nothing at the orchestration layer reliably makes a cheap model smarter — RAG was null and every reasoning scaffold backfired or failed to pay for itself. agenticow's claim is leverage, not intelligence: infrastructure that turns "run 1,000 cheap agents safely" into a near-free, auditable operation.

Flagship production patterns — runnable + executed

Four end-to-end use cases, one paradigm: Branch → Mutate → external-Verify → Promote / Discard. Selection is always a deterministic external verifier (test / regex / checker / distance) — never a cheap LM-as-judge (the ablation showed that's a negative selector). These show the branching mechanics, not model intelligence. Numbers below are real measured output (AMD Ryzen 9 9950X, Node v22). Run all: npm run examples:production.

import { openBase } from 'agenticow';

const base = openBase('kb.rvf', { dimension: 1536 });
const sandbox = base.fork('untrusted-doc');           // Branch
sandbox.ingest(docEmbedding, { text: 'untrusted doc' }); // Mutate
const hits = sandbox.query(injectionSignature, { topK: 3 });
const exploit = hits.some(h => h.distance < 0.02);   // external-Verify (deterministic)
if (exploit) sandbox.rollback();  // Discard — base never poisoned (blast radius 0)
else         sandbox.promote();   // Promote — vetted delta merged into base
🛡️

red-team-sandbox

Untrusted-doc ingestion behind a COW fork. A deterministic Security-Prober (injection-signature distance probe) gates it: exploit → rollback() in 1.1 ms, 0 vectors reached base; clean → promote().

🗳️

multi-persona-consensus

5 persona branches off one base. An external judge — policy-constraint gate then distance-to-rubric score, not a cheap LM — picks the winner: 4/5 qualified, winner promoted, losers discarded free.

time-travel-debug

24-step migration, checkpoint every 5. A latent bug at step 12 trips a compiler-style check at step 24 → rewind to the step-10 checkpoint in 1.1 ms, 0 steps replayed, 24/24 reachable.

🏢

multi-tenant-saas

One mmapped base, 1,000 isolated tenant branches. An isolation oracle proves no cross-tenant leak: 0/200 probes leaked, 2.4 KB/tenant, 530× less disk than full copies.

Honest scope. External verifiers only (stated). These demonstrate the Branch→Verify→Promote mechanics — production-ready patterns — not model intelligence. Full outputs in examples/README.md.

Benchmarks — branch create vs full copy

Reproduced on an AMD Ryzen 9 9950X (32 threads), Node v22, dim 128, cosine, median of 11 runs. Run it yourself: npx agenticow bench

0.1ms 1ms 10ms 100ms 10k 100k 1M base vectors 67 ms 0.47 ms (flat)
agenticow branch create — O(1) in base full file copy — O(base)
← swipe table →
Base NBase fileBranch create (p50) Empty branch100-edit branch Full copy (p50)SpeedupSmaller
10,0005.0 MB519 µs162 B51.4 KB373 µs32,102×
100,00049.6 MB463 µs162 B51.4 KB5.83 ms13×321,037×
1,000,000496.3 MB472 µs162 B51.4 KB67.14 ms142×3,212,443×

Branch delta is a pure function of edit count (~520 B/edited vector) with zero dependence on base size. At small bases a raw copyFile is already sub-millisecond, so the COW win shows up — and widens — at scale.

Multi-approach comparison — performance · storage · cost

Scenario: 1,000 branches over a 1M-vector base (dim 128, ~496 MB base). green = measured on agenticow (AMD Ryzen 9 9950X); other approaches are amber = published / estimated with sources cited — not fabricated.

← swipe table →
Approach Branch / snapshot create Per-branch storage Query latency (ANN) Cost @ 1,000 branches Native COW branch / rollback
agenticow (COW) 0.47 ms / 162 B measured, flat to 1M ~10.8 KB measured ~6.3× behind hnswlib @ 1M measured* ~507 MB local · ~$0 infra (embedded) measured† ✓ instant (p50 0.57 ms)
Naive full-copy 67 ms / 496 MB measured @ 1M full base (~496 MB) = source engine ~484 GB local measured × N ✗ (copy, not COW)
Pinecone (serverless) no native branch — full re-upsert/copy full copy (managed) fast (core strength) ~484 GB stored ≈ $160/mo storage + read/write units est.¹
Milvus collection snapshot = full copy / reindex full copy fast (core strength) ~484 GB resident → large RAM cluster, $$$/mo infra est.²
Qdrant snapshot = full copy full copy fast (core strength) ~484 GB → managed/self-host, $$$/mo est.³
pgvector SQL dump / table copy + reindex full copy moderate ~484 GB in Postgres, reindex per copy est.
Chroma full copy of the collection full copy moderate ~484 GB local/managed est.
lakeFS / DVC fast metadata branch (their strength) file-level delta (cheap) n/a — not a vector engine cheap branching, but you still build/serve the ANN index yourself published ✓ for data/files · ✗ for the vector index

Takeaway: agenticow wins decisively on branch-create speed, per-branch storage, and multi-branch cost — and is the only option with native COW branching + instant rollback of a live vector memory. It concedes raw ANN search speed to the dedicated vector DBs (Pinecone / Qdrant / Milvus); use those when single-index query throughput is the priority, and agenticow when you need cheap branching, checkpointing, and rollback of agent memory.

* SIFT-1M same-machine (matched recall@10 ≈ 0.97): ruvector HNSW is ~6.3× behind a dedicated flat-index engine like hnswlib-node at 1M-vector scale (~2.7× on small in-cache sets; the earlier ~2.7× was a 100K-vector synthetic set that fit in L3 cache). Deliberate trade — agenticow competes on memory versioning/isolation/rollback, not raw search speed; narrowing levers (graph-quality shrink-heuristic + stack-local heaps) are on the ruvector-engine roadmap. † agenticow base (~496 MB) + 1,000 × ~10.8 KB ≈ 507 MB; runs in-process, no managed infra. ¹ Pinecone serverless storage est. from pinecone.io/pricing (~$0.33/GB-month) — storage only, excludes read/write units. ² Milvus / Zilliz Cloud est. from zilliz.com/pricing (RAM-resident, cluster-sized). ³ Qdrant est. from qdrant.tech/pricing. All competitor figures are published/estimated and labeled as such; only agenticow's are measured here.

How it compares

agenticow is the only one with native copy-on-write branching of the vector index. We concede raw ANN throughput honestly — pick the right tool for the job.

← swipe table →
CapabilityagenticowPineconeMilvuspgvectorChromaQdrant
Native COW branch of the index
O(1)-in-base branch create✓ 162 B
Snapshot mechanismCOW deltafull copyfull copySQL dumpfull copyfull copy
Exact read-through (parent ∪ edits)
Embedded / in-process (no server)via PG✓/server
Raw ANN throughput~6.3× behind hnswlib @ 1M*highhighmoderatemoderatehigh
ANN search spanning the branch✓ shipped (recall@10 ≈ 1.0, linux-x64*)n/an/an/an/an/a

*On a measured SIFT-1M benchmark (same machine, matched recall@10 ≈ 0.97), the underlying ruvector HNSW is ~6.3× behind a dedicated flat-index engine like hnswlib at 1M-vector scale (~2.7× on small in-cache sets — the earlier figure was a 100K-vector synthetic set that fit in L3 cache; the gap widens at 1M). This is a deliberate trade: agenticow does not compete on raw single-index search throughput — its unique capability is memory versioning, isolation, and lifecycle governance for multi-tenant agent fleets (1,000 parallel isolated reversible branches at ~0.5 ms/fork, which no flat ANN engine offers). Future levers to narrow the gap (graph-quality shrink-heuristic + stack-local heaps) are on the ruvector-engine roadmap, not agenticow's pitch. If you need maximum raw ANN speed on a static index, use a dedicated ANN library. Native ANN-across-branch (fork({nativeAnn:true})) ships for linux-x64-gnu today; other platforms degrade gracefully to exact read-through.

Usage

One small API. ESM, Node ≥ 18.

import { open } from 'agenticow';

// open or create a base memory
const base = open('memory.rvf', { dimension: 1536 });
base.ingest([{ id: 1, vector: embedding }, /* ... */]);

// branch it for a parallel agent — ~0.5 ms / 162 B, any base size
const agent = base.branch('agent-a');
agent.ingest([{ id: 9001, vector: newMemory }]); // isolated

// exact read-through: sees the base + its own edits, child wins on collision
const hits = agent.query(queryVector, 10);

// NEW 0.2.0 — native ANN ACROSS the branch (single Rust dual-graph query)
const fast = base.fork('agent-b', null, { nativeAnn: true });
fast.query(queryVector, 10); // parent ∪ edits via native HNSW, recall@10 ≈ 1.0

// checkpoint + rollback a poisoned branch
const ckpt = agent.checkpoint('clean');
agent.ingest([{ id: 666, vector: poison }]);
agent.rollback(ckpt.id);  // poison gone, clean memory intact
Honest scope. agenticow ships COW branch creation (the 83×/3000× headline, proven), exact read-through queries (parent ∪ edits, child wins, deletes honored), and — new in 0.2.0native ANN search ACROSS the COW boundary: fork(label, file, {nativeAnn:true}) runs a single Rust dual-graph HNSW query over parent ∪ child (RuVector PR #617/#618), verified recall@10 ≈ 1.0 (0.999) at 5,000 base ∪ 200 edits, dim 128, default cosine. Platform: the native binary ships for linux-x64-gnu today; darwin / win / linux-arm64 are pending a CI cross-compile and degrade gracefully to exact read-through (identical correctness). We still concede raw single-index ANN throughput to dedicated vector DBs (~6.3× behind a dedicated flat-index engine like hnswlib at 1M-vector scale, matched recall@10 ≈ 0.97; ~2.7× on small in-cache sets) — a deliberate trade, since agenticow competes on memory versioning/isolation/rollback, not raw search speed.

Runnable examples

One per use case, in examples/node examples/X.mjs or npm run examples. Deterministic output (seeded RNG).

personalization

One base, a cheap COW branch per user — isolation + delta-only storage.

rollback-quarantine

Poison a branch, discard it → base instantly clean (before/after query).

checkpointing

Checkpoint every 10 steps, crash at 31, resume from 30 — no replay.

git-workflow

branch → ingest → diffpromote the agent→test→prod memory pipeline.

ab-branches

N variant branches, score each, promote the winner (A/B / Darwin-style).

parallel-agents

Fan out N agent branches off one base, query each, roll one back.

Claim ladder — runnable, executed, benchmarked

Every tier is backed by code you can run. PROVEN = bench + acceptance · DEMONSTRATED = executed + benchmarked · PoC = mechanics shown, cognition out of scope.

← swipe table →
TierWhat it showsStatusExamples / bench
Practicalbranch / checkpoint / rollback + exact read-through✅ PROVEN1,000-branch acceptance, recall@10 = 100%
Platformpromotion pipeline · compliance & right-to-erasure · A/B at scale✅ DEMONSTRATED + benchmarkedfork 464 µs · score 133 µs · promote 897 µs · contradiction-check ~1M pairs/s · 0.84 KB/branch
Exoticparallel selves · Darwin-on-memory · simulated org (contradiction scan)⚗️ PoC (mechanics only)cognition out of scope — judge/fitness is a scoring function, not an LLM

Run: npm run examples · npm run examples:platform · npm run examples:exotic · npm run bench:ladder. Numbers measured on AMD Ryzen 9 9950X, base=5,000, dim=64.