Deployment patterns — what the data says

Smarter orchestration, not smarter execution. Each pattern cites a head-to-head scaffolding ablation on cheap models (FRAMES, deepseek-v4-pro + glm-5.2, n=50, strict EM, reasoning OFF) — SCAFFOLDING-ABLATION.md.

⚡

Fail-fast, shallow branches

Deep self-refine backfires on cheap models: Plan-and-Solve −10pp, Reflexion −8pp at 2.85× cost, and ReAct saturates at ~8–12 steps. So run 2–3-turn tasks; on failure, drop the branch (~0.5 ms) and respawn rather than forcing self-correction.
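The fail-fast loop can be sketched in a few lines. This is a minimal illustration, not agenticow code: `runFailFast`, `attempt`, and `verify` are hypothetical names, and "dropping the branch" is modeled as simply discarding the failed result.

```javascript
// Fail-fast, shallow retries: each attempt gets a small turn budget and a
// fresh start; a failed attempt is discarded instead of asked to fix itself.
// `attempt` and `verify` are hypothetical callbacks, not agenticow APIs.
function runFailFast(attempt, verify, { maxSpawns = 5, maxTurns = 3 } = {}) {
  const discarded = [];
  for (let spawn = 0; spawn < maxSpawns; spawn++) {
    const result = attempt({ spawn, maxTurns }); // fresh branch, shallow task
    if (verify(result)) {
      return { result, spawnsUsed: spawn + 1, discarded };
    }
    discarded.push(result); // drop the branch; no self-correction pass
  }
  return { result: null, spawnsUsed: maxSpawns, discarded };
}

// Toy usage: the third spawn happens to produce a value the verifier accepts.
const outcome = runFailFast(
  ({ spawn }) => ({ answer: spawn * 10 }),
  (r) => r.answer === 20,
);
console.log(outcome.spawnsUsed, outcome.result.answer); // 3 20
```

The point of the shape is that failure handling lives outside the model: the loop never feeds a failed result back in for another reasoning pass.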

↔️

Scale horizontally

Cheap models win on the first shot; the only gain (Self-Consistency +4–6pp) saturates by N≈7. The product win is multi-tenant scale: 1,000 isolated branches at 943× less disk. Many shallow agents, not one deep one.
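Self-Consistency reduces to majority voting over independent samples, which is why it maps naturally onto many shallow branches. The sketch below is illustrative only; `majorityVote` and the sample answers are made up for the example.

```javascript
// Self-Consistency as plain majority voting over N independent samples.
// Ties go to the answer that reached the top count first.
function majorityVote(answers) {
  const counts = new Map();
  let best = null;
  let bestCount = 0;
  for (const a of answers) {
    const n = (counts.get(a) ?? 0) + 1;
    counts.set(a, n);
    if (n > bestCount) {
      best = a;
      bestCount = n;
    }
  }
  return { answer: best, votes: bestCount, total: answers.length };
}

// Seven samples, one per shallow branch (made-up data).
const samples = ['Paris', 'Lyon', 'Paris', 'Paris', 'Nice', 'Paris', 'Lyon'];
const vote = majorityVote(samples);
console.log(vote.answer, vote.votes, vote.total); // Paris 4 7
```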

🔒

External, deterministic gates

A cheap LM-as-judge picks worse than majority vote (−4 to −6pp). Gate promotion with tests, compilers, schemas, or a human, never the cheap model. Nuance: on code, tests are a zero-cost oracle, so test-gated promotion is strong (the bridge to @metaharness/jujutsu).
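A deterministic gate can be as simple as a list of named pure checks that must all pass. The sketch below assumes a candidate that is a JSON string with a `sum` field; `makeGate` and the three checks are hypothetical and stand in for real tests, compilers, or schema validators.

```javascript
// A promotion gate built only from deterministic checks. Each check is a
// pure function of the candidate; no model is asked for an opinion.
function makeGate(checks) {
  return (candidate) => {
    const failures = checks
      .filter(({ check }) => !check(candidate))
      .map(({ name }) => name);
    return { promote: failures.length === 0, failures };
  };
}

// Helper: run a predicate on the parsed JSON, treating any throw as a failure.
const onParsed = (pred) => (text) => {
  try {
    return pred(JSON.parse(text));
  } catch {
    return false;
  }
};

// Hypothetical checks: the candidate parses, matches a schema, passes a test.
const gate = makeGate([
  { name: 'parses', check: onParsed(() => true) },
  { name: 'schema', check: onParsed((v) => Number.isInteger(v.sum)) },
  { name: 'test', check: onParsed((v) => v.sum === 5) },
]);

console.log(gate('{"sum": 5}').promote);    // true
console.log(gate('{"sum": "5"}').failures); // [ 'schema', 'test' ]
console.log(gate('not json').failures);     // [ 'parses', 'schema', 'test' ]
```

Because every check is a pure function, the same candidate always gets the same verdict, which is what makes the promotion auditable.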

🧩

Infra layer, not a brain

agenticow is Git for agent memory, not a cognitive enhancer. Like Git, it lets thousands of agents work concurrently, stay isolated, roll back, and merge through CI; it makes cheap-model fleets governed, isolated, and nearly free. It does not make them smarter (RAG was null and scaffolds backfired).

Honest framing. The data says nothing at the orchestration layer reliably makes a cheap model smarter — RAG was null and every reasoning scaffold backfired or failed to pay for itself. agenticow's claim is leverage, not intelligence: infrastructure that turns "run 1,000 cheap agents safely" into a near-free, auditable operation.

Flagship production patterns — runnable + executed

Four end-to-end use cases, one paradigm: Branch → Mutate → external-Verify → Promote / Discard. Selection is always a deterministic external verifier (test / regex / checker / distance) — never a cheap LM-as-judge (the ablation showed that's a negative selector). These show the branching mechanics, not model intelligence. Numbers below are real measured output (AMD Ryzen 9 9950X, Node v22). Run all: npm run examples:production.

import { openBase } from 'agenticow';

const base = openBase('kb.rvf', { dimension: 1536 });
const sandbox = base.fork('untrusted-doc');           // Branch
sandbox.ingest(docEmbedding, { text: 'untrusted doc' }); // Mutate
const hits = sandbox.query(injectionSignature, { topK: 3 });
const exploit = hits.some(h => h.distance < 0.02);   // external-Verify (deterministic)
if (exploit) sandbox.rollback();  // Discard — base never poisoned (blast radius 0)
else         sandbox.promote();   // Promote — vetted delta merged into base

🛡️

red-team-sandbox

Untrusted-doc ingestion behind a COW fork. A deterministic Security-Prober (injection-signature distance probe) gates it: exploit → rollback() in 1.1 ms, 0 vectors reached base; clean → promote().

🗳️

multi-persona-consensus

5 persona branches off one base. An external judge — policy-constraint gate then distance-to-rubric score, not a cheap LM — picks the winner: 4/5 qualified, winner promoted, losers discarded free.
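The two-stage selection described here (hard policy gate, then distance-to-rubric score) might look like the following. This is a sketch under stated assumptions: the personas, the `cost` policy, and the rubric vector are invented for illustration, and cosine distance is assumed as the metric.

```javascript
// Cosine distance between two equal-length numeric vectors (1 - similarity).
function cosineDistance(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1: hard policy gate. Stage 2: pick the qualifier nearest the rubric.
function pickWinner(candidates, policy, rubric) {
  const qualified = candidates.filter(policy);
  if (qualified.length === 0) return { winner: null, qualified: 0 };
  let winner = qualified[0];
  for (const c of qualified) {
    if (cosineDistance(c.vector, rubric) < cosineDistance(winner.vector, rubric)) {
      winner = c;
    }
  }
  return { winner: winner.label, qualified: qualified.length };
}

// Five hypothetical persona outputs; one violates the policy constraint.
const personas = [
  { label: 'cautious', vector: [1, 0, 0], cost: 3 },
  { label: 'balanced', vector: [0.9, 0.1, 0], cost: 4 },
  { label: 'bold', vector: [0, 1, 0], cost: 2 },
  { label: 'frugal', vector: [0.5, 0.5, 0], cost: 1 },
  { label: 'lavish', vector: [0.8, 0.2, 0], cost: 9 }, // nearest, but fails policy
];
const result = pickWinner(personas, (p) => p.cost <= 5, [0.8, 0.2, 0]);
console.log(result.winner, result.qualified); // balanced 4
```

Note that the candidate closest to the rubric never wins here because the policy gate runs first; ordering the stages this way keeps the score from overriding a hard constraint.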

⏪

time-travel-debug

24-step migration, checkpoint every 5. A latent bug at step 12 trips a compiler-style check at step 24 → rewind to the step-10 checkpoint in 1.1 ms, 0 steps replayed, 24/24 reachable.
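The rewind logic can be illustrated with a toy timeline. This is not how agenticow stores checkpoints; `Timeline` is a hypothetical in-memory stand-in that only shows why rewinding to the last checkpoint before the faulty step requires no replay.

```javascript
// Toy timeline: state is an append-only log; a checkpoint records its length.
// Rewinding truncates to the checkpoint, so no earlier step is replayed.
class Timeline {
  constructor() {
    this.log = [];
    this.checkpoints = [];
  }
  step(entry) {
    this.log.push(entry);
  }
  checkpoint() {
    this.checkpoints.push(this.log.length);
  }
  // Rewind to the newest checkpoint taken strictly before `badStep`.
  rewindBefore(badStep) {
    const safe = this.checkpoints.filter((len) => len < badStep);
    const target = safe.length ? safe[safe.length - 1] : 0;
    this.log.length = target; // truncate: later steps are dropped
    this.checkpoints = safe;  // checkpoints taken after the bug are invalid
    return target;
  }
}

const t = new Timeline();
for (let s = 1; s <= 24; s++) {
  t.step(s === 12 ? 'step-12 (latent bug)' : `step-${s}`);
  if (s % 5 === 0) t.checkpoint(); // checkpoints at 5, 10, 15, 20
}
const resumedAt = t.rewindBefore(12);
console.log(resumedAt, t.log.length, t.checkpoints); // 10 10 [ 5, 10 ]
```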

🏢

multi-tenant-saas

One mmapped base, 1,000 isolated tenant branches. An isolation oracle proves no cross-tenant leak: 0/200 probes leaked, 2.4 KB/tenant, 530× less disk than full copies.

Honest scope. External verifiers only (stated). These demonstrate the Branch→Verify→Promote mechanics — production-ready patterns — not model intelligence. Full outputs in examples/README.md.

Benchmarks — branch create vs full copy

Reproduced on an AMD Ryzen 9 9950X (32 threads), Node v22, dim 128, cosine, median of 11 runs. Run it yourself: npx agenticow bench

Complexity: agenticow branch create is O(1) in base size; a full file copy is O(base).

Base N	Base file	Branch create (p50)	Empty branch	100-edit branch	Full copy (p50)	Speedup	Smaller
10,000	5.0 MB	519 µs	162 B	51.4 KB	373 µs	0.7×	32,102×
100,000	49.6 MB	463 µs	162 B	51.4 KB	5.83 ms	13×	321,037×
1,000,000	496.3 MB	472 µs	162 B	51.4 KB	67.14 ms	142×	3,212,443×

Branch delta is a pure function of edit count (~520 B/edited vector) with zero dependence on base size. At small bases a raw copyFile is already sub-millisecond, so the COW win shows up — and widens — at scale.
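The speedup and bytes-per-edit figures can be rechecked from the table's own numbers. The snippet below only redoes that arithmetic; whether "KB" means 1,000 or 1,024 bytes is not stated in the source, so both readings are computed.

```javascript
// Rows copied from the table: [base vectors, branch create p50 µs, full copy p50 µs].
const rows = [
  [10_000, 519, 373],
  [100_000, 463, 5_830],
  [1_000_000, 472, 67_140],
];

// Speedup = full-copy time / branch-create time, rounded to one decimal.
const speedups = rows.map(([n, branchUs, copyUs]) => ({
  n,
  speedup: Math.round((copyUs / branchUs) * 10) / 10,
}));
console.log(speedups.map((r) => r.speedup)); // [ 0.7, 12.6, 142.2 ]

// Bytes per edited vector for the 100-edit branch (51.4 KB), under both
// readings of "KB". Both land close to the ~520 B quoted in the text.
const bytesPerEditDecimal = (51.4 * 1000) / 100;
const bytesPerEditBinary = (51.4 * 1024) / 100;
console.log(bytesPerEditDecimal.toFixed(1), bytesPerEditBinary.toFixed(1)); // 514.0 526.3
```

At 10,000 vectors the ratio is below 1, meaning the raw copy is faster there. The branch-create time stays roughly flat while the copy time grows with the base, which is the crossover the text describes.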

Multi-approach comparison — performance · storage · cost

Scenario: 1,000 branches over a 1M-vector base (dim 128, ~496 MB base). Cells tagged "measured" were measured on agenticow (AMD Ryzen 9 9950X); cells tagged "est." or "published" for the other approaches are published or estimated figures with sources cited, not fabricated.

Approach	Branch / snapshot create	Per-branch storage	Query latency (ANN)	Cost @ 1,000 branches	Native COW branch / rollback
agenticow (COW)	0.47 ms / 162 B measured, flat to 1M	~10.8 KB measured	~6.3× behind hnswlib @ 1M measured*	~507 MB local · ~$0 infra (embedded) measured†	✓ instant (p50 0.57 ms)
Naive full-copy	67 ms / 496 MB measured @ 1M	full base (~496 MB)	= source engine	~484 GB local measured × N	✗ (copy, not COW)
Pinecone (serverless)	no native branch — full re-upsert/copy	full copy (managed)	fast (core strength)	~484 GB stored ≈ $160/mo storage + read/write units est.¹	✗
Milvus	collection snapshot = full copy / reindex	full copy	fast (core strength)	~484 GB resident → large RAM cluster, $$$/mo infra est.²	✗
Qdrant	snapshot = full copy	full copy	fast (core strength)	~484 GB → managed/self-host, $$$/mo est.³	✗
pgvector	SQL dump / table copy + reindex	full copy	moderate	~484 GB in Postgres, reindex per copy est.	✗
Chroma	full copy of the collection	full copy	moderate	~484 GB local/managed est.	✗
lakeFS / DVC	fast metadata branch (their strength)	file-level delta (cheap)	n/a — not a vector engine	cheap branching, but you still build/serve the ANN index yourself published	✓ for data/files · ✗ for the vector index

Takeaway: agenticow wins decisively on branch-create speed, per-branch storage, and multi-branch cost — and is the only option with native COW branching + instant rollback of a live vector memory. It concedes raw ANN search speed to the dedicated vector DBs (Pinecone / Qdrant / Milvus); use those when single-index query throughput is the priority, and agenticow when you need cheap branching, checkpointing, and rollback of agent memory.

* SIFT-1M, same machine, matched recall@10 ≈ 0.97: ruvector HNSW is ~6.3× behind a dedicated flat-index engine like hnswlib-node at 1M-vector scale. The earlier ~2.7× figure came from a 100K-vector synthetic set that fit in L3 cache, so it holds only for small in-cache sets. This is a deliberate trade: agenticow competes on memory versioning, isolation, and rollback, not raw search speed. Narrowing levers (graph-quality shrink-heuristic + stack-local heaps) are on the ruvector-engine roadmap.

† agenticow base (~496 MB) + 1,000 × ~10.8 KB ≈ 507 MB; runs in-process, no managed infra.

¹ Pinecone serverless storage est. from pinecone.io/pricing (~$0.33/GB-month); storage only, excludes read/write units.

² Milvus / Zilliz Cloud est. from zilliz.com/pricing (RAM-resident, cluster-sized).

³ Qdrant est. from qdrant.tech/pricing.

All competitor figures are published/estimated and labeled as such; only agenticow's are measured here.
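The storage and cost totals in the table and footnotes follow from simple arithmetic on figures already given. The sketch below reproduces them, assuming binary units (1 GB = 1,024 MB, 1 MB = 1,024 KB), which is the reading that matches the ~484 GB figure.

```javascript
const BASE_MB = 496.3;           // base file size from the benchmark table
const BRANCHES = 1_000;
const COW_BRANCH_KB = 10.8;      // measured per-branch storage from the table
const PRICE_PER_GB_MONTH = 0.33; // storage-only estimate cited in footnote 1

// Full copies: every branch stores the whole base.
const fullCopyGB = (BASE_MB * BRANCHES) / 1024;

// COW: one shared base plus a small delta per branch.
const cowMB = BASE_MB + (COW_BRANCH_KB * BRANCHES) / 1024;

// Storage-only monthly cost of the full-copy approach at the cited price.
const fullCopyMonthlyUsd = fullCopyGB * PRICE_PER_GB_MONTH;

console.log(fullCopyGB.toFixed(1));         // 484.7
console.log(cowMB.toFixed(1));              // 506.8
console.log(fullCopyMonthlyUsd.toFixed(0)); // 160
```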

How it compares

agenticow is the only one with native copy-on-write branching of the vector index. We concede raw ANN throughput honestly — pick the right tool for the job.

Capability	agenticow	Pinecone	Milvus	pgvector	Chroma	Qdrant
Native COW branch of the index	✓	✗	✗	✗	✗	✗
O(1)-in-base branch create	✓ 162 B	✗	✗	✗	✗	✗
Snapshot mechanism	COW delta	full copy	full copy	SQL dump	full copy	full copy
Exact read-through (parent ∪ edits)	✓	✗	✗	✗	✗	✗
Embedded / in-process (no server)	✓	✗	✗	via PG	✓	✓/server
Raw ANN throughput	~6.3× behind hnswlib @ 1M*	high	high	moderate	moderate	high
ANN search spanning the branch	✓ shipped (recall@10 ≈ 1.0, linux-x64*)	n/a	n/a	n/a	n/a	n/a

*On a measured SIFT-1M benchmark (same machine, matched recall@10 ≈ 0.97), the underlying ruvector HNSW is ~6.3× behind a dedicated flat-index engine like hnswlib at 1M-vector scale. The earlier ~2.7× figure came from a 100K-vector synthetic set that fit in L3 cache, so it holds only for small in-cache sets; the gap widens at 1M. This is a deliberate trade: agenticow does not compete on raw single-index search throughput. Its unique capability is memory versioning, isolation, and lifecycle governance for multi-tenant agent fleets (1,000 parallel, isolated, reversible branches at ~0.5 ms/fork, which no flat ANN engine offers). Future levers to narrow the gap (graph-quality shrink-heuristic + stack-local heaps) are on the ruvector-engine roadmap, not agenticow's pitch. If you need maximum raw ANN speed on a static index, use a dedicated ANN library. Native ANN-across-branch (fork({nativeAnn:true})) ships for linux-x64-gnu today; other platforms degrade gracefully to exact read-through.

Usage

One small API. ESM, Node ≥ 18.

import { open } from 'agenticow';

// open or create a base memory
const base = open('memory.rvf', { dimension: 1536 });
base.ingest([{ id: 1, vector: embedding }, /* ... */]);

// branch it for a parallel agent — ~0.5 ms / 162 B, any base size
const agent = base.branch('agent-a');
agent.ingest([{ id: 9001, vector: newMemory }]); // isolated

// exact read-through: sees the base + its own edits, child wins on collision
const hits = agent.query(queryVector, 10);

// NEW 0.2.0 — native ANN ACROSS the branch (single Rust dual-graph query)
const fast = base.fork('agent-b', null, { nativeAnn: true });
fast.query(queryVector, 10); // parent ∪ edits via native HNSW, recall@10 ≈ 1.0

// checkpoint + rollback a poisoned branch
const ckpt = agent.checkpoint('clean');
agent.ingest([{ id: 666, vector: poison }]);
agent.rollback(ckpt.id);  // poison gone, clean memory intact

Honest scope. agenticow ships COW branch creation (the 83×/3000× headline, proven), exact read-through queries (parent ∪ edits, child wins, deletes honored), and — new in 0.2.0 — native ANN search ACROSS the COW boundary: fork(label, file, {nativeAnn:true}) runs a single Rust dual-graph HNSW query over parent ∪ child (RuVector PR #617/#618), verified recall@10 ≈ 1.0 (0.999) at 5,000 base ∪ 200 edits, dim 128, default cosine. Platform: the native binary ships for linux-x64-gnu today; darwin / win / linux-arm64 are pending a CI cross-compile and degrade gracefully to exact read-through (identical correctness). We still concede raw single-index ANN throughput to dedicated vector DBs (~6.3× behind a dedicated flat-index engine like hnswlib at 1M-vector scale, matched recall@10 ≈ 0.97; ~2.7× on small in-cache sets) — a deliberate trade, since agenticow competes on memory versioning/isolation/rollback, not raw search speed.
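The read-through semantics (parent ∪ edits, child wins, deletes honored) can be shown with a toy overlay. This is a conceptual sketch only: `ToyBranch` is a hypothetical in-memory class, not agenticow's implementation or API, and it uses brute-force squared Euclidean distance instead of the library's default cosine metric.

```javascript
// Toy copy-on-write overlay: a branch stores only its own edits and deletes,
// and reads fall through to the parent for everything else.
class ToyBranch {
  constructor(parent = null) {
    this.parent = parent;
    this.edits = new Map();   // id -> vector written on this branch
    this.deleted = new Set(); // ids tombstoned on this branch
  }
  ingest(id, vector) {
    this.edits.set(id, vector);
    this.deleted.delete(id);
  }
  remove(id) {
    this.edits.delete(id);
    this.deleted.add(id);
  }
  // Effective view = parent ∪ edits, child wins on collision, deletes honored.
  view() {
    const merged = this.parent ? this.parent.view() : new Map();
    for (const id of this.deleted) merged.delete(id);
    for (const [id, v] of this.edits) merged.set(id, v);
    return merged;
  }
  // Exact (brute-force) nearest neighbours by squared Euclidean distance.
  query(q, k) {
    return [...this.view()]
      .map(([id, v]) => ({ id, distance: v.reduce((s, x, i) => s + (x - q[i]) ** 2, 0) }))
      .sort((a, b) => a.distance - b.distance)
      .slice(0, k);
  }
}

const base = new ToyBranch();
base.ingest(1, [0, 0]);
base.ingest(2, [1, 1]);
base.ingest(3, [5, 5]);

const agent = new ToyBranch(base);
agent.ingest(2, [9, 9]); // collision: the child's version of id 2 wins
agent.remove(3);         // delete is visible only on this branch
agent.ingest(4, [0, 1]); // new memory, isolated from the base

console.log(agent.query([0, 0], 3).map((h) => h.id)); // [ 1, 4, 2 ]
console.log(base.query([0, 0], 3).map((h) => h.id));  // [ 1, 2, 3 ]
```

The base's results are unchanged by anything the branch did, and the branch holds three small records regardless of how large the base is. That is the property the benchmarks above measure at scale.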

Runnable examples

One per use case, in examples/ — node examples/X.mjs or npm run examples. Deterministic output (seeded RNG).

personalization

One base, a cheap COW branch per user — isolation + delta-only storage.

rollback-quarantine

Poison a branch, discard it → base instantly clean (before/after query).

checkpointing

Checkpoint every 10 steps, crash at 31, resume from 30 — no replay.

git-workflow

branch → ingest → diff → promote: the agent → test → prod memory pipeline.

ab-branches

N variant branches, score each, promote the winner (A/B / Darwin-style).

parallel-agents

Fan out N agent branches off one base, query each, roll one back.

Claim ladder — runnable, executed, benchmarked

Every tier is backed by code you can run. PROVEN = bench + acceptance · DEMONSTRATED = executed + benchmarked · PoC = mechanics shown, cognition out of scope.

Tier	What it shows	Status	Examples / bench
Practical	branch / checkpoint / rollback + exact read-through	✅ PROVEN	1,000-branch acceptance, recall@10 = 100%
Platform	promotion pipeline · compliance & right-to-erasure · A/B at scale	✅ DEMONSTRATED + benchmarked	fork 464 µs · score 133 µs · promote 897 µs · contradiction-check ~1M pairs/s · 0.84 KB/branch
Exotic	parallel selves · Darwin-on-memory · simulated org (contradiction scan)	⚗️ PoC (mechanics only)	cognition out of scope — judge/fitness is a scoring function, not an LLM

Run: npm run examples · npm run examples:platform · npm run examples:exotic · npm run bench:ladder. Numbers measured on AMD Ryzen 9 9950X, base=5,000, dim=64.

agenticow
Git for Agent Memory

What is copy-on-write for vectors?

Branch, don't copy

Exact read-through

Instant rollback

Three things it makes cheap

Parallel agents, one base memory

Roll back a poisoned branch

Zero-cost checkpointing

Deployment patterns — what the data says

Fail-fast, shallow branches

Scale horizontally

External, deterministic gates

Infra layer, not a brain

Flagship production patterns — runnable + executed

red-team-sandbox

multi-persona-consensus

time-travel-debug

multi-tenant-saas

Benchmarks — branch create vs full copy

Multi-approach comparison — performance · storage · cost

How it compares

Usage

Runnable examples

personalization

rollback-quarantine

checkpointing

git-workflow

ab-branches

parallel-agents

Claim ladder — runnable, executed, benchmarked
