Agent Infrastructure Evolution
An AI agent system that accumulated memory faster than it could use it. The fix required hierarchy, hard caps on lesson counts, and rules about which memory files get loaded at spawn time.
Unbounded memory growth caused agents to load so much context they couldn't do work. At V0 growth rate, the system would suffocate itself within 2-3 months.
Wrong lessons promoted through echo rather than independent confirmation. Agents converge on the same conclusion because they all read the same predecessor's output.
Agents dying without the coordinator noticing. No monitoring, no respawn, no recovery — work simply vanished mid-execution.
Across 34 specialists and 6 managers, all staying within context budget. Each lesson is reviewed before it gets written to memory.
V0 → V1 → V2, each solving a specific failure in the previous version. Iterated based on real operational failures.
Hard limit per specialist memory. Overflow archives automatically. When a new lesson needs to be added, the oldest one gets moved to an archive file.
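The cap-with-archive behavior can be sketched in a few lines. This is a minimal illustration, not the system's actual code; the 10-lesson cap comes from the text, and the function name and data shapes are assumptions.

```python
from collections import deque

MAX_LESSONS = 10  # hard cap per specialist memory (stated in the text)

def add_lesson(memory: deque, archive: list, lesson: str) -> None:
    """Add a lesson; on overflow, move the oldest lesson to the archive."""
    if len(memory) >= MAX_LESSONS:
        archive.append(memory.popleft())  # oldest lesson overflows to archive
    memory.append(lesson)
```

The FIFO choice means recency decides survival; a relevance-based eviction policy would need a scoring step instead of `popleft`.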
Designing the memory architecture
A coordinator talks to the user, managers sequence the work, and specialists write code. Memory flows upward through compressed briefs and gets capped before it can clog the context window.
At a glance: the failure mode was silent quality degradation; briefs compress to 3-5K tokens; worst case is ~20-35 lessons per spawn; memory flows upward, caps curate.
Three versions, three different failure modes
The system went from flat agents sharing everything to a hierarchical architecture with bounded memory. The central problem is the same one any stateful system hits: state grows, but the resources available to process it do not.
Context window saturation. 35 agents accumulated 500+ lessons with no memory limits. Every agent loaded all handoff files from predecessors. When I audited context window usage, the problem was clear: agents were spending so much of their context window on historical memory that they couldn't do effective work.
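The arithmetic behind the saturation is easy to reproduce. The 500+ lesson count is from the text; the context window size and average lesson length below are illustrative assumptions.

```python
CONTEXT_WINDOW = 200_000   # assumed model context size, not stated in the text
TOKENS_PER_LESSON = 300    # assumed average lesson length

def memory_share(num_lessons: int) -> float:
    """Fraction of the context window consumed by loaded lessons."""
    return num_lessons * TOKENS_PER_LESSON / CONTEXT_WINDOW
```

Under these assumptions, 500 lessons consume 75% of the window before any actual work begins, which matches the audit's conclusion.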
Hierarchy + selective reading + caps. Restructured into two tiers: 6 managers that coordinate, 34 specialists that build. Managers never write code. Specialists never talk to the user. Briefs compress manager output to 3-5K tokens, and selective reading loads only 1-3 relevant memory files per spawn.
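Selective reading can be sketched as a relevance ranking over memory files. The keyword-overlap scoring here is an assumption; the text doesn't specify how the system decides which 1-3 files are relevant.

```python
def select_memory_files(task: str, files: dict[str, str], k: int = 3) -> list[str]:
    """Return up to k file names whose content best overlaps the task description."""
    task_words = set(task.lower().split())
    scored = sorted(
        ((len(task_words & set(text.lower().split())), name)
         for name, text in files.items()),
        reverse=True,
    )
    # Drop zero-overlap files entirely: better to load nothing than noise.
    return [name for score, name in scored[:k] if score > 0]
```

In practice an embedding-based similarity would likely replace the word-overlap score, but the shape is the same: rank, take the top few, leave the rest on disk.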
Upward memory flow with hard caps. Memory flows upward: specialists include lessons in their deliverables, managers decide what's worth recording, and curated lessons get injected into future spawn prompts. The hard cap of 10 lessons per specialist forces continuous curation — only the most valuable lessons survive.
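The manager-side review step can be sketched as a filter plus cap enforcement. The function name and the review predicate are placeholders; the 10-lesson cap and the archive-on-overflow behavior are from the text.

```python
def curate(candidate_lessons: list[str], memory: list[str], archive: list[str],
           worth_recording) -> None:
    """Manager review: record approved lessons, then enforce the 10-lesson cap."""
    for lesson in candidate_lessons:
        if worth_recording(lesson) and lesson not in memory:
            memory.append(lesson)
    while len(memory) > 10:
        archive.append(memory.pop(0))  # oldest lessons overflow to the archive
```

The `worth_recording` predicate is where the "each lesson is reviewed before it gets written" step lives; in the real system that judgment is made by a manager agent, not a function.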