Agent Infrastructure Evolution
An AI agent system that accumulated memory faster than it could use it. The fix required hierarchy, hard caps on lesson counts, and rules about which memory files get loaded at spawn time.
Unbounded memory growth caused agents to load so much context they couldn't do work. At V0 growth rate, the system would suffocate itself within 2-3 months.
Wrong lessons promoted through echo rather than independent confirmation. Agents converge on the same conclusion because they all read the same predecessor's output.
Agents dying without the coordinator noticing. No monitoring, no respawn, no recovery — work simply vanished mid-execution.
Across 34 specialists and 6 managers, all staying within context budget. Each lesson is reviewed before it gets written to memory.
V0 → V1 → V2, each solving a specific failure in the previous version. Iterated based on real operational failures.
Hard limit per specialist memory. Overflow archives automatically. When a new lesson needs to be added, the oldest one gets moved to an archive file.
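The cap-with-archive behavior can be sketched in a few lines. This is a minimal illustration, not the system's actual code; the 10-lesson cap comes from the text, and the function name and data shapes are assumptions.

```python
from collections import deque

MAX_LESSONS = 10  # hard cap per specialist memory (stated in the text)

def add_lesson(memory: deque, archive: list, lesson: str) -> None:
    """Add a lesson; on overflow, move the oldest lesson to the archive."""
    if len(memory) >= MAX_LESSONS:
        archive.append(memory.popleft())  # oldest lesson overflows to archive
    memory.append(lesson)
```

The FIFO choice means recency decides survival; a relevance-based eviction policy would need a scoring step instead of `popleft`.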
Designing the memory architecture
A coordinator talks to the user, managers sequence the work, and specialists write code. Memory flows upward through compressed briefs and gets capped before it can clog the context window.
At a glance: the failure mode was silent quality degradation; briefs compress to 3-5K tokens; worst case is ~20-35 lessons per spawn; memory flows upward, caps curate.
Three versions, three different failure modes
The system went from flat agents sharing everything to a hierarchical architecture with bounded memory. The central problem is the same one any stateful system hits: state grows, but the resources available to process it do not.
Context window saturation. 35 agents accumulated 500+ lessons with no memory limits. Every agent loaded all handoff files from predecessors. When I audited context window usage, the problem was clear: agents were spending so much of their context window on historical memory that they couldn't do effective work.
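The arithmetic behind the saturation is easy to reproduce. The 500+ lesson count is from the text; the context window size and average lesson length below are illustrative assumptions.

```python
CONTEXT_WINDOW = 200_000   # assumed model context size, not stated in the text
TOKENS_PER_LESSON = 300    # assumed average lesson length

def memory_share(num_lessons: int) -> float:
    """Fraction of the context window consumed by loaded lessons."""
    return num_lessons * TOKENS_PER_LESSON / CONTEXT_WINDOW
```

Under these assumptions, 500 lessons consume 75% of the window before any actual work begins, which matches the audit's conclusion.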
Hierarchy + selective reading + caps. Restructured into two tiers: 6 managers that coordinate, 34 specialists that build. Managers never write code. Specialists never talk to the user. Briefs compress manager output to 3-5K tokens, and selective reading loads only 1-3 relevant memory files per spawn.
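Selective reading can be sketched as a relevance ranking over memory files. The keyword-overlap scoring here is an assumption; the text doesn't specify how the system decides which 1-3 files are relevant.

```python
def select_memory_files(task: str, files: dict[str, str], k: int = 3) -> list[str]:
    """Return up to k file names whose content best overlaps the task description."""
    task_words = set(task.lower().split())
    scored = sorted(
        ((len(task_words & set(text.lower().split())), name)
         for name, text in files.items()),
        reverse=True,
    )
    # Drop zero-overlap files entirely: better to load nothing than noise.
    return [name for score, name in scored[:k] if score > 0]
```

In practice an embedding-based similarity would likely replace the word-overlap score, but the shape is the same: rank, take the top few, leave the rest on disk.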
Upward memory flow with hard caps. Memory flows upward: specialists include lessons in their deliverables, managers decide what's worth recording, and curated lessons get injected into future spawn prompts. The hard cap of 10 lessons per specialist forces continuous curation — only the most valuable lessons survive.
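The manager-side review step can be sketched as a filter plus cap enforcement. The function name and the review predicate are placeholders; the 10-lesson cap and the archive-on-overflow behavior are from the text.

```python
def curate(candidate_lessons: list[str], memory: list[str], archive: list[str],
           worth_recording) -> None:
    """Manager review: record approved lessons, then enforce the 10-lesson cap."""
    for lesson in candidate_lessons:
        if worth_recording(lesson) and lesson not in memory:
            memory.append(lesson)
    while len(memory) > 10:
        archive.append(memory.pop(0))  # oldest lessons overflow to the archive
```

The `worth_recording` predicate is where the "each lesson is reviewed before it gets written" step lives; in the real system that judgment is made by a manager agent, not a function.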