Josh Estrada
Personal · Sole Architect & Operator

Agent Infrastructure Evolution

An AI agent system that accumulated memory faster than it could use it. The fix required hierarchy, hard caps on lesson counts, and rules about which memory files get loaded at spawn time.

Context window saturation

Unbounded memory growth caused agents to load so much context they couldn't do work. At V0 growth rate, the system would suffocate itself within 2-3 months.

Memory corruption risk

Wrong lessons promoted through echo rather than independent confirmation. Agents converge on the same conclusion because they all read the same predecessor's output.

Silent agent death

Agents dying without the coordinator noticing. No monitoring, no respawn, no recovery — work simply vanished mid-execution.

910 Active Lessons

Across 34 specialists and 6 managers, all staying within context budget. Each lesson is reviewed before it gets written to memory.

3 Architecture Versions

V0 → V1 → V2, each solving a specific failure in the previous version. Iterated based on real operational failures.

10 Lesson Cap

Hard limit per specialist memory. Overflow archives automatically. When a new lesson needs to be added, the oldest one gets moved to an archive file.
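The cap-and-archive behavior can be sketched in a few lines. This is an illustrative sketch, not the project's actual code: the flat one-lesson-per-line file format and the `LESSON_CAP` name are assumptions.

```python
from pathlib import Path

LESSON_CAP = 10  # hard limit per specialist memory file (assumed constant name)

def add_lesson(memory_file: Path, archive_file: Path, lesson: str) -> None:
    """Append a lesson; once over the cap, move the oldest lessons to the archive."""
    lessons = memory_file.read_text().splitlines() if memory_file.exists() else []
    lessons.append(lesson)
    while len(lessons) > LESSON_CAP:
        oldest = lessons.pop(0)            # oldest lesson overflows first
        with archive_file.open("a") as f:  # the archive file only ever grows
            f.write(oldest + "\n")
    memory_file.write_text("\n".join(lessons) + "\n")
```

With a cap of 10, the eleventh lesson pushes the first one into the archive; the active memory file never grows past the budget.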

Memory Architecture

Designing the memory architecture

A coordinator talks to the user, managers sequence the work, and specialists write code. Memory flows upward through compressed briefs and gets capped before it can clog the context window.

V0 — Flat System
35 agents, all loading everything
500+ lessons, no caps
Unbounded memory growth, context window saturation, silent quality degradation

(evolves into)

Architecture Decisions
Manager / Specialist Tiers: coordinator, 6 managers, 34 specialists; managers coordinate, specialists build
Hard Memory Caps: 10 lessons per specialist, 10 lessons per project file, 15 coordination lessons; overflow archives automatically
Selective Reading: load 1-3 relevant files, not all memory; briefs compress to 3-5K tokens; worst case ~20-35 lessons per spawn

(produces)

V2 — Local-First
~910 managed lessons, within context budget
Local filesystem, no cloning
Specialists learn across projects
Memory flows upward; caps curate
Component anatomy

Three versions, three different failure modes

The system went from flat agents sharing everything to a hierarchical architecture with bounded memory. The central problem is the same one any stateful system hits: state grows, but the resources available to process it do not.

V0 Flat System
Context window usage: saturated
35 agents, everyone reads everything
500+ accumulated lessons, no caps
"Why are agents getting worse over time?"
FAILURE MODE: Context window saturation

35 agents accumulated 500+ lessons with no memory limits. Every agent loaded all handoff files from predecessors. When I audited context window usage, the problem was clear: agents were spending so much of their context window on historical memory that they couldn't do effective work.

Why it broke: Memory grew linearly while context windows stayed fixed. At the V0 growth rate, the system would have become unusable within 2-3 months. New lessons kept displacing the space agents needed for actual task work.
V2 Architecture
Coordinator → managers → specialists
Brief format: 3-5K tokens max
Selective reading: 1-3 memory files per spawn
"How do you scale without suffocating?"
ARCHITECTURE: Hierarchy + selective reading + caps

Restructured into two tiers: 6 managers that coordinate, 34 specialists that build. Managers never write code. Specialists never talk to the user. Briefs compress manager output to 3-5K tokens, and selective reading loads only 1-3 relevant memory files per spawn.
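Selective reading can be sketched as a small ranking step at spawn time. Everything below is an assumption for illustration: the tag-overlap scoring, the `MAX_FILES` name, and the file names in the usage note are not the system's real mechanism.

```python
MAX_FILES = 3  # load at most 1-3 memory files per spawn

def select_memory_files(task_tags: set[str],
                        memory_files: dict[str, set[str]]) -> list[str]:
    """Rank memory files by tag overlap with the task and keep the top few,
    instead of loading every file the way V0 did."""
    scored = [
        (len(task_tags & tags), name)
        for name, tags in memory_files.items()
        if task_tags & tags  # files with no relevant lessons are skipped entirely
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:MAX_FILES]]
```

Under this sketch, a task tagged `{"react", "css"}` would pull a frontend memory file ahead of a database one, and most files would never be loaded at all.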

Familiar patterns: Selective reading is a form of caching. Memory caps are garbage collection. Briefs are lossy compression. These are standard approaches to state accumulation, applied to an agent system instead of a database or OS.
Memory Flow
1. Specialist does work: includes a Lessons section in the deliverable
2. Manager curates: decides what's worth recording to memory.md
3. Lessons injected into future spawns: 3-5 relevant lessons from memory, not all 10
Cap at 10; overflow archives automatically
"How do agents actually learn?"
MECHANISM
Upward memory flow with hard caps

Memory flows upward: specialists include lessons in their deliverables, managers decide what's worth recording, and curated lessons get injected into future spawn prompts. The hard cap of 10 lessons per specialist forces continuous curation — only the most valuable lessons survive.
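The upward flow can be sketched as two small steps: extract lessons from a specialist deliverable, then let the manager curate them under the cap. The `## Lessons` heading, the bullet format, and the `worth_keeping` predicate are assumptions for this sketch, not the system's actual interfaces.

```python
def extract_lessons(deliverable: str) -> list[str]:
    """Step 1: pull bulleted lessons out of a deliverable's Lessons section."""
    lines = deliverable.splitlines()
    if "## Lessons" not in lines:
        return []
    start = lines.index("## Lessons") + 1
    lessons = []
    for line in lines[start:]:
        if line.startswith("## "):  # next section heading ends the Lessons block
            break
        if line.startswith("- "):
            lessons.append(line[2:])
    return lessons

def curate(existing: list[str], candidates: list[str], worth_keeping) -> list[str]:
    """Step 2: the manager records only valuable lessons, then enforces the cap."""
    kept = existing + [c for c in candidates if worth_keeping(c)]
    return kept[-10:]  # oldest lessons fall off the end (archiving elided here)
```

The curated list is what gets injected into future spawn prompts; the predicate is where "only the most valuable lessons survive" actually happens.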

Why selective injection matters: Even with a cap of 10, loading all lessons every time wastes context. Managers pick 3-5 relevant lessons per task — a frontend bug fix doesn't need architecture lessons.
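At prompt-build time, selective injection might look like the sketch below. The naive word-overlap relevance check and the prompt shape are purely illustrative assumptions.

```python
def build_spawn_prompt(task: str, lessons: list[str], max_lessons: int = 5) -> str:
    """Inject only lessons relevant to this task (3-5, not all 10) into the prompt.
    Relevance here is naive word overlap, an assumption for the sketch."""
    task_words = set(task.lower().split())
    relevant = [l for l in lessons if task_words & set(l.lower().split())]
    picked = relevant[:max_lessons]
    lesson_block = "\n".join(f"- {l}" for l in picked)
    return f"Task: {task}\n\nRelevant lessons:\n{lesson_block}"
```

A frontend task would pull in frontend lessons and leave database lessons out of the prompt entirely, keeping the spawn well under the context budget.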