Specifications a fleet of agents can build

Coding agents are individually capable. That fact is no longer interesting on its own. The interesting question is structural: can you take a specification for a whole system and have it built by orchestrating many agents at once, correctly, reproducibly, and faster than a person? The answer, in our daily practice, is yes, but only under conditions that have very little to do with the cleverness of any single agent.

The work starts before any code is written, with a primer: one document that fixes the architecture, the stack, the conventions, and the definition of done. The primer is then decomposed into a numbered sequence of small task files, each scoped to be independently shippable. We do not start coding from a chat thread, because a chat thread has no edges, and edges are what make work verifiable.
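A minimal sketch of how a runner might load that numbered decomposition. It assumes, hypothetically, that task files sit in one directory and are named `<number>-<slug>.md` (for example `003-auth-api.md`); the naming scheme, the `Task` type, and `load_tasks` are illustrative, not a prescribed format.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical naming convention: "<number>-<slug>.md", e.g. "003-auth-api.md".
TASK_NAME = re.compile(r"^(\d+)-([a-z0-9-]+)\.md$")


@dataclass(frozen=True)
class Task:
    number: int
    slug: str
    path: Path


def load_tasks(task_dir: Path) -> list[Task]:
    """Return task files in execution order, skipping files that don't match the convention."""
    tasks = []
    for path in task_dir.iterdir():
        match = TASK_NAME.match(path.name)
        if match:
            tasks.append(Task(int(match.group(1)), match.group(2), path))
    # Sort numerically so "10-..." follows "9-..." even without zero-padding.
    tasks.sort(key=lambda t: t.number)
    numbers = [t.number for t in tasks]
    if len(numbers) != len(set(numbers)):
        # Two tasks claiming the same slot means the decomposition itself is ambiguous.
        raise ValueError(f"duplicate task numbers in {task_dir}")
    return tasks
```

Failing loudly on duplicate numbers is a deliberate choice in this sketch: an ambiguous ordering is a spec problem, and it is cheaper to catch it before any agent starts work.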

Then comes the load-bearing part: every task must pass a verification gate before it counts as done. The code is formatted, it lints clean, it type-checks, and its tests pass. Nothing advances on a red gate. An agent does not get to declare success; the gate declares it. Decomposition is what makes the work parallelizable. Verification is what makes the parallelism safe. Without the second, the first is just a faster way to accumulate broken state.
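A minimal sketch of such a gate, assuming a Python project. The specific tools in `DEFAULT_STEPS` (ruff, mypy, pytest) are example choices, not the stack the primer would necessarily fix; `run_gate` and `GateResult` are hypothetical names. The point it shows is that "done" is decided by exit codes, never by the agent's own report.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class GateResult:
    passed: bool
    failed_step: Optional[str] = None
    output: str = ""


# Example checks for a Python stack; swap in whatever the primer specifies.
DEFAULT_STEPS: Tuple[Tuple[str, Sequence[str]], ...] = (
    ("format", ("ruff", "format", "--check", ".")),
    ("lint", ("ruff", "check", ".")),
    ("typecheck", ("mypy", ".")),
    ("test", ("pytest", "-q")),
)


def run_gate(workdir: str, steps=DEFAULT_STEPS) -> GateResult:
    """Run each check in order; the first failure turns the gate red."""
    for name, cmd in steps:
        try:
            proc = subprocess.run(list(cmd), cwd=workdir, capture_output=True, text=True)
        except FileNotFoundError:
            # A missing tool is a red gate, not a skipped check.
            return GateResult(False, name, f"command not found: {cmd[0]}")
        if proc.returncode != 0:
            return GateResult(False, name, proc.stdout + proc.stderr)
    return GateResult(True)
```

Stopping at the first red step keeps the feedback to the agent short and specific; running every step and aggregating failures is an equally reasonable variant.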

FIG. FN-002: SPEC → TASKS → RUNNER → GATES (a logic flow: input decomposed into tasks, run through gates, to output)

Fleets run in parallel. Each agent works in an isolated workspace, a dedicated git worktree, so concurrent changes cannot collide while work is in progress. The unit of work is deliberately small, closer to a single squashed commit than a feature, which is exactly what keeps each task legible and each result reviewable. Isolation buys parallelism cheaply: conflicts are deferred to a deliberate integration step instead of happening live, and 'many agents editing one tree' stops being a merge nightmare.
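A minimal sketch of giving each task its own worktree, using the real `git worktree add -b <branch> <path> <commit-ish>` form. The `task/<slug>` branch names, the sibling `wt-<slug>` directories, the `main` base branch, and the helper names are assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def worktree_plan(repo: Path, task_slug: str, base: str = "main") -> Tuple[Path, List[str]]:
    """Return the worktree path and the git command that creates it (hypothetical layout)."""
    branch = f"task/{task_slug}"            # one branch per task
    path = repo.parent / f"wt-{task_slug}"  # sibling directory, outside the main checkout
    cmd = ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
    return path, cmd


def create_worktree(repo: Path, task_slug: str, base: str = "main") -> Path:
    """Create the isolated workspace; raises CalledProcessError if git refuses."""
    path, cmd = worktree_plan(repo, task_slug, base)
    subprocess.run(cmd, check=True)
    return path
```

Each agent then edits only its own directory and branch; merging those branches back is the deliberate integration step, and `git worktree remove <path>` cleans up afterwards.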

The long-horizon case is a different shape. Software is built over days, not single sessions, and the way to advance a multi-day plan is not a bigger context window. It is durable state plus small, resumable, interruptible steps. We run tick-based agents that wake, consult the plan, take one checkpointed step, persist, and exit. A tick that ends cleanly is worth more than a marathon run that drifts. Long horizons are a state problem, not a context problem.
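A minimal sketch of one tick, assuming the plan is an ordered list of step names and the durable state is a JSON file; the file format and the names `tick`, `load_state`, and `save_state` are hypothetical. Each call wakes, reads state, does at most one step, persists, and returns, so a crash or interrupt costs at most the step in flight.

```python
import json
import os
from pathlib import Path
from typing import Callable, List, Optional


def load_state(path: Path, steps: List[str]) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"steps": steps, "done": []}


def save_state(path: Path, state: dict) -> None:
    # Write to a temp file, then rename over the old checkpoint,
    # so an interrupted write leaves the previous checkpoint intact.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def tick(state_path: Path, steps: List[str], do_step: Callable[[str], None]) -> Optional[str]:
    """Wake, consult the plan, take exactly one step, persist, exit."""
    state = load_state(state_path, steps)
    pending = [s for s in state["steps"] if s not in state["done"]]
    if not pending:
        return None  # plan complete; nothing to do this tick
    step = pending[0]
    do_step(step)  # if this raises, nothing is persisted and the next tick retries
    state["done"].append(step)
    save_state(state_path, state)
    return step
```

A scheduler (cron, a CI schedule, or a loop with a sleep) would invoke `tick` repeatedly; the process holds nothing in memory between ticks, which is what makes it interruptible.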

Through all of it, the human stays exactly where judgment is decisive: writing the spec, handling the interrupts, signing off at the gate. The method is not 'remove the human.' It is 'spend the human's judgment where it matters and automate the rest.' The moat is not any single technique. It is the combination, run with discipline.
