Autonomous Build Systems (Research)

RESEARCH // SPEC_ID: ABS-00

Can a fleet of agents build the spec?

A specification is a plan for work. The research question is whether the work itself can be done by a fleet of agents that decompose it, run it in parallel, and gate it by verification, while a human stays in the loop only where judgment is required.

Not whether one agent can write one function. Whether you can take the spec for a whole system and have it built by orchestrating many agents under discipline: correctly, reproducibly, and faster than a person.

STATUS: ACTIVE BUILD // IN DAILY PRODUCTION USE // PROOF BEHIND THE BUILD METHOD

FIG. SPEC → TASKS → GATES

REF: ABS-00 Autonomous build pipeline: a specification fans out into numbered tasks, dispatched to agents, each passing a verification gate

01 // The question

Coding agents are capable. The interesting part is the orchestration.

Individually, coding agents are good at their job. The question that matters for real software is not whether a single agent can write a function. It is whether you can take a written specification for a whole system and have it built, correctly and reproducibly, by many agents working at once. Two sub-questions sit underneath the surface, and most of the difficulty lives in them.

REF: ABS-Q1

Decomposition

How do you turn a spec into units of work small enough for an agent to do well, and independently verifiable, so that "done" is a fact rather than an opinion?

REF: ABS-Q2

Long horizons

Software is built over days, not single sessions. How does an agent advance a long plan across many wakes without losing the thread or drifting from the spec?

02 // The approach

Two pillars, built into our own day-to-day construction.

This is not a thought experiment. It is how we ship, on top of open-source components. The architecture has two pillars: spec-driven decomposition with verification gates, and fleets of agents in isolated workspaces. Above them sits a third line of research for the long horizon.

REF: ABS-P1

Spec-driven decomposition with gates

A project begins as a written specification, decomposed into a numbered sequence of tasks, each scoped to be independently verifiable. A runner dispatches agents against those tasks, and every task must pass a verification gate before it counts as done: it formats, it lints, it type-checks, its tests pass.

An agent does not get to declare success. The gate declares it. Decomposition makes the work parallelizable; verification makes the parallelism safe.
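The gate described above can be sketched in a few lines. This is a minimal illustration, not the runner itself: `GateResult`, `run_gate`, and the stand-in commands are hypothetical names introduced here, and a real gate would call whatever formatter, linter, type checker, and test runner the project uses.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class GateResult:
    passed: bool
    failed_check: Optional[str] = None
    output: str = ""


def run_gate(checks: dict[str, list[str]], cwd: str = ".") -> GateResult:
    """Run each check in order; the first non-zero exit code fails the gate."""
    for name, command in checks.items():
        proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
        if proc.returncode != 0:
            return GateResult(False, name, proc.stdout + proc.stderr)
    return GateResult(True)


# Stand-in commands so the sketch runs anywhere (assumption: a real gate
# would substitute the project's own format, lint, type, and test commands).
PASSING = [sys.executable, "-c", "raise SystemExit(0)"]
FAILING = [sys.executable, "-c", "raise SystemExit(1)"]

ok = run_gate({"format": PASSING, "lint": PASSING, "types": PASSING, "tests": PASSING})
bad = run_gate({"format": PASSING, "lint": FAILING, "tests": PASSING})
```

The point of the shape is that the verdict comes from exit codes, not from anything the agent reports: `ok.passed` is true only because every command returned zero, and `bad.failed_check` names the first check that did not.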

REF: ABS-P2

Fleets in isolated workspaces

Multiple coding agents run in parallel, each in its own isolated workspace (a git worktree), so their changes cannot collide. The natural unit is small, closer to a single squashed commit than a feature, which keeps each task legible and each result reviewable.

A coordinator tracks which units are in flight, which passed their gates, and which need a human. Throughput without the chaos of many agents editing the same tree.
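One way to picture the coordinator is as a small state table plus one worktree per task. The sketch below is an assumption-laden illustration: `Status`, `Coordinator`, `worktree_command`, and the `task/<id>` branch naming are all hypothetical. The only real interface it relies on is `git worktree add -b <branch> <path>`, and the command is built but not executed here.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path


class Status(Enum):
    IN_FLIGHT = "in_flight"
    PASSED = "passed"
    NEEDS_HUMAN = "needs_human"


def worktree_command(task_id: str, root: Path) -> list[str]:
    """Build the git command that gives one task its own branch and directory."""
    return ["git", "worktree", "add", "-b", f"task/{task_id}", str(root / task_id)]


@dataclass
class Coordinator:
    units: dict[str, Status] = field(default_factory=dict)

    def dispatch(self, task_id: str) -> None:
        self.units[task_id] = Status.IN_FLIGHT

    def record_gate(self, task_id: str, passed: bool) -> None:
        # Assumption: a failed gate escalates straight to a human;
        # a retry policy is left out of this sketch.
        self.units[task_id] = Status.PASSED if passed else Status.NEEDS_HUMAN

    def with_status(self, status: Status) -> list[str]:
        return sorted(t for t, s in self.units.items() if s is status)


coord = Coordinator()
for tid in ("001", "002", "003"):
    coord.dispatch(tid)
coord.record_gate("001", True)
coord.record_gate("002", False)
cmd = worktree_command("001", Path("worktrees"))
```

After this runs, task 001 has passed, 002 is waiting on a human, and 003 is still in flight. Because each task lives in its own directory and branch, none of those states depends on what another agent is editing.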

REF: ABS-P3

Long-horizon, tick-based agents

For goals spanning far more than one session, we research a different shape: a tick-based agent invoked periodically. Each tick advances an active plan by one step: consult the plan, take the next super-step or surface an interruption, persist state, exit.

Long goals progress across hundreds of small wakes rather than one heroic run. State, checkpointing, and human-interrupt handling are first-class. Plans are markdown-with-frontmatter; capabilities are packaged as reusable skills.
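The tick loop can be sketched as a function that reads state, takes at most one step, writes state, and returns. Everything here is an assumption made for illustration: the JSON state file, the `tick` and `resolve_interrupt` names, and the plan as a plain list of step names stand in for the markdown-with-frontmatter plans and skills described above, and the step itself is not actually executed.

```python
import json
import tempfile
from pathlib import Path


def tick(state_path: Path, steps: list[str], needs_human: set[str]) -> str:
    """Advance the plan by at most one step, persist state, and return."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"next": 0, "interrupt": None}

    if state["interrupt"] is not None:
        return f"waiting on human: {state['interrupt']}"
    if state["next"] >= len(steps):
        return "plan complete"

    step = steps[state["next"]]
    if step in needs_human:
        # Surface the interruption instead of guessing.
        state["interrupt"] = step
        outcome = f"interrupted at: {step}"
    else:
        # A real agent would do the work for this step here.
        state["next"] += 1
        outcome = f"completed: {step}"

    state_path.write_text(json.dumps(state))
    return outcome


def resolve_interrupt(state_path: Path) -> None:
    """Record that a human handled the blocked step so the plan can resume."""
    state = json.loads(state_path.read_text())
    state["interrupt"] = None
    state["next"] += 1
    state_path.write_text(json.dumps(state))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.json"
    plan = ["scaffold", "choose schema", "implement"]
    log = [tick(path, plan, {"choose schema"}) for _ in range(3)]
    resolve_interrupt(path)
    log.append(tick(path, plan, {"choose schema"}))
```

Each call is a separate wake: nothing survives between ticks except the state file, which is why the agent can be stopped, inspected, corrected, and resumed without losing its place in the plan.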

03 // What we learned

Independent verifiability is the whole game.

The hard part of decomposition is not making tasks small. It is making each one carry its own definition of done. A gate an agent cannot argue with is what makes autonomous construction trustworthy. It is the difference between knowing which knob to turn and hoping you turned it.

Independent verifiability is the whole game: every task carries its own definition of done.
Isolation buys parallelism cheaply. Per-agent worktrees turn "many agents" from a merge nightmare into a throughput win.
Long horizons are a state problem, not a context problem. Durable state plus small, resumable, interruptible steps beats a bigger window.
The human belongs at the gates and the interrupts: spend judgment where it is decisive, automate the rest.

FIG. STOP / INSPECT / RESUME

REF: ABS-STATE Recovery flow: a tick-based agent that can be stopped, inspected, corrected, and resumed without losing the plan

// DONE := GATE_PASSED, NOT AGENT_CLAIMED

// UNIT ≈ ONE SQUASHED COMMIT, NOT A FEATURE

// PLAN ADVANCES PER TICK, ACROSS HUNDREDS OF WAKES

04 // Where it is going

Tighter coupling of the two pillars.

The direction is toward fleets of gated agents driving short-horizon construction, and tick-based agents owning the long-horizon plan above them. The throughline is the discipline we apply everywhere. Make the unit of work small, make "done" verifiable, and keep the human exactly where judgment lives.

Tune to the right signal, then know which knob to turn. The same principle that governs a model also governs a build: most of the work is finding the one place where a deliberate human decision changes everything, and automating everything around it.

SPEC-DRIVEN // VERIFICATION GATES // ISOLATED WORKTREES // TICK-BASED AGENTS // HUMAN-AT-THE-GATES

05 // Related

This research is the method, in practice.

REF: LINK-BUILD

// CAPABILITY

The Build method

This is the engine behind how we ship: spec-driven autonomous builds and parallel agent orchestration, with the human at the checkpoints.

How we build

REF: LINK-WORK

// PROOF

The work it produced

Document-intelligence layers, multi-tenant platforms, and 30-day MVPs, all shipped on this discipline and anonymized in the selected-work record.

Selected work

REF: LINK-PRODUCT

// PRODUCT

Autonomous content-ops

The same state-tracked, multi-agent discipline runs our cron-driven content pipelines: fact-checking between agents, published on a schedule.

See the products