I Let a Team of AI Agents Write 30,000 Words While I Watched

There is a specific kind of frustration that comes from wanting something done at scale and knowing it will take forever if you do it yourself. I ran into that wall while building out the concepts section of this site.

The agent orchestration as I imagine it looked on the other side of the terminal. Ten agents, six worker batches, two reviewers, one remediation pass.

The concepts section is supposed to be a reference for advanced AI and deep learning ideas, written so that a curious non-expert can actually follow what is going on. I had 111 concept files. Five of them were done properly. The remaining 106 were stubs: title, tags, maybe a paragraph. Not enough to teach anyone anything.

The obvious move was to write them out one by one. The problem: 106 files at 300 words each is 31,800 words minimum. That is a short book. I was not going to write a short book manually, and I was not going to pay for a ghostwriter to produce AI explainers. So I asked the CLI.

The first attempt

I started by asking the agent to expand the concepts. The first version it produced was technically correct but read like a Wikipedia stub. Dense, impersonal, no analogies. I pushed back.

The second version was better, more casual, more analogies. But still shallow. Each file was maybe 80 to 120 words. I wanted at least 300, with something that actually teaches rather than just defines.

That is when the agent flagged something I had not considered.

The agent's own diagnosis

"Expanding all 106 concepts to be at least 300 words each means writing over 30,000 words. That is essentially writing a short textbook. While I can certainly start doing this sequentially in smaller batches, doing it one by one will take a very long time. For a massive writing task like this, you can use the /teamwork-preview command."

I had not used /teamwork-preview before. I knew it existed. I had not had a reason to reach for it. This was the reason.

What /teamwork-preview actually is

The short version: it spins up a coordinated team of autonomous subagents, each with its own context window and task scope, all reporting back to an orchestrator that manages dependencies, validates outputs, and retries failed work.

The longer version is more interesting. Instead of one agent grinding through a sequential list and burning through its context window, you get a system that looks like a small software team. An orchestrator figures out the scope of the work, slices it into batches, hands each batch to a worker, and then brings in reviewers once the workers are done. If reviewers find problems, a remediation agent gets spawned to fix just the failing subset. The orchestrator validates the final output and only signs off when everything passes.

It is not magic. The agents are just LLMs. But structuring the work this way means each agent has a focused, completable scope, no agent's context gets polluted by unrelated work, failed subtasks get isolated and retried without affecting the rest, and the whole thing runs in parallel rather than sequentially.
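To make the "isolated and retried" property concrete, here is a minimal sketch of the pattern in plain JavaScript. It is not the CLI's implementation or API: runWithIsolatedRetries and runTask are hypothetical names, and runTask simply stands in for "hand this batch to a subagent and wait for the result".

```javascript
// Hypothetical sketch of parallel dispatch with isolated retries.
// runTask(task, attempt) is a stand-in for sending one batch to one subagent.
async function runWithIsolatedRetries(tasks, runTask, maxAttempts = 2) {
  const results = new Map();
  let pending = [...tasks];

  for (let attempt = 1; attempt <= maxAttempts && pending.length > 0; attempt++) {
    // allSettled waits for every task, so one rejection never cancels the others.
    const settled = await Promise.allSettled(
      pending.map((task) => runTask(task, attempt))
    );

    const failed = [];
    settled.forEach((outcome, i) => {
      const task = pending[i];
      if (outcome.status === "fulfilled") {
        results.set(task, { ok: true, value: outcome.value, attempts: attempt });
      } else {
        results.set(task, { ok: false, error: String(outcome.reason), attempts: attempt });
        failed.push(task);
      }
    });

    pending = failed; // only the failing subset goes around again
  }
  return results;
}

// Usage: batch "B3" fails on its first attempt; B1 and B2 are never re-run.
const flakyWorker = async (task, attempt) => {
  if (task === "B3" && attempt === 1) throw new Error("under word count");
  return `${task} done`;
};
runWithIsolatedRetries(["B1", "B2", "B3"], flakyWorker).then((results) => {
  console.log(results.get("B3")); // succeeded on the second attempt
});
```

The design choice worth noticing is Promise.allSettled rather than Promise.all: with Promise.all, the first failing batch would reject the whole group, which is the opposite of isolation.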

The wall-clock time for a 106-file expansion job dropped from "days if I did it manually" to "a session I could watch in real time."

[Interactive simulation: /teamwork-preview agent orchestration]

Orchestrator: coordinates workers, validates, retries
Worker B1: Batch 1, 19 files (Flash to GQA)
Worker B2: Batch 2, 18 files (HHSW to LoRA)
Worker B3: Batch 3, 18 files (MoE to PPO)
Worker B4: Batch 4, 18 files (QAT to SAE)
Worker B5: Batch 5, 17 files (Self to System)
Worker B6: Batch 6, 16 files (Task to Zero)
Reviewer 1: word count + syntax validation (A-M)
Reviewer 2: word count + syntax validation (N-Z)
Remediation: rewrites underperforming files, re-validates

Files expanded: 106
Words written: 30,247+
QA pass rate: 92.5%

Launching the team

The /teamwork-preview flow starts with a prompt-crafting phase. The CLI asks structured questions: what is the scope, what counts as done, how strict should validation be. This part felt like talking to a technical lead who was trying to prevent scope creep before the engineers started coding.

I told it the task: expand 106 MDX files, each to at least 300 words, written for a layman audience with real-world analogies, not just definitions.

The CLI pushed back on one thing: verification. How do we know a file actually meets the 300-word requirement? I said to let the agents use their own judgment. The CLI flagged that self-certification is a common failure mode, where an agent marks its own work done without actually meeting the bar. I left it loose anyway and told it to proceed.

It turned out the CLI was right to flag that. More on that shortly.

What happened once the team launched

The first progress report came back 8 minutes in.

The orchestrator had already:

Scanned all 111 files and isolated the 106 stubs
Written a programmatic validation script (scripts/validate-mdx.js) that checks word count and MDX syntax before marking a file complete
Sliced the 106 files into 6 alphabetical batches
Spawned 6 parallel workers, one per batch
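The batching step in that list can be sketched in a few lines. This is only one plausible way to do it, assuming an even split of the sorted file names; sliceIntoBatches is a hypothetical helper, and the orchestrator evidently cut at different boundaries, since its actual batches were 19, 18, 18, 18, 17, and 16 files.

```javascript
// Hypothetical sketch: split alphabetically sorted names into contiguous,
// near-equal batches. Not the orchestrator's actual slicing logic.
function sliceIntoBatches(names, batchCount) {
  const sorted = [...names].sort((a, b) =>
    a.localeCompare(b, "en", { sensitivity: "base" })
  );
  const base = Math.floor(sorted.length / batchCount);
  const extra = sorted.length % batchCount; // the first `extra` batches get one more file

  const batches = [];
  let start = 0;
  for (let i = 0; i < batchCount; i++) {
    const size = base + (i < extra ? 1 : 0);
    batches.push(sorted.slice(start, start + size));
    start += size;
  }
  return batches;
}

// Usage with 106 placeholder file names (the real names are not reproduced here).
const placeholderNames = Array.from(
  { length: 106 },
  (_, i) => `concept-${String(i + 1).padStart(3, "0")}.mdx`
);
const batchSizes = sliceIntoBatches(placeholderNames, 6).map((b) => b.length);
console.log(batchSizes); // sizes: 18, 18, 18, 18, 17, 17
```

Keeping each batch contiguous in alphabetical order is what lets a reviewer be assigned a simple range such as A through M.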

The validation script was something I had not asked for explicitly. The orchestrator built it on its own as infrastructure, a mechanism to prevent agents from self-certifying work that did not meet spec. That was good instinct.
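The contents of scripts/validate-mdx.js are not shown here, so the following is only a rough sketch of what a check like that might look like. Every function name is hypothetical, the frontmatter handling assumes a leading YAML block delimited by --- lines, and the "syntax" check is deliberately crude (paired code fences only), where a real script would more likely compile the file with an MDX parser.

```javascript
// Hypothetical sketch, NOT the contents of scripts/validate-mdx.js.
const MIN_WORDS = 300; // the floor from the task description
const FENCE = "`".repeat(3); // code-fence marker, built indirectly to keep this block readable

// Drop a leading YAML frontmatter block (--- ... ---) if present.
function stripFrontmatter(source) {
  return source.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
}

// Count whitespace-separated tokens in the body only, so frontmatter never pads the total.
function countWords(source) {
  const body = stripFrontmatter(source).trim();
  return body === "" ? 0 : body.split(/\s+/).length;
}

// Crude structural check: lines that open or close a code fence must come in pairs.
function hasBalancedFences(source) {
  const fenceLines = source.split("\n").filter((line) => line.startsWith(FENCE));
  return fenceLines.length % 2 === 0;
}

// Validate one file's text; returns a report instead of throwing.
function validateFile(name, source, minWords = MIN_WORDS) {
  const words = countWords(source);
  const problems = [];
  if (words < minWords) problems.push(`only ${words} words, need ${minWords}`);
  if (!hasBalancedFences(source)) problems.push("unclosed code fence");
  return { name, words, ok: problems.length === 0, problems };
}

// Usage: a stub with frontmatter and a one-line body fails the word floor.
const stub = "---\ntitle: Example\ntags: [demo]\n---\nA short stub paragraph.";
console.log(validateFile("example.mdx", stub));
```

The point of returning a report object is that the orchestrator, not the worker that wrote the file, gets to decide what happens to a failing file.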

The workers ran simultaneously. Batch 1 (19 files) and Batch 4 (18 files) finished first. Then Batch 2 and 5. Then 3 and 6. Each worker ran the validation script on every file it finished before marking the batch complete. The orchestrator got progress signals throughout.

Total elapsed time from launch to "all 6 batches complete": roughly 25 minutes of wall-clock time, during which I did other things.

The part where the system caught its own mistake

When all six workers reported complete, the orchestrator did not immediately declare success. Instead it spun up two reviewer agents, reviewer_1 handling A through M and reviewer_2 handling N through Z, to do an independent pass over every file.

Reviewer 1 found 8 files in Batch 3 that fell just under the 300-word threshold. The worker had technically passed them through its own check, but the reviewer's independent read disagreed. The orchestrator rejected Batch 3, spawned a dedicated worker_remediation agent to rewrite just those 8 files, and then sent the remediated outputs back through reviewer validation.

Why this matters

The failure-then-remediation loop is the whole point. Without independent review, the worker's self-assessment would have stood. With it, 8 files that were slightly under spec got caught and fixed automatically, with no manual intervention from me.
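A minimal sketch of that loop, under stated assumptions: review and remediate are hypothetical callbacks standing in for the reviewer and remediation agents, and files are plain objects with a name. None of this is the CLI's real interface; it only illustrates the shape of "review everything, rewrite only what failed, review those again".

```javascript
// Hypothetical sketch of the review -> remediate -> re-review loop.
// review(file) resolves to true/false; remediate(file) resolves to a rewritten file.
async function reviewAndRemediate(files, review, remediate, maxRounds = 2) {
  const byName = new Map(files.map((file) => [file.name, file]));
  let toCheck = [...byName.keys()];

  for (let round = 0; ; round++) {
    // Independent check: the reviewer re-measures each file and never reads
    // the worker's own "done" verdict.
    const verdicts = await Promise.all(
      toCheck.map(async (name) => ({ name, ok: await review(byName.get(name)) }))
    );
    const failing = verdicts.filter((v) => !v.ok).map((v) => v.name);

    // Stop when everything passes, or when the remediation budget is spent.
    if (failing.length === 0 || round === maxRounds) {
      return {
        ok: failing.length === 0,
        rounds: round, // number of remediation rounds that were needed
        failing,
        files: [...byName.values()],
      };
    }

    // Rewrite only the failing subset, then send just those files back to review.
    for (const name of failing) {
      byName.set(name, await remediate(byName.get(name)));
    }
    toCheck = failing;
  }
}
```

Files that passed the first review are never touched again, which is why a problem confined to 8 files did not require redoing the other 98.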

The remediation agent passed. Reviewer 1 signed off. The orchestrator ran a final global build check across all 111 files and declared success.

Total output: 30,247 words across 106 files, all validated, all meeting the layman-friendly 300-word floor.

What worked and what I would do differently

The part that worked almost perfectly was parallelism. Six workers on six batches is meaningfully faster than one agent doing 106 files in sequence. The orchestrator also handled inter-agent communication gracefully, with workers handing off to reviewers without any manual intervention.

The validation script was a genuine win. Having a programmatic check rather than relying on the agent's own sense of "this is long enough" was exactly right. The reviewer agents caught what the workers missed.

What I would do differently: I would be more explicit about verification upfront. The CLI asked me how strict to be and I said to let the agents decide. That was fine in the end because the team built its own validation infrastructure, but it was also a bit of a gamble. The orchestrator made the right call; it might not always. Next time I would specify programmatic verification as a hard requirement from the start.

I would also define the word-count minimum more precisely in the prompt itself rather than relying on the agents to infer the details from my phrasing. "At least 300 words" is clear, but I could have been more explicit about what counts as a word and whether frontmatter counts toward the total.

The broader point about agentic work

There is a version of this workflow that sounds impressive but is actually just a marketing demo: agents that look busy, produce shallow output, and self-certify before anyone checks. I have seen that pattern plenty.

What made this feel different was the adversarial review structure. The orchestrator did not trust the workers. The reviewers did not trust the orchestrator's word that batches were complete. The remediation agent existed specifically because the system assumed some work would need to be redone.

That structure (build, verify independently, remediate, verify again) is just good engineering practice applied to agent teams. The agents were not smarter than a single capable model. The system design was what made the output reliable.

The concepts section now has 111 properly written entries. I did not write 30,000 words. I described what I wanted, answered a few clarifying questions, watched a terminal for 25 minutes, and got a validated knowledge base at the end.

That is a genuinely different relationship to scale than I had last month.

Try it yourself

If you have a large, parallelizable writing or coding task and you have access to Antigravity CLI, type /teamwork-preview into the chat and let it walk you through the prompt-crafting phase before it launches. The questions it asks are worth answering carefully. The clearer you are about what "done" looks like, the better the team performs.

The concepts section is live. Browse it if you want to see what 10 autonomous agents writing in parallel actually produces.
