Running Six Agents in Parallel: What AI Coding Changed, and What It Didn't

Karpathy himself has stopped using the term 'vibe coding.' After an Amazon outage, the company straight-up added a senior sign-off gate for junior engineers' AI-generated code. Starting from my own daily workflow, I'll talk about the three things this tool truly changed (breadth, speed, and quality) and the one thing that never moved: accountability.

Jiawei Guan · 4 min read

The debate over vibe coding never stops. On one side, it's treated like a wishing well—throw every task into it; on the other, it's slapped with a "trash code factory" label. I can't accept either. Tools aren't a matter of faith.

Rather than picking a side, let's talk about which dimensions it actually changed, and which it didn't.

Two Signals Pointing in Opposite Directions

Look at two recent events.

One is Karpathy himself. In February 2025, he coined the term "vibe coding" on X—"fully surrender to the vibes, embrace the exponentials, even forget that the code exists"—and it was picked as Collins Dictionary's Word of the Year that same year. Then in February 2026, he himself came out and said the term was outdated. Now he uses "agentic engineering": 99% of the time you're not typing code, you're orchestrating agents and doing oversight; "engineering" is there to emphasize that this has a bar, it's a craft.

The other is Amazon. On March 5, 2026, their main site was down for six hours. Root cause: another cascading failure triggered by AI-assisted code. The previous one was in December 2025, when their in-house AI coding tool Kiro deleted and recreated an AWS Cost Explorer environment, causing a 13-hour outage in China. After an internal meeting, Amazon issued a new rule: AI-assisted code written by junior and mid-level engineers must be signed off by a senior engineer before it can reach production.

They look like opposites, but they're the same thing. Karpathy moved the term from "experience" (vibe) to "you're on the hook" (oversight + engineering). Amazon literally wrote "you're on the hook" into the charter. One is a conceptual pivot; the other is an institutional implementation.

What really deserves thought isn't who's right or wrong, but what changed and what didn't. Clarify these four things, and most of the controversy will quiet down on its own.

Breadth: One Person's Surface Area Gets Stretched

There used to be hard limits on what one person could do in a day. Your domain, your skills, the number of projects you could push at once—all pressed down by the simple fact that you are one person.

Now that coding agents can take on long-horizon tasks, that surface area has been stretched.

Here's a slice of my daily routine over the past few weeks.

The main thread is an AI hardware product called Aima: an agent writes new features and occupies machines running UAT, while I review the test results and feed in the next round of instructions. It's a standard serial chain, but there's a lot of waiting between each node.

In the gaps, I can spin up a second thread: the cloud backend behind Aima has had stability issues lately, so another agent investigates root causes, patches architectural holes, and loops back through UAT.

Third is a research branch: there's still inference performance left on the table on edge hardware, and the operators need A/B testing, compilation, and accuracy runs. No guaranteed output, but as long as the tokens hold out, it keeps running.

Fourth is efficiency research on the agent framework itself, packaged as a standalone runtime and thrown onto a machine, with another agent doing the data analysis.

Plus small tweaks to my personal homepage, and a character-recognition mini-game I spent a day and a half building for my son over the weekend; he hasn't been getting his little red flowers at school because he can't read characters yet.

That's six threads running in parallel.

Sounds like bragging. But the actual feeling isn't that I'm somehow superhuman; it's that the "waiting for the agent" time within each thread is naturally long. This pattern already has a common name in 2026: parallel agent coding. Git worktrees as isolation layers are mainstream infrastructure. Most people's physical ceiling is five to seven parallel threads; beyond that, review and merge costs eat you alive.
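The worktree-as-isolation-layer pattern can be sketched in a few lines. This is a minimal, throwaway-repo version; the repo and branch names are illustrative only, not my actual projects. Each agent thread gets its own branch in its own directory, so parallel edits never collide:

```shell
#!/usr/bin/env sh
# Minimal sketch of worktree-per-thread isolation in a throwaway repo.
# All names here are illustrative; substitute your real project.
set -e

repo="$(mktemp -d)/demo"
git init -q "$repo"
cd "$repo"
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "init"

# One isolated checkout per agent thread: its own branch, its own
# directory, no shared working tree.
for thread in aima-features backend-stability edge-inference; do
  git worktree add -b "$thread" "../wt-$thread" >/dev/null
done

git worktree list   # main checkout plus one worktree per thread
```

When a thread finishes, `git worktree remove` cleans up the directory and the branch merges back through normal review; the merge cost is exactly the ceiling mentioned above.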

There's an under-discussed side effect: it's quietly changing a person's "functional identity." I used to see myself as a PM who also codes half the time. Now that identity is expanding outward, covering product, operations, research, even parenting. It's not that I became Superman; the tool simply raised the breadth that one person can cover.

Speed: The Ceiling Lifted, But "Fast" Itself Stops Being an Advantage

There used to be a physical ceiling on building things. Type as fast as you want, you only get so many lines per day. Think as fast as you want, you only have two hands.

AI moved that ceiling. The place you feel it most is putting together demos: a hackathon used to be a success if you produced something viewable in 48 hours. Now producing draft-level demos on the scale of "days" is normal for comparable tasks. Not that it's polished—just that it can be seen, played with, and used to discuss next steps.

But there's an awkward side effect: when everyone can be "fast," speed itself stops being an advantage.

In the past, moving fast was a bonus; moving slow got you talked about. Now moving fast is the price of admission, moving slow gets you cut, and moving fast won't earn you special praise anymore. This is a structural shift inside organizations. Teams that use "speed" as a core motivator will freeze up: rewards can't be handed out, performance reviews are all top marks, and anxiety actually rises.

The more troublesome problem lurks one layer down: once you're fast, what about quality?

Quality: From a Work Problem to a Budget Problem

The tension between quality and speed was never AI-specific; it's chapter one of any project management textbook. But AI did change its shape. Quality used to be a work problem: how many people you hire, how strict your process, how fine-toothed your review. Now it's more like a budget problem: how many tokens you're willing to give it determines the level it reaches.

Bare minimum: write, merge, ship. Three hours and done.

Somewhat serious: have the agent do a round of code review, then a round of design-level review; fix issues and iterate.

Done properly: unit, integration, and UAT before merge. The more I use UAT, the more I see it's unavoidable. Many issues are chain-level; you can't see them without actually simulating the usage flow. The upside is agents can now automate UAT runs: operate, reproduce, provide traces. You just verify the results.

Even stricter: wire up CI/CD, add smoke tests, push to staging, run UAT again on staging, all green before production.
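As a sketch, that strictest tier can be wired as one sequential gate where any red step blocks the release. Every step command below is a hypothetical placeholder (stubbed with `true`); a real project would swap in its own test, smoke, and UAT commands:

```shell
#!/usr/bin/env sh
# Sketch of the strictest quality tier as a sequential gate script.
# Each step's command is a placeholder (`true`); real projects would
# call e.g. their test runner, a deploy script, an agent-driven UAT run.
set -e

run_step() {
  name="$1"; shift
  echo "[gate] $name"
  "$@" || { echo "[gate] $name failed: blocking release"; exit 1; }
}

run_step "unit tests"        true
run_step "integration tests" true
run_step "smoke test"        true   # e.g. a health-check request on staging
run_step "staging UAT"       true   # agent runs the flow, human reviews traces
echo "[gate] all green: safe for production"
```

Each `run_step` line is one layer of the budget ladder: add a line, and you've roughly doubled the wall-clock time and multiplied the tokens, which is exactly the trade-off the next paragraph prices out.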

Each added layer doubles the time and multiplies tokens several-fold. A feature that takes three hours to write might need twelve hours end-to-end, and thirty times the tokens.

Thirty times looks like waste, but it isn't. At the end of 2025, CodeRabbit ran a comparative analysis on 470 open-source GitHub PRs. AI-co-generated code contained roughly 1.7× as many bugs as human code, and on the category of logic and correctness issues most likely to trigger downstream incidents, it was 75% higher.

In other words, the statistical average of an agent's
