Running Six Agents in Parallel: What AI Coding Changed, and What It Didn't

Karpathy himself has stopped using the term 'vibe coding.' After an Amazon outage, the company straight-up added a senior sign-off gate for junior engineers' AI-generated code. Starting from my own daily workflow, I'll talk about the three things this tool truly changed (breadth, speed, and quality) and the one thing that never moved: accountability.

Jiawei Guan · 4 min read

The debate over vibe coding never stops. On one side, it's treated like a wishing well—throw every task into it; on the other, it's slapped with a "trash code factory" label. I can't accept either. Tools aren't a matter of faith.

Rather than picking a side, let's talk about which dimensions it actually changed, and which it didn't.

Two Signals Pointing in Opposite Directions

Look at two recent events.

One is Karpathy himself. In February 2025, he coined the term "vibe coding" on X—"fully surrender to the vibes, embrace the exponentials, even forget that the code exists"—and it was picked as Collins Dictionary's Word of the Year that same year. Then in February 2026, he himself came out and said the term was outdated. Now he uses "agentic engineering": 99% of the time you're not typing code, you're orchestrating agents and doing oversight; "engineering" is there to emphasize that this has a bar, it's a craft.

The other is Amazon. On March 5, 2026, their main site was down for six hours. Root cause: another cascading failure triggered by AI-assisted code. The previous one was in December 2025, when their in-house AI coding tool Kiro deleted and recreated an AWS Cost Explorer environment, causing a 13-hour outage in China. After an internal meeting, Amazon issued a new rule: AI-assisted code written by junior and mid-level engineers must be signed off by a senior engineer before it can reach production.

They look like opposites, but they're the same thing. Karpathy moved the term from "experience" (vibe) to "you're on the hook" (oversight + engineering). Amazon literally wrote "you're on the hook" into the charter. One is a conceptual pivot; the other is an institutional implementation.

What really deserves thought isn't who's right or wrong, but what changed and what didn't. Clarify these four things, and most of the controversy will quiet down on its own.

Breadth: One Person's Surface Area Gets Stretched

There used to be hard limits on what one person could do in a day. Your domain, your skills, the number of projects you could push at once—all pressed down by the simple fact that you are one person.

Now that coding agents can take on long-horizon tasks, that surface area has been stretched.

Here's a slice of my daily routine over the past few weeks.

The main thread is an AI hardware product called Aima: an agent writes new features and occupies machines running UAT, while I review the test results and feed in the next round of instructions. It's a standard serial chain, but there's a lot of waiting between each node.

In the gaps, I can spin up a second thread: the cloud backend behind Aima has had stability issues lately, so another agent investigates root causes, patches architectural holes, and loops back through UAT.

Third is a research branch: there's still inference performance left on the table on edge hardware, and the operators need A/B testing, compilation, and accuracy runs. No guaranteed output, but as long as the tokens hold out, it keeps running.

Fourth is efficiency research on the agent framework itself, packaged as a standalone runtime and thrown onto a machine, with another agent doing the data analysis.

Plus small tweaks to my personal homepage, and a character-recognition mini-game I spent a day and a half building for my son over the weekend; he hasn't been getting his little red flowers at school because he can't read characters yet.

That's six threads running in parallel.

Sounds like bragging. But the actual feeling isn't that I'm somehow superhuman; it's that the "waiting for the agent" time within each thread is naturally long. This pattern already has a common name in 2026: parallel agent coding. Git worktrees as isolation layers are mainstream infrastructure. Most people's physical ceiling is five to seven parallel threads; beyond that, review and merge costs eat you alive.
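The worktree-as-isolation-layer pattern can be sketched in a few lines. This is a minimal, throwaway-repo version; the repo and branch names are illustrative only, not my actual projects. Each agent thread gets its own branch in its own directory, so parallel edits never collide:

```shell
#!/usr/bin/env sh
# Minimal sketch of worktree-per-thread isolation in a throwaway repo.
# All names here are illustrative; substitute your real project.
set -e

repo="$(mktemp -d)/demo"
git init -q "$repo"
cd "$repo"
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "init"

# One isolated checkout per agent thread: its own branch, its own
# directory, no shared working tree.
for thread in aima-features backend-stability edge-inference; do
  git worktree add -b "$thread" "../wt-$thread" >/dev/null
done

git worktree list   # main checkout plus one worktree per thread
```

When a thread finishes, `git worktree remove` cleans up the directory and the branch merges back through normal review; the merge cost is exactly the ceiling mentioned above.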

There's an under-discussed side effect: it's quietly changing a person's "functional identity." I used to see myself as a PM who also codes half the time. Now that identity is expanding outward, covering product, operations, research, even parenting. It's not that I became Superman; the tool simply raised the breadth that one person can cover.

Speed: The Ceiling Lifted, But "Fast" Itself Stops Being an Advantage

There used to be a physical ceiling on building things. Type as fast as you want, you only get so many lines per day. Think as fast as you want, you only have two hands.

AI moved that ceiling. The place you feel it most is putting together demos: a hackathon used to be a success if you produced something viewable in 48 hours. Now producing draft-level demos on the scale of "days" is normal for comparable tasks. Not that it's polished—just that it can be seen, played with, and used to discuss next steps.

But there's an awkward side effect: when everyone can be "fast," speed itself stops being an advantage.

In the past, moving fast was a bonus; moving slow got you talked about. Now moving fast is the price of admission, moving slow gets you cut, and moving fast won't earn you special praise anymore. This is a structural shift inside organizations. Teams that use "speed" as a core motivator will freeze up: rewards can't be handed out, performance reviews are all top marks, and anxiety actually rises.

The more troublesome problem lurks one layer down: once you're fast, what about quality?

Quality: From a Work Problem to a Budget Problem

The tension between quality and speed was never AI-specific; it's chapter one of any project management textbook. But AI did change its shape. Quality used to be a work problem: how many people you hire, how strict your process, how fine-toothed your review. Now it's more like a budget problem: how many tokens you're willing to give it determines the level it reaches.

Bare minimum: write, merge, ship. Three hours and done.

Somewhat serious: have the agent do a round of code review, then a round of design-level review; fix issues and iterate.

Done properly: unit, integration, and UAT before merge. The more I use UAT, the more I see it's unavoidable. Many issues are chain-level; you can't see them without actually simulating the usage flow. The upside is agents can now automate UAT runs: operate, reproduce, provide traces. You just verify the results.

Even stricter: wire up CI/CD, add smoke tests, push to staging, run UAT again on staging, all green before production.
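As a sketch, that strictest tier can be wired as one sequential gate where any red step blocks the release. Every step command below is a hypothetical placeholder (stubbed with `true`); a real project would swap in its own test, smoke, and UAT commands:

```shell
#!/usr/bin/env sh
# Sketch of the strictest quality tier as a sequential gate script.
# Each step's command is a placeholder (`true`); real projects would
# call e.g. their test runner, a deploy script, an agent-driven UAT run.
set -e

run_step() {
  name="$1"; shift
  echo "[gate] $name"
  "$@" || { echo "[gate] $name failed: blocking release"; exit 1; }
}

run_step "unit tests"        true
run_step "integration tests" true
run_step "smoke test"        true   # e.g. a health-check request on staging
run_step "staging UAT"       true   # agent runs the flow, human reviews traces
echo "[gate] all green: safe for production"
```

Each `run_step` line is one layer of the budget ladder: add a line, and you've roughly doubled the wall-clock time and multiplied the tokens, which is exactly the trade-off the next paragraph prices out.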

Each added layer doubles the time and multiplies tokens several-fold. A feature that takes three hours to write might need twelve hours end-to-end, and thirty times the tokens.

Thirty times looks like waste, but it isn't. At the end of 2025, CodeRabbit ran a comparative analysis on 470 open-source GitHub PRs. AI-co-generated code contained roughly 1.7× as many bugs as human code, and on the category of logic and correctness issues most likely to trigger downstream incidents, it was 75% higher.

In other words, the statistical average of an agent's
