
Emergency Room and the Vanishing Moat

Aima Service refactoring retrospective. Products are like emergency rooms—users don't care about the decor, only whether the doctor can treat them. I ran Claude Code and Codex alternately for over a week and deployed 1.3 million lines of code. Looking back, the moat of code volume may already be gone.


I recently refactored Aima Service. It took about a week, and a lot of thoughts came up during the process—jotting them down here.

Emergency Room

Aima Service isn't the kind of tool you open every day. It's more like an emergency room for devices—you don't think about it normally, only when something goes wrong.

This creates an awkward dynamic: success feels like nothing to the user. Problem solved, "yeah, okay," and they're gone. Since they never experienced the pain, they naturally don't realize how significant the solution was.

Failure, on the other hand, is much more interesting.

There are several types of failure. The agent tried seriously but couldn't fix it and tells you the task failed—that's one type. Claiming success when testing shows it actually didn't work—false success—is another. But the worst is when the channel itself breaks: crashes, freezes, disconnections mid-process.

The first two are manageable. Like going to the ER where the doctor couldn't cure you—you just go somewhere else. The last one is unacceptable. You called 120 (emergency services), the ambulance arrived, but the ER is closed when you get there. Or you get inside, the registration system is down, the doctor disappears halfway through, and someone comes out to say "sorry, we're closed for today."

That's the real crash.

Decor Doesn't Matter, Doctors Must Be There

Once you understand this, priorities become clear.

In an emergency room, fancy decor is useless. Comfortable sofas are useless. Only two things matter: is the door open, and can the doctor treat patients?

The previous version was functionally adequate but unstable. Tasks often ran into bugs, crashed constantly, and suffered from mysterious freezes. The ER door was open, but the doctor wasn't there.

So I did a complete refactoring. No new features—just rebuilding the foundation.

Two Models Alternating

The refactoring used Claude Code and Codex, alternating between the two.

First, I designed the documentation system, had both models read the docs and code, and list what needed to be done. Then Claude Code ran the first round of refactoring, handed it off to Codex for the second round, and back and forth.

Why not just one? I tried—it tends to drift. Claude Code has strong architectural sense, seeing structural-level issues, but sometimes it's too conservative. Codex moves fast, acts boldly, and circles back to catch details, but occasionally it's too rough. Letting just one do it amplifies its weaknesses. Alternating actually creates the best rhythm—issues found by one are often fixed by the other.
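The handoff rhythm above can be sketched as a simple round-robin loop. Everything here is hypothetical—the agent names and the `toy_round` stub stand in for real CLI sessions—it only illustrates how an open issue list bounces between two agents until it empties:

```python
from typing import Callable, List

def alternate_rounds(agents: List[str],
                     run_round: Callable[[str, List[str]], List[str]],
                     issues: List[str],
                     max_rounds: int = 6) -> List[str]:
    """Hand the open issue list back and forth between agents.

    Each round, one agent works through the list and returns what is
    still open; the next agent picks up from there. Stops early once
    the list is empty.
    """
    for i in range(max_rounds):
        agent = agents[i % len(agents)]   # round-robin handoff
        issues = run_round(agent, issues)
        if not issues:                    # everything resolved
            break
    return issues

# Toy stand-in: each "agent" resolves the issues it is good at,
# mirroring the structural-vs-detail split described above.
STRENGTHS = {"claude-code": "architecture", "codex": "detail"}

def toy_round(agent: str, issues: List[str]) -> List[str]:
    return [i for i in issues if STRENGTHS[agent] not in i]

remaining = alternate_rounds(
    ["claude-code", "codex"],
    toy_round,
    ["architecture: split modules", "detail: fix flaky test"],
)
```

In this toy run, the first round clears the structural issue and the second clears the detail issue—the point being that neither agent alone would have emptied the list.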

Later I checked and found a NeurIPS 2025 paper titled "Lessons Learned" that specifically analyzed how different LLMs complement each other, concluding that around 3 agents works best, with diminishing returns beyond that. Matches my experience exactly.

Each round lets the models spin up agent teams to run in parallel. A single task taking three or four hours is normal. It's all asynchronous anyway—you do your own thing, occasionally check in and ask a few questions.
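Running agent teams in parallel while you occasionally check in maps naturally onto async tasks. A minimal sketch, with made-up task names and sleeps standing in for multi-hour sessions:

```python
import asyncio

async def agent_task(name: str, seconds: float) -> str:
    # Stand-in for a long-running agent session (hours in practice).
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def main() -> list:
    # Kick off several tasks at once; they run concurrently, so
    # total wall time is the slowest task, not the sum of all three.
    return await asyncio.gather(
        agent_task("refactor-core", 0.02),
        agent_task("rewrite-worker", 0.03),
        agent_task("update-docs", 0.01),
    )

results = asyncio.run(main())
```

`asyncio.gather` returns results in submission order regardless of which task finishes first, which is handy when you come back later to review each task's output.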

CI/CD Ate More Time Than Expected

The functional refactoring was roughly done in about a week. What really consumed time was what came after.

The code changed extensively—what catches things before going live? Automated testing, build checks, integration validation—none can be skipped. One pipeline run takes dozens of minutes; fix something, run again, another few dozen minutes. Later, adjusting test cases and optimizing the pipeline repeatedly—I wrote tens of thousands of lines of test code alone.
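The fix-something-run-again cycle above amounts to an iterate-until-green loop. A sketch under toy assumptions (the pipeline stub resolves one failure per run; real runs cost tens of minutes each, which is exactly why this loop dominates the calendar):

```python
from typing import Callable, List

def iterate_until_green(run_pipeline: Callable[[], List[str]],
                        apply_fix: Callable[[str], None],
                        max_runs: int = 10) -> int:
    """Run the pipeline, fix the first failure, run again.

    Returns the number of pipeline runs it took to go green.
    """
    for run in range(1, max_runs + 1):
        failures = run_pipeline()
        if not failures:
            return run
        apply_fix(failures[0])
    raise RuntimeError("still red after max_runs")

# Toy stand-in: three seeded failures, fixed one per run,
# so going green takes four runs (three red, one green).
failures = ["build", "unit tests", "integration"]
runs = iterate_until_green(lambda: list(failures),
                           lambda f: failures.remove(f))
```

Even in this best case, the run count is one more than the failure count—and each unit is a full pipeline, not a quick local check.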

When people talk about AI coding, they focus on "how fast." But to actually reach production, CI/CD and testing may account for over half the engineering effort. AI can do this part too, but it takes many rounds and lots of debugging—you can't rush it.

1.3 Million Lines

After refactoring, I generated a report and looked at the numbers.

Functionally, nothing new was added—it looks unremarkable. But the architecture transformed from "running tasks with bugs everywhere" to a design sized for on the order of 100,000 users. Modules are cleanly split, with distributed workers and overseas federation built in.

The code volume surprised me: approximately 1.3 million lines excluding documentation, 1.7 million with docs. Eight or nine days, one non-specialist, two AI models.

Then I remembered something.

The Moat Dried Up

A few years ago, there was a popular saying: code volume is the moat of software companies. Millions of lines of code piled up—others can't copy it even if they want to.

I remember reading about this back then and thinking it made sense. Cisco IOS XE has 190 million lines of code, maintained by over 3,000 people, releasing 700+ new features annually. SAP's ABAP code exceeds 250 million lines, with fewer and fewer people who can understand it. These companies indeed rely on "this thing is too complex for anyone to replace."

But thinking carefully, code volume was never the entire moat. Cisco's moat is 190 million lines plus ecosystem and switching costs, plus brand. SAP's barrier isn't that ABAP is hard to write—it's that out of 425,000 customers, only 5% migrated to S/4HANA within seven years. Lidl tried, burned €500 million, and gave up. Revlon lost $64 million in sales. The lock-in effect created by complexity is far more stubborn than the code itself.

But now it's getting interesting. Earlier this year, Marek Kowalkiewicz wrote "Drying the Moat," mentioning that after Anthropic demonstrated AI reading and modernizing COBOL systems, IBM lost $40 billion in market cap in a single day. Code complexity actually creates an "understanding asymmetry": you can't read my code, so you can't leave me. AI erased that asymmetry.

Looking back at myself: eight or nine days, one person with two models alternating, and a 1.3 million line system is already running in production, with a distributed architecture and CI/CD pipelines.

1.3 million certainly isn't Cisco's 190 million—two orders of magnitude difference. Ecosystem and customer lock-in aren't replaceable by code. But the "code complexity" leg is already being pulled out. How long the remaining legs can hold is hard to say.

Do It When You Think of It

Looking back at the whole process, a few thoughts.

AI can do product-level software. What came out of this is already running in production with active users—not a demo. The hard parts are CI/CD and testing; these take time, but AI can do them too, they just need more iterations.

Refactoring isn't that scary anymore. Before, inheriting a messy legacy codebase, just figuring out what it was doing would take weeks. Now models read it in minutes and can draw architecture diagrams. One week from mess to new architecture, with the technical debt cleaned up more thoroughly than manual work would have managed.

The biggest change might be mindset. Before, refactoring was a major decision—you'd calculate headcount, timeline, risk. Now, when the codebase can't sustain itself, just rebuild it. New technology emerges? Use it to rebuild from scratch. Code is increasingly becoming a consumable.

Of course code is still the skeleton of the product—that hasn't changed. But the cost of producing that skeleton is no longer on the same order of magnitude as before.
