I've written before about the changes AI has brought to coding, mostly about its strengths. Today, let's look from a different angle and record a few failures.
Can't Fix Itself
Claude Code has a Chrome browser extension that controls the browser through the MCP protocol, allowing it to search for things, click buttons, and take screenshots directly in the browser.
The extension worked fine at first, but one day it suddenly stopped opening.
At the time, I still had faith in its capabilities. After all, it's the company's own plugin connecting to its own tool—a pure software issue that should be resolved quickly. So I said: Take a look yourself and fix it.
It ended up taking three or four hours.
The process in between was quite interesting. Opus 4.6, effort set to high, letting it find the cause itself. Every time it would analyze extensively, say "discovered a crucial clue," change some configurations, and finally say "restart your session and it should work." Restart, doesn't work. Analyze again, change again, say restart again. Still doesn't work.
Later it even asked me to open the Chrome extension console, check the logs, and copy the error messages over. Once it starts asking you for information like that, the direction has gone off track. It kept requesting more debugging data, but every round ended with the same conclusion: "restart it."
This cycle repeated many rounds. By the end it was almost comical.
Finally, I reminded it: Go search online and see if others have encountered similar issues. It searched around, found issues others had filed on GitHub, followed the solution approach, and fixed it in 15 minutes.
Later I learned that the extension's architecture is a multi-hop connection chain: CLI → WebSocket → bridge.claudeusercontent.com → native messaging host → Chrome extension. Any broken hop causes connection failure, and when it breaks there's no clear error message. If Claude Desktop is installed at the same time, the two programs fight over the same native messaging host. It's a known issue in the claude-code repository on GitHub.
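A chain like that suggests an obvious debugging strategy, which the model never tried on its own: probe each hop in order and fail loudly at the first break. Here is a minimal sketch of the idea, with hypothetical hop names and stand-in checks; none of this is Claude Code's actual diagnostic code.

```python
# Sketch: diagnose a multi-hop connection chain by probing each hop
# in order and reporting the FIRST broken link, instead of a generic
# "connection failed". Hop names and checks are illustrative only.
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, etc.
        return False

def diagnose(hops):
    """Run each (name, check) pair in order; stop at the first failure."""
    for name, check in hops:
        if not check():
            return f"FAIL at hop: {name}"
    return "all hops OK"

# Example chain with stand-in checks; a real version would probe the
# bridge endpoint, the native messaging host registration, and so on.
hops = [
    ("local CLI process", lambda: True),
    ("WebSocket port", lambda: port_reachable("127.0.0.1", 80)),
]
print(diagnose(hops))
```

The point of the structure is that every failure names its hop, which is exactly the "clear error message" the real chain lacked.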
The company's own plugin, connecting to its own tool—I thought this would be the most familiar territory. Three or four hours of self-diagnosis, and finally "go search" solved it in 15 minutes.
The Xiaohongshu Experiment
I've accumulated quite a bit of blog content and was wondering if I could convert it into traffic on Xiaohongshu (Little Red Book), so people who don't know me could see it too.
First tried the most direct approach: posting the blog articles there. The effect was approximately zero.
Then I had Claude Code help me design an experimental plan. The plan itself was well done, considering content adaptation and hashtag strategy, much more systematic than what I would do myself. But the results were all bad.
Investigation revealed that everything was being softly throttled.
Xiaohongshu's control over new accounts is aggressive. It won't tell you outright that your content violated the rules; it just quietly withholds traffic. Your posts don't show up in search and get no recommendations. You post something, everything looks normal, but nobody sees it. You can't tell whether the content is bad or whether you're being throttled, and that ambiguity is what's maddening.
Later I looked it up: in April 2025 alone, Xiaohongshu penalized one million accounts. The platform requires over 60% originality, and notes under 600 characters get suppressed visibility. For content batch-generated by AI and then distributed, NLP-level semantic detection makes word substitution and homophone tricks basically useless. And because the registration threshold is low, new accounts are very likely to be flagged during the account-establishment phase.
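Why word substitution fails is easy to show. A naive originality check compares surface text, and substitution does fool that; semantic detection closes exactly this gap. The sketch below only implements the surface side, using character-trigram Jaccard similarity; the texts and the idea of a fixed threshold are invented for illustration and have nothing to do with Xiaohongshu's actual systems.

```python
# Sketch: a surface-level similarity check that word substitution
# easily defeats. Semantic (embedding-based) detection would still
# score the two sentences below as near-duplicates, which is why the
# trick no longer works. All texts here are made up.

def trigrams(text: str) -> set:
    """Set of overlapping character trigrams in the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def surface_similarity(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams, in [0.0, 1.0]."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

original = "this product is really good and worth buying"
substituted = "this item is truly great and worth purchasing"

print(round(surface_similarity(original, original), 2))    # 1.0: exact copy
print(round(surface_similarity(original, substituted), 2)) # low: looks "original"
```

The substituted sentence scores low against the original despite meaning the same thing, so a surface check waves it through while a semantic model does not.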
After several rounds of experiments, I put it down. This can't be solved for now.
However, going through this also made me more convinced that my initial choice was right. If I had started out creating content on Xiaohongshu, facing opaque algorithms and rules that could change at any time, with a new account getting no positive feedback, I might have given up halfway. A personal website has none of these issues: you can write whatever you want. Friends occasionally say "this is quite interesting," and an article cross-posted to Zhihu sometimes gets decent feedback, with plenty of discussion and bookmarks. That feedback is natural. A blog's asset is its ideas; having content first and then finding distribution is much healthier than depending on a platform first and only then thinking about content.
Failure Is the Main Theme
Neither of these examples is a big deal. The first fell short of expectations but was finally resolved. The second fell short of expectations and couldn't be resolved.
But I don't think we should be disappointed with AI coding because of this, nor should we look down on it and stop using it.
In daily work, failure is the default. Think about the whole arc of a project: the moments of actual results are few, and most of the time goes to hitting walls and adjusting direction. Accumulating experience through large amounts of failure, exploring along the way: that is the norm.
Just like gaming. Who starts out as a top player? At first everyone gets beaten, everyone is a noob, and then gradually learns and improves through failure.
Working with models can't avoid this process either. CodeRabbit's report this year says AI-generated code produces 1.7 times more issues than human-written code. Only about 30% of Copilot's suggestions are accepted by developers. There's also an interesting survey saying developers think they're 20% faster using AI, but actual calculations show they're 19% slower because of review and bug fixing.
These numbers don't look great. But from another angle, the value of AI coding might not lie in "success rate."
Software development has a fail-fast principle. Jim Shore put it roughly like this: failing immediately and obviously sounds like it would make software more fragile, but it actually makes it more robust. Eric Ries's The Lean Startup follows the same thinking: the Build-Measure-Learn loop is essentially about validating hypotheses at the smallest cost and the fastest speed, and a failed validation is still a result.
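The fail-fast idea fits in a few lines of code. This is a generic sketch, with the function name and config keys invented for illustration: reject bad input at the earliest point with an obvious error, instead of letting a silent fallback surface much later as a confusing downstream failure.

```python
# Fail fast: validate at startup with a clear message, rather than
# letting a missing value blow up later deep inside a request.
# Function name and required keys are hypothetical examples.

def load_config(cfg: dict) -> dict:
    """Return cfg if complete; raise immediately and obviously if not."""
    required = ["api_key", "base_url"]
    missing = [k for k in required if not cfg.get(k)]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    return cfg

# The tempting alternative, cfg.get("api_key", ""), would "work" for
# now and fail far from the cause; raising here makes the bug cheap.
try:
    load_config({"base_url": "https://example.com"})
except ValueError as e:
    print(e)
```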
What AI coding does is compress this loop. Before, fixing a Chrome extension issue might take several days of searching, learning, asking for help, and finally maybe giving up. Now it's three or four hours where the human does nothing in between, just watching it run round after round, restarting it, and finally it's fixed. That Xiaohongshu experiment might have taken much longer to realize it was a dead end; now a few rounds of experiments give a conclusion.
The surface of attempts has widened, and the speed of feedback has increased. Before, tinkering with something took a long time to know the result; now it might be shortened to one-tenth of the original time.
Models Are Getting Stronger
Another feeling from this period: models are improving faster than expected.
I started tinkering with AI coding at the end of last December. First I used Sonnet 4.5 for a day and couldn't stick with it. Kimi K2.5 happened to come out at the end of January, so I bought a membership and tried it: not as good as Sonnet 4.5, but usable. I tried Codex 5.2 around the same time; it was a bit dumb, not as good as K2.5. I didn't have high expectations, just figured that since I'd started, I might as well do a small project and see.
I asked others to join that small project too. The process wasn't smooth either: various issues, lots of back and forth. But it did get done. That was surprising.
Then new ideas started coming. I wanted to make a demo for an exhibition, had a rough concept, and figured it shouldn't be hard. I had K2.5 try it; it spent two or three days but couldn't get it working no matter what, every time claiming "it's done, it works," while the result simply wouldn't run.
Early February Opus 4.6 came out, solved it in one go.
Only at moments like this do you truly feel the model's progress. It's not about benchmark scores improving by a few points—it's about solving problems in actual use that you previously couldn't get past no matter what.
Later, while traveling, I used GLM-5 and MiniMax 2.5, since the domestic models are cheaper. Then I used Opus 4.6 to make a small plugin, and again it went back and forth without working, each time claiming it was done, no problem, but one try and it failed. I threw the same task at GPT-5.4, released in early March; three hours later it said it was done. One try, and it really was.
All these things happened within three months.
What does this mean? Problems that are stuck now might not be problems next month. Either spend more time letting the model run and try more, working with it. Or wait a bit—when a new model comes out it might be a different story.
Of course, when encountering platform rule issues like Xiaohongshu, no matter how strong the model is, it can't help you.
Not a Wish
If you think having AI can change "failure as the main theme" to "success as the main theme," you're thinking too much. That's making a wish, not using a tool.
What actually happens is: it's still failure as the main theme, but failing faster and failing more.
We are in an era of high uncertainty and intense competition. It's impossible for everything you do to produce results immediately. But if failure gets faster and its scope wider, then crossing those detours to reach somewhere valuable gets faster too.
I think this might be what makes it interesting at this stage.
