I've written before about the changes AI has brought to coding, mostly about its strengths. Today, let's look from a different angle and record a few failures.
Can't Fix Itself
Claude Code has a Chrome browser extension that controls the browser through the MCP protocol, allowing it to search for things, click buttons, and take screenshots directly in the browser.
The extension worked fine at first, but one day it suddenly stopped opening.
At the time, I still had faith in its capabilities. After all, it's the company's own plugin connecting to its own tool—a pure software issue that should be resolved quickly. So I said: Take a look yourself and fix it.
It ended up taking three or four hours.
The process in between was quite interesting. Opus 4.6, effort set to high, letting it find the cause itself. Every time it would analyze extensively, say "discovered a crucial clue," change some configurations, and finally say "restart your session and it should work." Restart, doesn't work. Analyze again, change again, say restart again. Still doesn't work.
Later it even asked me to open the Chrome extension console, check the logs, and copy the error messages over. Once it starts asking you for information like that, the direction has gone off track. It kept requesting more debugging data, but every round ended with the same conclusion: "restart it."
This cycle repeated many rounds. By the end it was almost comical.
Finally, I reminded it: Go search online and see if others have encountered similar issues. It searched around, found issues others had filed on GitHub, followed the solution approach, and fixed it in 15 minutes.
Later I learned that the extension's architecture is a multi-hop connection chain: CLI → WebSocket → bridge.claudeusercontent.com → native messaging host → Chrome extension. Any broken hop causes connection failure, and when it breaks there's no clear error message. If Claude Desktop is installed at the same time, the two programs fight over the same native messaging host. It's a known issue in the claude-code repository on GitHub.
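A chain like that suggests an obvious debugging strategy, which the model never tried on its own: probe each hop in order and fail loudly at the first break. Here is a minimal sketch of the idea, with hypothetical hop names and stand-in checks; none of this is Claude Code's actual diagnostic code.

```python
# Sketch: diagnose a multi-hop connection chain by probing each hop
# in order and reporting the FIRST broken link, instead of a generic
# "connection failed". Hop names and checks are illustrative only.
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, etc.
        return False

def diagnose(hops):
    """Run each (name, check) pair in order; stop at the first failure."""
    for name, check in hops:
        if not check():
            return f"FAIL at hop: {name}"
    return "all hops OK"

# Example chain with stand-in checks; a real version would probe the
# bridge endpoint, the native messaging host registration, and so on.
hops = [
    ("local CLI process", lambda: True),
    ("WebSocket port", lambda: port_reachable("127.0.0.1", 80)),
]
print(diagnose(hops))
```

The point of the structure is that every failure names its hop, which is exactly the "clear error message" the real chain lacked.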
The company's own plugin, connecting to its own tool—I thought this would be the most familiar territory. Three or four hours of self-diagnosis, and finally "go search" solved it in 15 minutes.
The Xiaohongshu Experiment
I've accumulated quite a bit of blog content and was wondering if I could convert it into traffic on Xiaohongshu (Little Red Book), so people who don't know me could see it too.
First tried the most direct approach: posting the blog articles there. The effect was approximately zero.
Then I had Claude Code help me design an experimental plan. The plan itself was well done, considering content adaptation and hashtag strategy, much more systematic than what I would do myself. But the results were all bad.
Investigation revealed that everything was being softly throttled.
Xiaohongshu's control over new accounts is aggressive. It won't tell you outright that your content violated the rules; it just quietly withholds traffic. Your posts don't show up in search and get no recommendations. You post something, everything looks normal, but nobody sees it. You can't tell whether the content is bad or whether you're being throttled, and that ambiguity is what's maddening.
Later I looked it up: in April 2025 alone, Xiaohongshu penalized one million accounts. The platform requires over 60% originality, and notes under 600 characters get suppressed visibility. For content batch-generated by AI and then distributed, NLP-level semantic detection makes word substitution and homophone tricks basically useless. And because the registration threshold is low, new accounts are very likely to be flagged during the account-establishment phase.
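Why word substitution fails is easy to show. A naive originality check compares surface text, and substitution does fool that; semantic detection closes exactly this gap. The sketch below only implements the surface side, using character-trigram Jaccard similarity; the texts and the idea of a fixed threshold are invented for illustration and have nothing to do with Xiaohongshu's actual systems.

```python
# Sketch: a surface-level similarity check that word substitution
# easily defeats. Semantic (embedding-based) detection would still
# score the two sentences below as near-duplicates, which is why the
# trick no longer works. All texts here are made up.

def trigrams(text: str) -> set:
    """Set of overlapping character trigrams in the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def surface_similarity(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams, in [0.0, 1.0]."""
    ga, gb = trigrams(a), trigrams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

original = "this product is really good and worth buying"
substituted = "this item is truly great and worth purchasing"

print(round(surface_similarity(original, original), 2))    # 1.0: exact copy
print(round(surface_similarity(original, substituted), 2)) # low: looks "original"
```

The substituted sentence scores low against the original despite meaning the same thing, so a surface check waves it through while a semantic model does not.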
After several rounds of experiments, I put it down. This can't be solved for now.
However, going through this also made me more convinced that my initial choice was right. If I had started out creating content on Xiaohongshu, facing opaque algorithms and rules that could change at any time, with a new account getting no positive feedback, I might have given up halfway. A personal website has none of these issues: you can write whatever you want. Friends occasionally say "this is quite interesting," and an article cross-posted to Zhihu sometimes gets decent feedback, with plenty of discussion and bookmarks. That feedback is natural. A blog's asset is its ideas; having content first and then finding distribution is much healthier than depending on a platform first and only then thinking about content.
Failure Is the Main Theme
Neither of these examples is a big deal. The first fell short of expectations but was finally resolved. The second fell short of expectations and couldn't be resolved.
But I don't think we should be disappointed with AI coding because of this, nor should we look down on it and stop using it.
In daily work, failure is the default. Think about the whole arc of a project: the moments of actual results are few, and most of the time goes to hitting walls and adjusting direction. Accumulating experience through large amounts of failure, exploring along the way: that is the norm.
Just like gaming. Who starts out as a top player? At first everyone gets beaten, everyone is a noob, and then gradually learns and improves through failure.
Working with models can't avoid this process either. CodeRabbit's report this year says AI-generated code produces 1.7 times more issues than human-written code. Only about 30% of Copilot's suggestions are accepted by developers. There's also an interesting survey saying developers think they're 20% faster using AI, but actual calculations show they're 19% slower because of review and bug fixing.
These numbers don't look great. But from another angle, the value of AI coding might not lie in "success rate."
Software development has a fail-fast principle. Jim Shore put it roughly like this: failing immediately and obviously sounds like it would make software more fragile, but it actually makes it more robust. Eric Ries's The Lean Startup follows the same thinking: the Build-Measure-Learn loop is essentially about validating hypotheses at the smallest cost and the fastest speed, and a failed validation is still a result.
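The fail-fast idea fits in a few lines of code. This is a generic sketch, with the function name and config keys invented for illustration: reject bad input at the earliest point with an obvious error, instead of letting a silent fallback surface much later as a confusing downstream failure.

```python
# Fail fast: validate at startup with a clear message, rather than
# letting a missing value blow up later deep inside a request.
# Function name and required keys are hypothetical examples.

def load_config(cfg: dict) -> dict:
    """Return cfg if complete; raise immediately and obviously if not."""
    required = ["api_key", "base_url"]
    missing = [k for k in required if not cfg.get(k)]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    return cfg

# The tempting alternative, cfg.get("api_key", ""), would "work" for
# now and fail far from the cause; raising here makes the bug cheap.
try:
    load_config({"base_url": "https://example.com"})
except ValueError as e:
    print(e)
```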
What AI coding does is compress this loop. Before, fixing a Chrome extension issue might take several days of searching, learning, asking for help, and finally maybe giving up. Now it's three or four hours where the human does nothing in between, just watching it run round after round, restarting it, and finally it's fixed. That Xiaohongshu experiment might have taken much longer to realize it was a dead end; now a few rounds of experiments give a conclusion.
The surface of attempts has widened, and the speed of feedback has increased. Before, tinkering with something took a long time to know the result; now it might be shortened to one-tenth of the original time.
Models Are Getting Stronger
Another feeling from this period: models are improving faster than expected.
I started tinkering with AI coding at the end of last December. First I used Sonnet 4.5 for a day and couldn't stick with it. Kimi K2.5 happened to come out at the end of January, so I bought a membership and tried it: not as good as Sonnet 4.5, but usable. I tried Codex 5.2 around the same time; it was a bit dumb, not as good as K2.5. I didn't have high expectations, just figured that since I'd started, I might as well do a small project and see.
I asked others to join that small project too. The process wasn't smooth either: various issues, lots of back and forth. But it did get done. That was surprising.
Then new ideas started coming. I wanted to make a demo for an exhibition, had a rough concept, and figured it shouldn't be hard. I had K2.5 try it; it spent two or three days but couldn't get it working no matter what, every time claiming "it's done, it works," while the result simply wouldn't run.
Early February Opus 4.6 came out, solved it in one go.
Only at moments like this do you truly feel the model's progress. It's not about benchmark scores improving by a few points—it's about solving problems in actual use that you previously couldn't get past no matter what.
Later, while traveling, I used GLM-5 and MiniMax 2.5, since the domestic models are cheaper. Then I used Opus 4.6 to make a small plugin, and again it went back and forth without working, each time claiming it was done, no problem, but one try and it failed. I threw the same task at GPT-5.4, released in early March; three hours later it said it was done. One try, and it really was.
All these things happened within three months.
What does this mean? Problems that are stuck now might not be problems next month. Either spend more time letting the model run and try more, working with it. Or wait a bit—when a new model comes out it might be a different story.
Of course, when encountering platform rule issues like Xiaohongshu, no matter how strong the model is, it can't help you.
Not a Wish
If you think having AI can change "failure as the main theme" to "success as the main theme," you're thinking too much. That's making a wish, not using a tool.
What actually happens is: it's still failure as the main theme, but failing faster and failing more.
We are in an era of high uncertainty and intense competition. It's impossible for everything you do to produce results immediately. But if failure gets faster and its scope wider, then crossing those detours to reach somewhere valuable gets faster too.
I think this might be what makes it interesting at this stage.
