I've noticed some interesting patterns in recent development work, and plenty is happening in the outside world as well. This era hasn't slowed down; it's still advancing at a staggering pace. Here are a few scattered thoughts.
Coding Agents Are Better at Debugging Than Writing Code
Many posts and reports ask the same question: what's the biggest problem with using coding agents to write code? The usual answer is bugs.
But from hands-on experience, I think this gets it backwards.
Current coding agents actually perform better at debugging and troubleshooting than at writing code. The reason isn't complicated: debugging has clear objectives, is usually reproducible, and can be broken down step by step for verification. This is work AI handles quite smoothly, and much faster than humans.
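A toy sketch of why that decomposition works so well, in the spirit of `git bisect` (the commit names and the failure predicate here are purely illustrative): given a reproducible failure, finding the change that introduced it is a binary search over history, and every step is independently verifiable.

```python
# Illustrative only: why debugging decomposes into checkable steps.
# With a reproducible failure, locating the first bad change is a
# binary search over history; each probe is verifiable on its own.

def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True,
    assuming history flips from good to bad exactly once."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # failure already present: look earlier
        else:
            lo = mid + 1      # still good: look later
    return commits[lo]

# Toy history: the regression lands at commit "d".
history = ["a", "b", "c", "d", "e", "f"]
broken = {"d", "e", "f"}
print(first_bad_commit(history, lambda c: c in broken))  # -> d
```

Each iteration produces a yes/no answer a machine can check without judgment calls, which is exactly the property that makes debugging such comfortable territory for an agent.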
The real challenge is asking it to implement something complete from scratch—especially when you're trying to build a product.
The Deep Waters of Productization
Traditional software development has so many processes—unit testing, integration testing, stress testing, canary releases, Alpha, Beta—not because people love bureaucracy, but because once software faces real users, it exposes problems you couldn't anticipate in the code. Best practices reduce the probability of issues, but can't eliminate them entirely. Only time and production pressure can force problems to surface and be resolved one by one.
This challenge applies to coding agents as well.
Building a small component from scratch is fast. Prototyping feels great. But as the product matures and the codebase grows, problems emerge: the larger the project, the easier it is for AI to break things, and the cost of context understanding visibly increases. This follows the same logic as humans maintaining large projects—iteration difficulty naturally increases after a product ships, requiring team division to manage. AI agents can't escape this rule either.
So I think the current state is this: concept validation is fast and satisfying. But turning a concept into a product requires deep thinking and verification at every step—you can't skip any of it.
Creativity-Driven Open Source
But there's one category where AI genuinely excels.
Recently, Milla Jovovich, star of The Fifth Element and the Resident Evil series, spent several months working with engineer Ben Sigman, using Claude Code to build MemPalace, an open-source AI memory system. Pushed to GitHub on April 5th, it hit 7,000+ stars within 48 hours and now exceeds 22,000.
On the LongMemEval benchmark, MemPalace achieved 96.6% R@5, far exceeding paid solutions like Mem0 and Zep, which sit around 85%. It runs entirely locally on ChromaDB + SQLite, is MIT-licensed, and completely free.
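For readers unfamiliar with the metric: R@5 (recall at 5) asks, for each query, whether a relevant item appears among the top five retrieved results. A minimal sketch of how such a metric is computed (the exact LongMemEval protocol may differ; this is one common reading, with made-up document IDs):

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose top-k retrieved items contain
    at least one relevant item (a common reading of R@k)."""
    hits = sum(
        1 for ret, rel in zip(retrieved, relevant)
        if set(ret[:k]) & set(rel)
    )
    return hits / len(retrieved)

# Toy example: 3 queries; a relevant doc is in the top 5 for 2 of them.
retrieved = [["d1", "d7", "d3", "d9", "d2"],
             ["d4", "d5", "d6", "d8", "d0"],
             ["d2", "d1", "d9", "d7", "d5"]]
relevant = [["d3"], ["d11"], ["d9"]]
print(recall_at_k(retrieved, relevant))  # -> 0.6666666666666666
```

So a 96.6% R@5 means that for nearly every test query, the system surfaces something relevant within its first five results.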
AI memory is indeed a major focus this year. But MemPalace's strength isn't complexity; quite the opposite, it wins through creativity. It focuses on a single target, such as one benchmark, and figures out how to hit it well.
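The local-first shape described above can be sketched with nothing but the standard library. This is an illustration of the pattern, not MemPalace's actual code or schema: facts land in SQLite, and a naive keyword lookup stands in for the vector search a real system would get from something like ChromaDB.

```python
import sqlite3

# Illustrative sketch of a local-first memory store (not MemPalace's
# actual design). Facts persist in SQLite; retrieval here is plain
# substring matching, standing in for real embedding search.

class LocalMemory:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL)")

    def remember(self, text):
        self.db.execute("INSERT INTO memories (text) VALUES (?)", (text,))
        self.db.commit()

    def recall(self, query, limit=5):
        # Stand-in for vector search: naive LIKE matching.
        cur = self.db.execute(
            "SELECT text FROM memories WHERE text LIKE ? LIMIT ?",
            (f"%{query}%", limit))
        return [row[0] for row in cur]

mem = LocalMemory()
mem.remember("User prefers replies in French")
mem.remember("Project deadline is April 30")
print(mem.recall("deadline"))  # -> ['Project deadline is April 30']
```

The point of the sketch is the architecture, not the retrieval quality: everything lives in one local file, with no server and no API key, which is much of what makes a project like this attractive.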
This model is particularly suitable for AI assistance. The more focused the problem, the more it relies on ideas rather than engineering effort, the faster AI helps you validate. There are more and more such projects in the open-source world, which I find to be a fascinating direction.
No Software Is Secure
The biggest news these past two days is Anthropic's official announcement of Project Glasswing.
Their next-generation model, internally codenamed Mythos, was prematurely exposed by an internal data leak at the end of March (a CMS configuration issue accidentally exposed roughly 3000 internal documents), and officially confirmed on April 7th. Its software-security capabilities have reached a level Anthropic doesn't dare release publicly.
Earlier models could find vulnerabilities; that's old news in the industry. But converting a vulnerability into a working exploit is a completely different matter. Mythos chains those two steps together.
Anthropic's disclosed data is alarming: Mythos discovered thousands of zero-day vulnerabilities across all major operating systems and browsers, including a bug hidden in OpenBSD for 27 years. Vulnerabilities that might have gone undiscovered for decades are now surfaced by a model, and can be directly weaponized into attack tools.
Essentially, no software is safe against this model.
Anthropic's assessment is that this model cannot be publicly released. They contacted roughly 45 companies—including Apple, Google, Microsoft, Nvidia, AWS, plus CrowdStrike, Palo Alto Networks, Cisco, Linux Foundation, and others—allowing them early access to Mythos to harden their systems. The logic is straightforward: before it becomes a spear, let it serve as a shield.
OpenAI isn't having an easy time either. GPT-5.4 became the first general-purpose model rated as "High Cybersecurity Risk" by OpenAI's own Preparedness Framework. From GPT-5 to GPT-5.4, the model's score on CTF (Capture The Flag) competitions jumped from 27% to 76%. OpenAI chose to add a layer of safety protections and release anyway—a different approach from Anthropic's, but facing the same problem: model attack capabilities are growing exponentially.
I suspected this was happening. With the Mythos disclosure, it's basically confirmed. And this isn't just about software: when something in a new dimension develops at a speed completely beyond expectations, many supporting structures fall out of alignment. Regulations can't keep up, organizations can't keep up, security systems can't keep up.
Build the Framework First
These events have also influenced my thinking about product development.
We've been discussing internally whether some product positioning is too aggressive—for example, designs that let AI fully autonomously manage certain processes. If the models aren't smart enough yet and always require human intervention, then this design doesn't hold up at present.
But from another angle, perhaps product design should run slightly ahead of the models.
This is how Anthropic builds products. Internally they build Chrome extensions and Excel plugins: start with an idea, set up the scaffolding, then throw each new model generation at it to see what it can do. They wait, and wait, until one day it's almost there; then they invest heavily in productization and ship.
If you design products based on current model capabilities, they'll likely be obsolete by launch. Instead, it's better to be slightly more aggressive: think through the architecture first, wait for the engine to arrive, and the whole thing naturally comes together. Think of it, build it, then wait.
The Game Continues
One final piece of good news.
Zhipu's GLM-5.1 officially went open source in early April: MIT license, weights fully public. It scored 58.4 on SWE-Bench Pro, surpassing GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. And they simultaneously raised prices by 10%. With the entire industry locked in a price war, raising prices against the trend makes the move itself quite interesting.
In the open-source game, no one has retreated yet. Good to see.
