
Your Prompts Might Be Undermining Your Agent

Prompts, RAG, fine-tuning, knowledge graphs, context engineering—three years, five paradigm shifts. Models are getting stronger, but we still don't know how to make agents perform well in products.


Recently, while working on agent products, I made an interesting observation.

Traditional software engineering tasks (writing APIs, architecting systems, writing tests) are becoming increasingly straightforward. Most problems follow established patterns, and the validation methods are mature. Hand one to a coding agent: it hits a problem, solves it, tests it, you evaluate the result. The chain is clear.

For this kind of mechanical work, agents already perform quite well.

But there's one area that's extremely difficult: the agent's behavior itself.

80% of the Value Comes from the Agent

Currently, about 80% of the value in products and applications comes from the agent itself. Frontend, backend, database, deployment—they're all just scaffolding, when you get down to it.

The paradox is this: the value of traditional software engineering is shrinking, but the new value that has emerged—agent behavior—is precisely what we have the least control over and the least idea how to study.

How exactly do you build a good agent? I think this question will haunt us for quite some time to come.

Three Years, Five Paradigm Shifts, None Stuck

In the three years since ChatGPT emerged, paradigms have been constantly shifting. Each method gets established, is found lacking after a while, and then everyone moves on.

First came prompt engineering. Everyone's instinct was to research how to write good prompts, pondering daily how to make AI more obedient and how to embed it into business workflows.

Then came RAG. This was mainly to solve the problem of insufficient context. Short windows, expensive context—8K or 32K was considered good. To make AI more useful, people fed it sliced-up knowledge. This direction had its moment of popularity, then suddenly no one talked about it anymore.
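The chunk-and-retrieve pattern behind RAG can be sketched in a few lines. This is a toy version: real systems use dense neural embeddings and a vector database, while here the "embedding" is just a bag-of-words count, and all names and documents are illustrative.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. Real RAG uses dense vectors
    # from an embedding model, but the retrieval logic is the same shape.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    # Rank pre-sliced chunks by similarity to the query; only the top-k
    # slices ever reach the model's short context window.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes 5 to 7 business days.",
    "Support is available by email around the clock.",
]
print(retrieve("how long does shipping take", chunks))
```

The ceiling problem lives in that `retrieve` step: whatever the slicer and ranker miss, the model never sees.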

The ceiling was too low. No matter what you did, you couldn't reach ideal results: agent accuracy would hit 80% or 85%, then stall.

Dissatisfied with RAG, everyone started fine-tuning. Fine-tuning plus RAG, trying to make agent behavior more controllable and predictable. To this day, from an ROI perspective, there aren't many convincing cases.

Then came knowledge graphs. People felt pure text vector search was too simple and the relationships between pieces of information too thin, so Microsoft proposed a graph-based approach (GraphRAG). The framework didn't take off. It definitely helps, but the cost and speed are unacceptable. I once watched a demo where a single task took 5 to 10 minutes and burned massive amounts of tokens, just to answer a trivial question. Everyone wants both accuracy and efficiency; there's no room to pursue just one.

Then came the recent wave: context engineering. Models' reasoning capabilities grew stronger, context windows expanded from 32K to 128K, 256K, and now they're pushing 1 million tokens. Suddenly no one mentions RAG anymore. With context that long, isn't designing the context well enough? Pull documents on demand, disclose information on demand; why bother with sliced-up search queries?
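"Pulling documents on demand" can be sketched as a tool the agent calls, rather than a retrieval pipeline that pre-slices everything. This is a minimal, hypothetical illustration: the document store, names, and the hand-built `context` string are all invented for the example; a real agent would go through a model's tool-calling API.

```python
# Hypothetical document store; in practice this might be files on disk
# or an internal wiki exposed through a tool API.
DOCS = {
    "api_reference": "Full text of the API reference...",
    "refund_policy": "Full text of the refund policy...",
}

def list_docs() -> list[str]:
    # The agent initially sees only titles: a few tokens each.
    return sorted(DOCS)

def read_doc(name: str) -> str:
    # The full text enters the context only when the agent asks for it.
    if name not in DOCS:
        return f"error: no document named {name!r}"
    return DOCS[name]

# With a long context window, the "index" is just the list of titles;
# no chunking, no embeddings, no vector store.
context = "Available documents: " + ", ".join(list_docs())
context += "\n\n" + read_doc("refund_policy")
print(context)
```

The design trade-off versus RAG: nothing relevant can be missed by a ranker, but you pay for whole documents in tokens, which is only affordable once windows are large and reasoning is strong enough to navigate them.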

Models can reason now, context is long enough, agent capabilities are indeed getting stronger.

After Getting Good Enough, A New Dilemma

How strong? Now you can let an agent make autonomous decisions, choose tools, execute tasks, working continuously for two or three hours, and most of the time it won't mess things up—it'll give you decent results.

Last year, no one would have dared to imagine this.

But precisely because it has reached this level, expectations have risen. For example, embedding an agent into a system—whether a personal OS or an enterprise business system. The capability is strong, and I believe it's strong. But how do you manage it? How do you make it continuously improve?

It's increasingly like managing a digital employee. Works fast, high enthusiasm, overtime without rest. You can't help but feel uneasy—it's so capable, how should I collaborate with it?

Think about it another way: if you can lead it to create greater value, so that it performs better with you than it would elsewhere, you are amplifying it. And amplifying its value solidifies your own position.

Your Prompts Might Be Hurting It

There's a very counter-intuitive phenomenon.

Many people write prompts for agents that are extremely detailed—step 1 do this, step 2 do that, 1-2-3-4-5-6. The result is that agent performance drops off a cliff. Not just a little, but drastically.

The reason is simple: the granularity of your instructions may not match the agent's own decision-making. What you wrote becomes shackles. It was doing fine on its own, but your instructions drag it down and weaken its adaptability.
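To make the contrast concrete, here are two hypothetical system prompts for the same support task. Both prompts and the task framing are invented for illustration; which one actually wins is an empirical question, and the claim above is that the first often loses with stronger models.

```python
# Over-specified: prescribes the agent's every move. If the model's own
# plan is better than your script, this style actively hurts it.
OVERSPECIFIED = """You are a support agent. Follow these steps exactly:
1. Greet the user. 2. Ask for their order number. 3. Look up the order.
4. Quote the refund policy verbatim. 5. Offer exactly two options. 6. Close."""

# Goal-oriented: states the outcome and the available tools, and leaves
# the sequencing to the agent's own decision-making.
GOAL_ORIENTED = """You are a support agent. Resolve the user's issue
accurately and politely. You may look up orders and the refund policy;
decide for yourself what to check and in what order."""
```

The honest way to choose between them is not taste but measurement: run both against the same task suite and compare success rates, rather than assuming either style is right.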

What method should we use to equip agents with context? How do we optimize their behavior? Should we be doing RL-related research? Honestly, there's no standard answer. Everyone agrees the models are getting stronger, which is great. But how to observe agent behavior within a system and improve it toward a goal, everyone is still scratching their heads.

The Decisive Factor Is Shifting

For example, metrics like task success rate—how do you push that up? And once it's up, how do you drive costs down?
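Tracking those two numbers together can be sketched in a few lines. The `Run` record and the sample values below are invented for illustration; the one deliberate choice is reporting cost per *successful* task, since failed runs still burn tokens.

```python
from dataclasses import dataclass

@dataclass
class Run:
    success: bool
    cost_usd: float  # tokens billed for this run, converted to dollars

def summarize(runs: list[Run]) -> dict:
    n = len(runs)
    successes = sum(r.success for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "success_rate": successes / n,
        # Divide total cost by successes, not by runs: failures are
        # pure overhead, so this is the true price of a good outcome.
        "cost_per_success": total_cost / successes if successes else float("inf"),
    }

runs = [Run(True, 0.12), Run(True, 0.30), Run(False, 0.25), Run(True, 0.10)]
print(summarize(runs))
```

Pushing `success_rate` up usually means more reasoning and more tool calls, which pushes `cost_per_success` up too; watching both is what keeps that trade-off honest.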

From a product perspective, the decisive factor has shifted from code to agent. Whoever has better agent performance wins. If you have good performance and low costs too, then there's no contest.

Some companies in the industry are doing research in this area. Not necessarily training models from scratch—many improve agent behavior through post-training with data, or adding runtime to models. Like MiroThinker—though the company isn't very famous, their research direction is interesting, trying to build differentiation through product capabilities at the agent behavior level.

The Direction

Starting from 2026, I believe agent behavior will become a genuine product and technology direction.

The work traditional software engineers do will be drastically compressed within the year. But that doesn't mean there's no direction left. The gap between products will ultimately show up in the agents: whose agent performs well and costs little.

If you're worried your original skills are being replaced, my advice is to study agents.

This is a genuinely hard problem. Coding agents can help you write code and build products, but they can't help themselves optimize their own behavior. The results differ from what you imagined, there's little certainty, and you don't know how to continuously improve. But it's precisely because it's hard that real differentiation is possible here.

People who can really tune agents well are too few right now.
