
The Prompts You Write Might Be Dragging Down Your Agent

Prompts, RAG, fine-tuning, knowledge graphs, context engineering—five paradigm shifts in three years. Models are getting stronger, but we still don't know how to make agents perform well in products.

Jiawei Guan · 4 min read

Lately, while working on agent products, I've had an interesting realization.

Traditional software engineering tasks—writing APIs, architecting systems, writing tests—are becoming increasingly straightforward. Most problems follow established patterns, and the validation methods are mature. With a coding agent, you hit a problem, hand it off, test the result, and evaluate it. The loop is clear.

For this kind of mechanical work, agents already perform quite well.

But there's one area that's extremely difficult: the behavior of the agent itself.

80% of the Value Comes from the Agent

Today, roughly 80% of the value in products and applications comes from the agent itself. Frontend, backend, database, deployment—they're all just scaffolding, when you get down to it.

The contradiction is this: the value of traditional software engineering is shrinking, but the new value that's emerging—agent behavior—is exactly the thing we have the least control over and the least idea how to study.

How do you actually build a good agent? I think this question is going to haunt us for a long time to come.

Five Paradigm Shifts in Three Years, None of Them Stuck

In the three years since ChatGPT launched, paradigms have been constantly shifting. Methods that were previously settled on turn out to be lacking after a while, and then everyone moves on.

It started with prompt engineering. Everyone's instinct was to study how to write better prompts, obsessing over how to make AI more obedient and how to embed it into business workflows.

Then came RAG. At the time, it was mainly to solve the problem of insufficient context. Windows were short and context was expensive; 8K or 32K was considered pretty good. To make AI more useful, people fed it knowledge in sliced chunks. This direction had its moment in the spotlight, and then suddenly no one talked about it anymore.
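
For those who came in late, the pattern looked roughly like the sketch below. This is a toy, not any particular framework: the bag-of-words `embed` stands in for a real embedding model, and the inline `doc` stands in for a knowledge base.

```python
# Toy sketch of the classic chunk-and-retrieve loop: slice documents
# into overlapping pieces, "embed" them, retrieve the nearest pieces,
# and paste them into a prompt sized for a small context window.
import math
from collections import Counter

def chunk(text: str, size: int = 60, overlap: int = 20) -> list[str]:
    """Slice text into overlapping, fixed-size character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = ("Expense reports are filed in the finance portal. "
       "Receipts over 50 dollars need a manager's approval. "
       "Travel bookings go through the corporate travel tool.")
context = "\n---\n".join(retrieve("how do I file an expense report", chunk(doc)))
# `context` was then pasted into the prompt, budgeted to fit 8K tokens.
```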

The ceiling was too low. No matter what you did, you couldn't reach ideal performance. Agent accuracy would hit 80%, 85%, and then couldn't be pushed any higher.

Unsatisfied with RAG, everyone moved on to fine-tuning, or fine-tuning plus RAG, trying to make agent behavior more controllable and predictable. To this day, there aren't many cases that look convincing from an ROI perspective.

Then came knowledge graphs. People felt that pure text vector search was too simplistic and that the relationships between pieces of information weren't rich enough, so Microsoft proposed a graph-based approach, GraphRAG. The framework never really took off. It does help, but the cost and speed are hard to swallow. I once watched a demo where a single task took 5 to 10 minutes and burned through a massive number of tokens, all to answer a trivial question. Everyone wants both accuracy and efficiency; you can't afford to chase just one.

Now the latest wave is context engineering. Models' reasoning capabilities have grown stronger, and context windows have expanded from 32K to 128K, 256K, and now they're pushing 1 million tokens. Suddenly, no one mentions RAG anymore. With context this long, isn't it enough to just design the context well? Pull documents on demand, disclose information on demand—what's the point of all those sliced search queries?
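
In code, the shift looks something like the sketch below: the agent gets a cheap table of contents and a reader tool, and decides for itself what to pull into context. The folder name and tool-call shape here are illustrative, not any specific framework's API.

```python
# Sketch of on-demand disclosure: show the agent what exists for a few
# hundred tokens, then hand it whole documents only when it asks. With
# 128K+ windows there's no need to pre-slice anything.
import json
from pathlib import Path

DOCS = Path("knowledge")  # hypothetical folder of markdown sources

def list_docs() -> str:
    """Cheap disclosure: titles only."""
    return json.dumps(sorted(p.name for p in DOCS.glob("*.md")))

def read_doc(name: str) -> str:
    """Full disclosure: the entire document, no slicing."""
    return (DOCS / name).read_text()

TOOLS = {"list_docs": list_docs, "read_doc": read_doc}

def agent_step(tool_call: dict) -> str:
    """Dispatch one model-emitted tool call, e.g.
    {"name": "read_doc", "args": {"name": "billing.md"}}."""
    return TOOLS[tool_call["name"]](**tool_call.get("args", {}))
```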

Models can reason now, context is long enough, and the overall capabilities of agents are indeed improving.

After Getting Strong Enough, a New Dilemma

How strong? You can now let an agent make autonomous decisions, choose tools, and execute tasks, working continuously for two or three hours, and most of the time it won't mess things up—it'll even give you decent results.

Last year, no one would have dared to imagine this.

But precisely because they've reached this level, expectations have risen. Take embedding an agent into a system, whether a personal OS or an enterprise business system: the capability is strong, and I believe it's strong. But how do you manage it? How do you make it keep improving?

It's increasingly like managing a digital employee. It works fast, is endlessly enthusiastic, never rests, and never complains about overtime. You can't help but feel a sense of unease: it's this capable, so how should I collaborate with it?

Look at it another way: if you can guide it to create greater value, making it perform better under your direction than it would anywhere else, you are amplifying it. And amplifying its value also solidifies your own position.

Your Prompts Might Be Hurting It

There's a very counter-intuitive phenomenon.

Many people write prompts for agents that are incredibly detailed: step one do this, step two do that, 1-2-3-4-5-6. The result? The agent's performance drops sharply. Not by a little—a cliff-like drop.

The reason is simple: the level of thinking in your prompt may not match, and may well sit below, the level of the model's own decision-making. What you write becomes shackles. It was doing fine on its own, but your instructions drag it down, and its ability to adapt weakens as well.
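
To make the contrast concrete, here's an illustrative pair; the tools and rules are made up, and neither prompt comes from a real product.

```python
# Illustrative only. The over-specified prompt pins the agent to one
# fixed procedure; the goal-oriented one states the objective and hard
# constraints and leaves the plan to the model.

OVER_SPECIFIED = """You are a support agent.
1. Greet the user.
2. Ask for their order ID.
3. Look up the order with lookup_order().
4. If shipped, send the tracking link.
5. If not shipped, apologize and offer a coupon.
6. Close the ticket."""
# Breaks the moment reality deviates: no order ID, two orders, a refund.

GOAL_ORIENTED = """You are a support agent. Resolve the user's issue.
You can look up orders, issue refunds under $50, and escalate to a human.
Never promise delivery dates. Confirm before any irreversible action."""
# Goal, tools, and constraints only; the plan is left to the agent.
```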

What method should you actually use to equip an agent with context? How do you optimize its behavior? Should you be doing RL-style research? Honestly, there's no standard answer. Everyone sits around saying the models are getting stronger, great. But how do you observe agent behavior inside a system and improve it against specific goals? People still have no clue.

Where the Decisive Advantage Lies Is Shifting

Take a metric like task success rate. How do you push it up? And once it's up, how do you drive costs down?
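
Concretely, that means instrumenting agent runs and tracking both numbers together. A minimal sketch, assuming a made-up run record and placeholder per-token prices; nothing here is a specific framework's API:

```python
# Score an eval set on the two numbers that decide the game: task
# success rate and cost per *solved* task. Prices are placeholders.
from dataclasses import dataclass

@dataclass
class Run:
    task_id: str
    success: bool
    input_tokens: int
    output_tokens: int

PRICE_IN, PRICE_OUT = 3e-6, 15e-6  # assumed $/token, adjust per model

def score(runs: list[Run]) -> tuple[float, float]:
    """Return (success rate, average cost per successful task)."""
    solved = sum(r.success for r in runs)
    cost = sum(r.input_tokens * PRICE_IN + r.output_tokens * PRICE_OUT
               for r in runs)
    return solved / len(runs), cost / solved if solved else float("inf")

runs = [Run("t1", True, 12_000, 900),
        Run("t2", False, 30_000, 2_100),
        Run("t3", True, 8_500, 640)]
rate, cost_per_win = score(runs)
print(f"success rate {rate:.0%}, cost per solved task ${cost_per_win:.4f}")
```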

From a product perspective, the decisive advantage has shifted from code to agent. Whoever has the best-performing agent wins. If it's also low-cost while performing well, then there's nothing left to compete over.

There are companies in the industry doing research in this area. Not necessarily training models from scratch; many improve agent behavior through post-training on their own data and by building a runtime around the model. Take MiroThinker, for example: the company behind it isn't very well-known, but their research direction is quite interesting, trying to build differentiation through product strength at the level of agent behavior.

The Way Forward

Starting in 2026, I believe agent behavior will become a genuine product and technology direction.

The work traditional software engineers do will be massively compressed within the year. But that doesn't mean there are no directions left. The gap between products will ultimately show up in the agent: whose agent performs well and costs less.

If you're worried that your original skills are being replaced, my advice is to study agents.

This is a genuinely hard problem. A coding agent can help you write code and build products, but it can't optimize its own behavior for you. The results don't match what you imagined, there's little certainty, and you don't know how to improve continuously. But it's precisely because it's hard that real differentiation is possible here.

Too few people can tune an agent well right now.


