Token Economics Has Shrunk the AI Tech-to-Profit Cycle to One Day

A while back, our team was working on an optimization for the inference engine.

Exactly what we were studying doesn't matter—it was one of those grinding, unglamorous tasks: staring at the profiler for nearly three weeks, squeezing incremental gains out of scheduling and VRAM, and finally improving throughput by a few points.

What can a few points do? Back when I was starting out, almost nothing. But this time was different. One morning we merged it into main and rolled it out gradually; the next day we opened the dashboard and those points had already turned into real money. Same cards, same model, same customers—cost per token dropped by a few points, and gross margin widened by a few points. From the day it went live, it started showing up on the books.

For a low-level optimization, the idea in the head of some kid on the team born after 2000 and the profit on the ledger were separated, once the code shipped, by a single night.

Five years ago, this would have been unthinkable.

Short Chain, Short Lifespan

Lately, whenever I talk to friends, the conversation keeps circling back to the same thought: we've stumbled into an unusual era.

What's unusual? Search history and you'll struggle to find another time when technical leadership turned into commercial influence, competitive advantage, or direct profit this fast. How short? A model company today basically has one job: train a smart enough model, ship it, provision compute for tokens, maybe open-source it. The rest—influence, revenue, valuation—grows on its own, fast.

The catch: the other end of this chain is just as short. Each generation's time in the spotlight is compressed to three to six months, and that's optimistic. Often, a model goes from hot topic to complete silence in one to three months.

How Fast the Wind Has Turned These Past Six Months

Last November, Google dropped Gemini 3 and slaughtered the leaderboards; even OpenAI declared a "Code Red" internally. Back then, open any group chat and the screen was full of Gemini worship. Six months later, the conversation has moved on. Not that Gemini is failing—users are still growing—but the spotlight on this track only stays lit for about three months.

Go back a bit further. When Claude Opus 4.6 dropped, insiders and outsiders alike thought "this is the thing that changes the world." At the time, it truly crushed the competition. Then 4.7, 4.8, and so on came along, praise and criticism trailing right behind.

OpenAI's story is even more dramatic. Its coding capabilities had always been mocked. I used the early Codex myself for a while and then unsubscribed—it was genuinely terrible. Then riding GPT-5.4 and 5.5, Codex felt like it swapped in a new engine: OpenAI's official numbers say weekly active users broke 5 million, a 6x increase since the desktop launch in February. One generation of models dragged a product everyone had written off out of the gutter.

The clearest example in China is Z.ai. A year ago its position was precarious, seemingly about to drop out of the race. Then GLM-4.5, 4.6, and 4.7 came out in quick succession, followed by GLM-5, 5.1, and 5.2 at the start of the year—three versions in three months—and the situation flipped completely. Six months after its Hong Kong IPO, its stock price rose roughly eightfold, market cap surging past 600 billion HKD. The technical reversal was written directly into the stock price.

MiniMax is the opposite case. At IPO its pricing and valuation were in the same ballpark as Z.ai's. Its stock price doubled on day one, its market cap briefly topped $13 billion, and during the March surge it even briefly surpassed Baidu's Hong Kong-listed shares. But the wind turned just as fast: the M2.7 and M3 generations didn't catch the hype, the market immediately marked down its expectations, and market cap fell back sharply from its peak. The speed at which they hype you is the same speed at which they abandon you.

After all the noise, attention converged on two things: coding and multimodality. Traditional valuation logic (users, revenue structure, moats) basically fails here. Everyone is really only asking one question: is your current generation of models strong enough?

In My Line of Work, This Chain Used to Be Terrifyingly Long

The thrill of "tech turning into money in days" exists because I know exactly how excruciating it used to be.

I do infra, work that is bound tightly to the underlying infrastructure. In the past, if you made a cluster-level innovation that improved inference efficiency by 15%, turning that 15% into a commercial advantage was hard enough to make you quit.

Why so hard? You can't directly price that 15%. You can't tell a customer, "The machine used to cost a million, now it's 15% faster, so I'll sell it to you for 1.15 million." That's not how they calculate. They'll drag you into their TCO model, into their risk structure, haggling: how do you prove it's 15%, could it actually be 5%, who guarantees stability, who bears supply chain volatility.

So between a low-level optimization and the point where customers actually pay for it lie long cycles, complex supply chains, and a massive business team. You have to maintain an entire organization, grinding slowly at the far end of the commercial chain, to grind technology into profit and scale. A tweak at the bottom layer takes forever to echo back from the market.

Worse, models have short lifespans. You toil away optimizing for a particular model generation for months, and by the time you're done, its moment in the spotlight has already passed. The investment hasn't broken even, but the target has already disappeared. So inference infra used to be stuck in an awkward spot: everyone believed it would matter down the road, but nobody could see a clear business model right now.
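A toy payback calculation makes the squeeze concrete. This is only a sketch: the function name `net_return` and every number in it (engineering cost, daily gain, delays, the 120-day window) are hypothetical, chosen for illustration and not taken from any real project. It assumes the gain accrues at a flat daily rate, and only between the day revenue starts and the day the target model falls out of use.

```python
def net_return(engineering_cost: float,
               daily_gain: float,
               days_until_revenue: int,
               model_window_days: int) -> float:
    """Net return of an optimization aimed at one model generation.

    Gains only accrue after revenue starts and before the model's
    window closes; if the delay eats the whole window, nothing is earned.
    """
    earning_days = max(0, model_window_days - days_until_revenue)
    return daily_gain * earning_days - engineering_cost


# Hypothetical inputs, for illustration only.
ENGINEERING_COST = 300_000   # cost of the optimization work
DAILY_GAIN = 5_000           # extra margin per day once it earns
MODEL_WINDOW_DAYS = 120      # days the target model stays in demand

# Long chain: half a year of sales cycles before anyone pays.
long_chain = net_return(ENGINEERING_COST, DAILY_GAIN, 180, MODEL_WINDOW_DAYS)

# Short chain: the gain starts settling the day after rollout.
short_chain = net_return(ENGINEERING_COST, DAILY_GAIN, 1, MODEL_WINDOW_DAYS)

print(f"long chain net return:  {long_chain:,.0f}")
print(f"short chain net return: {short_chain:,.0f}")
```

Under these made-up numbers, the same piece of work loses its entire cost when the delay outlasts the model's window, and turns a profit when the delay is a single day.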

Tokens Are Spot Goods, Not Futures

What changed is that AI coding ignited token demand, and that demand squeezed every bit of slack out of the chain, to the point that it feels almost unreal.

The key is the nature of tokens: they are spot goods, settled against each day's capacity. Traditional goods are designed today, produced tomorrow, and shipped the day after; tokens are computed right now, and the results go out within seconds. This single trait rewrote all the rules.

The few points we earned from those three weeks of research start settling the moment they go live: from the next day, the extra capacity squeezed out of daily production is booked directly as additional gross margin and competitive advantage. No waiting for the next fiscal year, no entering someone's TCO model, no maintaining a team to prove how much it's worth. Capacity grew, real cost per token dropped, and the books looked better that same day.
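The arithmetic behind "capacity grew, cost per token dropped, margin widened" can be sketched in a few lines. All inputs here (GPU hourly cost, baseline tokens per second, price per million tokens, a 3% throughput gain) and both function names are hypothetical placeholders, not figures from our system. The sketch also assumes hardware cost is fixed, the card runs at full utilization, and the selling price does not change.

```python
def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    """Serving cost of one million tokens on one fully utilized GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


def gross_margin(price_per_million: float, cost_per_million: float) -> float:
    """Gross margin as a fraction of the selling price."""
    return (price_per_million - cost_per_million) / price_per_million


# Hypothetical inputs, for illustration only.
GPU_COST_PER_HOUR = 2.00          # same cards before and after
BASE_TOKENS_PER_SECOND = 1000.0   # throughput before the optimization
PRICE_PER_MILLION = 1.00          # same customers, same price
THROUGHPUT_GAIN = 0.03            # "a few points"

base_cost = cost_per_million_tokens(GPU_COST_PER_HOUR, BASE_TOKENS_PER_SECOND)
new_cost = cost_per_million_tokens(
    GPU_COST_PER_HOUR, BASE_TOKENS_PER_SECOND * (1 + THROUGHPUT_GAIN)
)

base_margin = gross_margin(PRICE_PER_MILLION, base_cost)
new_margin = gross_margin(PRICE_PER_MILLION, new_cost)

# With fixed hardware cost, a gain g cuts unit cost by 1 - 1 / (1 + g).
cost_drop = 1 - new_cost / base_cost

print(f"cost per 1M tokens: {base_cost:.4f} -> {new_cost:.4f}")
print(f"unit cost drop:     {cost_drop:.2%}")
print(f"gross margin:       {base_margin:.2%} -> {new_margin:.2%}")
```

With these made-up inputs, a 3% throughput gain lowers cost per token by a little under 3% and widens gross margin by roughly 1.6 points. Nothing else in the system has to change for that to happen, which is why the effect shows up as soon as the new build is serving traffic.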

And it replicates with almost zero friction. This kind of optimization doesn't discriminate by region—same type of compute, same model, same customers. It's basically a straight port, spreading extremely fast.

Measured in days, a low-level technical improvement converts directly into commercial returns, skipping that entire long, heavy system in between. This is probably the shortest chain from technology to profit in history.

This Is the Era of the Young

There's another thing I keep thinking about, and it amuses me more and more: the people doing this on the front lines are mostly kids in their early twenties.

This era is unusually kind to them, because its judging criteria are brutally objective: did efficiency rise or fall, did accuracy change. Put the thing on the table and one test settles it, leaving almost no room for "seniority" or "connections." You don't need any veteran's nod, nor do you need to be good at reading people or playing politics. Whether an industry's senior judges stamp their approval on you doesn't matter here. Make something real, and it writes itself plainly on production efficiency.

A twenty-something can create value worth hundreds of millions, even billions of dollars, with a single technical breakthrough. The results arrive fast enough to see immediately and are concrete enough for anyone to verify. The entire chain no longer needs to be stuffed with so many people whose sole job is to "judge whether you're qualified."

In Closing

The era we've caught is global, its tempo terrifyingly fast, and it will likely only grow more brutal from here.

But it has also genuinely deleted those long, heavy middle layers, along with those people whose entire job is to judge you. What remains is technological innovation, and the person who creates it.

The fewer people in the middle, the more valuable the ones doing the work.

References

Google, "Introducing Gemini 3", 2025-11-18, https://blog.google/products/gemini/gemini-3/
Fortune, "Sam Altman declares 'Code Red' as Gemini 3 surges", 2025-12-02, https://fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini-ceo-sundar-pichai/
OpenAI, "Codex is becoming a productivity tool for everyone" (Weekly active users exceed 5 million, 6x growth since February), 2026-06-02, https://openai.com/index/codex-for-knowledge-work/
OpenAI, "Codex for (almost) everything" (3 million weekly active users milestone), 2026-04-16, https://openai.com/index/codex-for-almost-everything/
OpenAI, "ChatGPT — Release Notes" (GPT-5.4 / 5.5 release notes), accessed 2026-06-17, https://help.openai.com/en/articles/6825453-chatgpt-release-notes
Securities Times, "'Global Large Model First Stock' Z.ai Exceeds 57 Billion HKD Market Cap on IPO Day", 2026-01, https://www.stcn.com/article/detail/3580246.html
Finet.com.cn, "[IPO Tracking] Z.ai (02513.HK) High Growth Ignites Hong Kong Stocks, Shares Rise 31% to New High", 2026-04, https://www.finet.com.cn/news/69cc927d2308294c69bf7bec.html
Z.ai, "Z.ai Releases GLM-4.7", PR Newswire, 2025-12-22, https://www.prnewswire.com/news-releases/zai-releases-glm-4-7-designed-for-real-world-development-environments-cementing-itself-as-chinas-openai-302649821.html
MarkTechPost, "Z.ai Launches GLM-5.2 With a Usable 1M-Token Context", 2026-06-14, https://www.marktechpost.com/2026/06/14/z-ai-launches-glm-5-2-with-a-usable-1m-token-context-two-thinking-effort-levels-and-no-benchmarks-at-launch/
Reuters, "MiniMax doubles in value in Hong Kong debut", 2026-01-09, https://www.reuters.com/world/asia-pacific/china-ai-firm-minimax-set-surge-hong-kong-debut-2026-01-09/
Securities Times, "Stock Price Surges 51% in Two Days, MiniMax Market Cap Successively Surpasses Three Internet Giants", 2026-03-11, https://www.stcn.com/article/detail/3670887.html
MiniMax, "MiniMax-M2.5" (SWE-bench Verified 80.2), GitHub, 2026-02, https://github.com/MiniMax-AI/MiniMax-M2.5