People often ask: how do open-source models make money?
Reframe the question and it becomes easier to answer: how does open-source software make money? Managed cloud services. Redis is open source; Redis Cloud makes the money. MongoDB is open source; Atlas makes the money. Models fit this path even better than traditional open-source software does.
The last post discussed how the open-source community has changed over the past three years; this one is about money—how the business behind open-source models actually works.
Two Sides of the Same Coin
Look at the global landscape today: cloud providers are desperately building models, while model companies are desperately buying compute.
Google's 2025 capital expenditure exceeded $80 billion, most of it going into AI data centers. Amazon has invested a cumulative $8 billion in Anthropic. Together with Microsoft, the three companies' combined capital expenditure in 2025 alone exceeded $300 billion, with the bulk going to AI.
Now look at the model companies. OpenAI's 2025 revenue exceeded $20 billion, mainly from APIs and subscriptions—fundamentally, selling inference compute. Anthropic signed an agreement with Google Cloud for up to a million TPUs, worth tens of billions of dollars, while also running over 500,000 Trainium chips on AWS. Are these model companies or cloud companies?
Both sides are turning into the same thing.
How Do Open-Source Models Actually Make Money?
DeepSeek walked us through the playbook.
R1 was released on January 20, 2025, just ahead of the Lunar New Year holiday. Six days later, the app hit number one on the US iOS download chart, topping the rankings in 52 countries simultaneously. January saw over 14 million downloads; by April, monthly active users approached 100 million. No paid advertising, no marketing campaigns—customer acquisition cost was essentially zero.
The API pricing was aggressive too. R1 was priced at $0.55 per million input tokens and $2.19 per million output tokens, against $15 and $60 for OpenAI's o1. That's roughly 3.5% of OpenAI's price—far more extreme than the "one-fifth of OpenAI's price" people had been talking about. Many said this was selling at a loss for publicity, impossible to turn a profit.
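The ratio is easy to sanity-check from the list prices. A quick sketch, using the early-2025 per-million-token rates cited above:

```python
# Price comparison: DeepSeek R1 vs. OpenAI o1
# (USD per million tokens, early-2025 list prices as cited above).
r1 = {"input": 0.55, "output": 2.19}
o1 = {"input": 15.00, "output": 60.00}

for kind in ("input", "output"):
    print(f"{kind}: {r1[kind] / o1[kind] * 100:.1f}% of OpenAI's price")
# Both ratios land between 3.5% and 3.7%.
```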
At the end of February, DeepSeek held an "Open Source Week," releasing five low-level optimization projects over five days: FlashMLA, DeepEP, DeepGEMM, DualPipe, and 3FS—covering attention decoding, expert-parallel communication, matrix operations, pipeline parallelism, and distributed file systems, all foundational infrastructure built in-house. DeepGEMM's core code was only about 300 lines, yet outperformed expert hand-tuned kernels. Only then did people realize how much work this company had done at the infrastructure level.
Then on March 1, DeepSeek released a set of numbers: based on an H800 rental price of $2 per GPU-hour, the day's inference cost came to about $87,000. If all traffic that day were billed at R1's pricing, theoretical daily revenue would be about $562,000. The theoretical cost-profit ratio: 545%.
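The arithmetic behind the headline number is worth spelling out. A minimal sketch using the disclosed figures; the $2/GPU-hour H800 rate is DeepSeek's own stated assumption:

```python
# One day of inference, per DeepSeek's March 1 disclosure.
daily_gpu_cost = 87_072        # USD: H800 rentals at $2 per GPU-hour
theoretical_revenue = 562_027  # USD: all tokens billed at R1 API pricing

ratio = (theoretical_revenue - daily_gpu_cost) / daily_gpu_cost
print(f"theoretical cost-profit ratio: {ratio:.0%}")  # -> 545%
```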
Of course, that 545% needs to be discounted. DeepSeek itself noted that the web and mobile apps are free, V3 is priced lower than R1, and off-peak discounts apply, so the actual monetized traffic is far smaller than total traffic. The figure also only counts inference GPU rental fees, excluding training costs, R&D investment, and salaries. Industry estimates put the actual total R&D cost behind V3 in the hundreds of millions of dollars, with some as high as $1.6 billion.
But the 545% itself isn't the point. The point is this: with the same open-source model, anyone else running inference services at this pricing would probably lose money. Because DeepSeek has done extensive low-level optimization, it makes money at the same price. Pricing power rests with the originator.
The Flywheel Spins
What's the most painful thing about running a cloud business? Everyone sells more or less the same thing—a thin layer of service on top of bare metal, and gross margins get squeezed fast. AWS's operating margin is roughly 33% to 38%, already the ceiling. Google Cloud went from years of losses to around 30%. Smaller cloud providers have even thinner margins, and customers are highly concentrated; when a big customer pushes for lower prices, you have no leverage. No matter how much you invest in underlying technology, it's hard to translate that into a difference customers can perceive.
Add a model layer and it changes completely. Suppose I operate a large-scale inference cluster and improve model efficiency by 10%—the same hardware now produces 10% more tokens. Take the extra profit and pour it into R&D to further optimize inference efficiency and drive down unit costs. Then you can attack the market with lower prices, attract more users, push utilization higher, and profits grow again. Then invest in R&D again.
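Here's a toy version of that loop—all numbers invented for illustration, not anyone's actual economics: hold hardware cost fixed, let each R&D cycle buy a 10% efficiency gain, and spend part of the gain on price cuts.

```python
# Toy flywheel: fixed hardware budget, compounding efficiency,
# with price cuts funded out of the efficiency gains.
hardware_cost = 100.0   # normalized daily cluster cost (held constant)
tokens_per_day = 100.0  # normalized output
token_price = 1.50      # normalized price

for cycle in range(4):
    profit = tokens_per_day * token_price - hardware_cost
    print(f"cycle {cycle}: output={tokens_per_day:6.1f}  "
          f"price={token_price:.3f}  profit={profit:5.1f}")
    tokens_per_day *= 1.10  # R&D buys +10% tokens from the same hardware
    token_price *= 0.95     # part of the gain goes into undercutting the market
```

Output grows and the price falls every cycle, yet profit still rises—exactly the leverage the resource-selling cloud never had.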
The old cloud business didn't have this loop. You could spend a lot on technical improvements, but customers wouldn't notice. Models are different—optimizing inference efficiency translates directly into money: either costs go down, or the same cost yields significantly more output.
In 2025, Google raised its capital expenditure guidance from $75 billion to $93 billion, mostly pouring it into AI infrastructure. What they see is this shift: the model layer gives the cloud business real technology leverage.
Open Source Is the Most Efficient Customer Acquisition
Why not just sell it closed-source? Because whether a model is good or not can't be told from benchmark scores alone.
Llama 4 is the cautionary tale. In April 2025, Meta released Llama 4 Maverick, which ranked second on LMArena. It was quickly discovered that the version submitted to the leaderboard was a specially tuned "experimental" build—responses were unusually long, full of emojis, and flashily formatted, all tricks to game the scores. When the publicly released standard version was retested, it ranked 32nd. Later, when Yann LeCun left Meta, he admitted that the "results were fudged." Zuckerberg lost confidence in the entire GenAI team, and the Llama series was essentially knocked out of the open-source community.
Benchmarks can be gamed; user experience cannot.
When a model is open-sourced, everyone can run it and test it. Whether it's good or not becomes clear in minutes. This process builds word of mouth and stickiness. Once, while helping a friend set up an AI client tool, I watched them dig up a DeepSeek API account they had registered half a year earlier just to connect it. In that scenario, DeepSeek wasn't actually the optimal choice, but it felt convenient—they had already registered, used it, and built trust. Developers are similar: if you built a project with a certain model's API before, you'll probably use it for the next project too. Switching has costs, and rebuilding trust costs even more.
DeepSeek's playbook combines open-source models with a free app. Developers test the open-source model; regular users test the app. Both build brand awareness simultaneously. A portion naturally converts into paying API users. Customer acquisition cost is near zero, yet the reach is wider than spending hundreds of millions on marketing.
Which Models Are Suited to the Cloud Services Path?
This logic works smoothly for language models. DeepSeek R1 has hundreds of billions of parameters; you can't run it without a GPU cluster. If you want to use it, there are two paths: the free app or the paid API. Either way, the traffic runs on their cloud. The performance gap between large and small models is significant, so users naturally gravitate toward the cloud.
Text-to-image is different. Open-source models like Stable Diffusion and FLUX can run on a single gaming GPU. The barrier is so low that individual users can deploy them at home. If the gap between large and small models isn't that large, the market fragments—a large number of users choose to run locally, and cloud demand doesn't concentrate as much.
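A back-of-envelope calculation shows why the barrier differs so sharply. This counts weights-only memory; real serving also needs KV cache and activations, which only widens the gap:

```python
# Weights-only VRAM footprint: params (billions) x bytes per parameter ~= GB.
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param

print(weights_gb(671, 1))  # DeepSeek R1: 671B params at FP8 -> ~671 GB, a multi-GPU cluster
print(weights_gb(12, 2))   # FLUX-class image model: 12B params at FP16 -> ~24 GB, one gaming GPU
```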
Text-to-image and text-to-video have another push factor: because they involve images and video, they naturally face more moderation and regulatory constraints. Cloud services have to do content filtering; those constraints barely exist for local deployments, which pushes some users toward the edge. That is the value of edge AI devices—not just a cost issue, but one of power and autonomy.
So whether this open-source model business logic can work depends on whether the capability gap between large and small models is big enough, and whether the barrier to local deployment is high enough. Language models currently satisfy both conditions. Text-to-image is weaker on both. Text-to-video is still changing; it's hard to say.
A Better Cloud
In any case, commercializing models means binding them to cloud services. I think this is actually a good thing.
The old cloud was about selling resources and competing on price; no matter how much you spent on technology, it was hard to differentiate. With models in the mix, it's different—technology investment can directly reduce inference costs and open up pricing space. Companies that do this well can actually make real money.
The endgame for model companies is probably becoming cloud companies, just not the kind that sells bare metal—a new type of AI cloud, selling intelligence. And beyond the cloud, AI-driven operations are drastically reducing the TCO of edge AI; the two may form a complementary landscape.
