
Tags: Infra

5 posts

A Token Is Not a Thing

Demand for GPT-5.5 and Opus 4.7 is nearly infinite, the mid-tier has vanished, and low-to-mid-range compute sits idle. The token economy sounds like selling electricity, but it's more like a gas station: 98-octane is sold out, diesel tanks are full yet self-service only, and 95-octane sits empty.

7 min read

The Most Expensive Waste in the Agent Era: GPUs Waiting on CPUs

I ran seven hundred rounds of AI infra experiments, and environment startup alone ate thirty-five hours. At first I thought GPT-5.5 fast mode wasn't fast enough, but I later realized the time wasn't going to the model thinking; the model was waiting on the CPU. Intel has already tightened the server CPU:GPU ratio from 1:8 to 1:1.

6 min read

DeepSeek V4 Day: It's About Infra, Not the Model

V4's capabilities sit around the Opus 4.6 tier, but pushing FP4 to production, making million-token context the default, and shipping day-0 adaptation for domestic chips add up to a disaster for everyone in the inference infra business. Add GPT-5.5, Vision Banana, and LPM 1.0 to the mix, and this week has crammed in more new releases than the entire past quarter.

7 min read