Tags:

Infra

2 posts

The Most Expensive Waste in the Agent Era: GPUs Waiting on CPUs

I ran seven hundred rounds of AI infra experiments, and environment startup alone ate thirty-five hours, roughly three minutes per round. At first I thought GPT-5.5 fast mode wasn't fast enough, but later I realized the time wasn't going to model thinking; the model was waiting on the CPU. Intel has already tightened the server CPU:GPU ratio from 1:8 to 1:1.

6 min read

DeepSeek V4 Day: It's About Infra, Not the Model

V4's capabilities sit around the Opus 4.6 tier, but pushing FP4 to production, making million-token context the default, and shipping day-0 adaptation for domestic chips together spell disaster for everyone in the inference infra business. Add GPT-5.5, Vision Banana, and LPM 1.0 to the mix, and this week has crammed in more new releases than the entire past quarter.

7 min read