LLM - Tags | Jiawei Guan

The Strongest Model Just Dropped, and I’m Not 'Qualified' to Use It

GPT 5.6 looks strong, but only ~20 U.S.-approved partners can use it. I realize that intelligence you can buy is already the fair kind; permission is the real gap.

June 29, 2026 · 7 min read

AI LLM Geopolitics Open Source Reflections

Top-Tier Intelligence, Cut Off Overnight

I found Fable 5 swapped to Opus 4.8; U.S. Commerce banned it in three days. Closed-source AI can vanish overnight; Chinese open-source models step in.

June 15, 2026 · 5 min read

AI Anthropic LLM Reflections

Intelligence Is Starting to Be About Wealth

Fable 5 ran a 15-hour task for $420. After June 22 it leaves the Coding Plan and moves to API rates, a 10x cost jump. Access to top intelligence is a wealth problem.

June 11, 2026 · 3 min read

AI LLM Evals Reinforcement Learning Reflections

Models Keep Getting Stronger, but 'Strongest' Has No Single Answer

Top models tie on GPQA at 92–94%, yet real-world results diverge. Three skills now split: hard problems, rough tasks, open exploration; the last is hardest to measure.

June 3, 2026 · 7 min read

AI LLM OpenAI Anthropic Reflections

What Goes Around Comes Around: A New Model Every Month and a Half

GPT-5.5 arrived six weeks after 5.4; Opus 4.7 followed 4.6 in two months. GPT now sounds human, Opus doesn't, and judgment's half-life keeps shrinking.

April 24, 2026 · 7 min read

AI DeepSeek Infrastructure LLM Reflections

DeepSeek V4 Day: It's About Infra, Not the Model

DeepSeek V4 matches Opus 4.6, but FP4, 1M-token context, and day-0 chip support stress inference infra. GPT-5.5, Vision Banana, and LPM 1.0 landed too.

April 24, 2026 · 7 min read

AI LLM Pricing Business Models Reflections

On LLM Pricing: Supply Is Locked by Chips, the Rest Is Business Philosophy

LLM pricing is stuck: chip controls cap supply, while three user groups pull demand into different shapes. The once-obvious Coding Plan is now under fire.

April 22, 2026 · 7 min read

AI Claude LLM OpenClaw Performance Reflections

The Days Around the Opus 4.7 Launch

Opus 4.7 kept me up. I tested it, merged OpenClaude PRs, and pushed Strix Halo Qwen3-30B prefill to DGX Spark levels. Agents make parallel work real.

April 17, 2026 · 9 min read

AI Open Source LLM Reflections

The Open Source Community's DeepSeek Moment

Researching text-to-video, I found Chinese dominance in open-source LLMs hasn't reached every domain. From LLaMA to Qwen to DeepSeek, what changed?

April 7, 2026 · 5 min read

AI LLM Tools Workflows

Six Models, Six Personalities

I switch models daily. Opus is reckless but strong, GPT 5.4 drifts, Gemini misses bugs, and the domestic models each have their quirks. Switching beats prompt tuning.

March 13, 2026 · 5 min read