Claude Opus 4.8 has been out about a month. Most capable model Anthropic has shipped, built for long-running autonomous work. (Fable/Mythos was available for a brief moment.) Everyone posts the benchmark numbers, but nobody seems to post about what it does in production.
I run Claude Sonnet as the backbone of a personal AI system on a Raspberry Pi 4. CRM, signal monitoring, email drafting, incident management, daily briefings. A second brain. Real workloads, not demos.
And that’s the point: the backbone is Sonnet, not Opus. The workhorse isn’t the frontier model.
Every new model is genuinely better, but the bigger unlock was never the model. It’s figuring out which tasks actually belong to a frontier model versus a lightweight one, and which tasks are ready to be handed over at all.
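One way to picture that split is a tiny routing function. This is only a sketch: the `Tier` names, the `Task` fields, and the example labels are hypothetical illustrations, not a description of my actual setup or of any library's API.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LIGHTWEIGHT = "lightweight"  # the everyday workhorse model
    FRONTIER = "frontier"        # the most capable (and most expensive) model
    HUMAN = "human"              # not ready to be handed over yet


@dataclass(frozen=True)
class Task:
    name: str
    needs_deep_reasoning: bool  # assumption: you label this yourself per task
    ready_to_delegate: bool     # assumption: your own judgment call, not a metric


def route(task: Task) -> Tier:
    """Pick a tier for a task: readiness first, then difficulty."""
    if not task.ready_to_delegate:
        return Tier.HUMAN
    if task.needs_deep_reasoning:
        return Tier.FRONTIER
    return Tier.LIGHTWEIGHT


# Illustrative labels only; your own answers will differ.
examples = [
    Task("daily briefing", needs_deep_reasoning=False, ready_to_delegate=True),
    Task("long autonomous refactor", needs_deep_reasoning=True, ready_to_delegate=True),
    Task("sending email unreviewed", needs_deep_reasoning=False, ready_to_delegate=False),
]
routes = {task.name: route(task) for task in examples}
```

The ordering is the design choice: whether a task is ready to hand over gets checked before which model is strong enough for it.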
Benchmarks tell you the ceiling. Production tells you where the ceiling actually matters.
So the real question: what workload in your stack would you actually trust to a frontier model today?