Tag: multi-model

  • Benchmarks Tell You the Ceiling

    Claude Opus 4.8 has been out about a month. It is the most capable model Anthropic has shipped, built for long-running autonomous work. (Fable/Mythos was available for a brief moment.) Everyone posts the benchmark numbers, but nobody seems to post about what it does in production. I run Claude Sonnet as the backbone of a personal AI system…