Quote:
You’ve got the model lineup slightly wrong.
It’s Haiku, Sonnet, Opus. Mythos isn’t a successor to Opus; it’s more of a specialized sidegrade.
I’ve worked with both Mythos and Opus 5 internally, and they serve very different purposes. Mythos is highly specialized: it’s optimized for identifying and prototyping exploitation paths in software systems (especially open-source). It’s extremely strong in that narrow domain, but that’s not what most people are usually looking for in a general model.
Opus 5, on the other hand, is the flagship general-purpose model. It incorporates and builds on some of the tuning work that went into Mythos, but it’s not constrained by that specialization. In practice, it’s a substantial step up from Opus 4.7 (arguably the largest jump in capability since the GPT-3.5 era).
Opus 4.7 is currently the strongest model that has gone through full safety hardening. The gap between 4.6 and 4.7 is particularly noticeable in agentic software engineering. In my own testing (primarily around iterative game prototyping), 4.7 consistently outperformed 4.6 in both reliability and problem-solving depth.
I maintain a fairly large internal benchmark set of tasks that 4.6 (and earlier models) struggled with. 4.7 successfully handled nearly all of them on the first pass, including every prior regression I tracked. The few failures I observed were attributable to my setup rather than the model.
More importantly, 4.7 was able to complete tasks that 4.6 couldn’t meaningfully approach at all. That’s the bigger thing for me.