Anthropic launched a pair of new AI models overnight: the “Mythos-class” Claude Fable 5, and Claude Mythos 5, which is now out of preview.
Fable 5, which the company called its most powerful generally available model yet, does come with caveats: Anthropic has launched it with a set of safeguards to protect against misuse, particularly in areas such as cyber security.
Anthropic said that queries on certain topics will be routed to Claude Opus 4.8, its “next most capable model”.
“To release the model both safely and quickly, we’ve tuned these safeguards conservatively – they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5 per cent of sessions,” Anthropic said in a 9 June blog post.
“With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.”
As for Claude Mythos 5, the model lacks those safeguards in some areas and is thus still only available under the auspices of Project Glasswing, at least initially. Anthropic has collaborated with the United States government on the rollout.
Claude Mythos 5, according to Anthropic, “has the strongest cyber security capabilities of any model in the world”.
“The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world,” Anthropic said.
“We’ve seen the beginnings of this in Project Glasswing, where the models have helped cyber defenders secure critically important software.”
Anthropic claims its new models beat competitors such as GPT 5.5 and Gemini 3.1 Pro across every metric, from agentic coding to cyber security. However, not everyone is taking Anthropic at its word.
Charles Guillemet, chief technology officer at digital security firm Ledger, said that Anthropic’s reassurances that the models are safe cannot be trusted.
“If you’re reassured that Anthropic has only shipped a ‘safe’ version of Mythos, don’t be. Large language models’ safeguards have repeatedly shown [they] don’t survive contact with even the laziest adversary. Ask politely a few times; frame it as your son’s science-fair project. The model will cheerfully produce an exploit to break into a hospital network,” Guillemet said.
“In reality, attackers have had functionally equivalent capability for months. The proof is in the tidal wave of exploitation we’ve seen around the world, and the price of stolen access on dark markets has never been lower.
“The only layer that makes infrastructure and humans truly resistant to the rapid proliferation of cyber vulnerabilities exposed and exploited by AI is security by design, including formal verification, using hardware-based secure enclaves. In spite of this, individuals and organisations remain slow to update their software stacks.”
Similarly, Andrew Rubin, chief executive and founder of cyber security company Illumio, said the introduction of guardrails is not proof that the problem has been solved, but rather a sign that “the companies building these models don’t fully trust where the capability leads”.
“Constraints at the interface don’t change the underlying math; they simply shape how people can interact with it. Attackers won’t operate at that layer. They’ll go straight after the capability itself,” Rubin said.
“And as these tools become more broadly available, the speed and scale of attacks will only increase. The real question isn’t whether guardrails exist – it’s whether defenders are prepared to operate at the same speed.”
Want to see more stories from trusted news sources? Make Cyber Daily a preferred news source on Google.
David Hollingworth
David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.