When AI firm Anthropic announced its cutting-edge Claude Mythos model earlier this month, the company revealed it was so powerful and so good at finding vulnerabilities that it was giving access to only a handful of cyber security and technology companies.
Now, just weeks after Project Glasswing was revealed, a handful of unauthorised users have gained access to the model, suggesting the security-focused Mythos isn’t that secure at all.
According to reporting by Bloomberg, a small number of people who are members of a private Discord channel dedicated to researching unreleased AI models have had unofficial access to Mythos since it was first announced.
Getting in was apparently simple, too.
“To access Mythos, the group of users made an educated guess about the model’s online location,” Bloomberg said in an article published on 21 April.
“They based this on knowledge about the format Anthropic has used for other models, the person said, adding that such formatting details were revealed in a recent data breach from Mercor, an AI training start-up that works with a number of top developers.”
Anthropic said it was aware of the access and was investigating the report.
Shane Fry, chief technology officer at RunSafe Security, said it was an example of how easily exploited AI models commonly are.
“Unauthorised users were able to access Anthropic’s Mythos model, reportedly by just changing a model name. Even if their intent is just to explore, it shows how easily these systems can be exposed,” Fry said.
“The reality is these AI capabilities are already out there, ‘hacked’ or not, and they’re going to accelerate how quickly vulnerabilities are found and exploited. Software teams will need to look at how to harden their code so those vulnerabilities can’t be used in the first place.”
Germaine Tan Shu Ting, VP for security and AI strategy and field CISO at Darktrace, expressed similar concerns.
“It shows that the frontline remains identity,” Tan Shu Ting said.
“If Anthropic itself can be accessed using traditional hacking methods (reportedly coopting existing third-party access and ‘internet sleuthing’), then it highlights how critical it is to assume the threat is already inside the walls.”
However, while analysts and industry insiders have reacted to Mythos with something like awe, the actual capabilities of the model may, in reality, fall far short of Anthropic’s claims.
Don’t believe the hype?
Doug Britton, EVP and chief strategy officer of RunSafe Security, referred to Mythos and Project Glasswing earlier in April as a “watershed moment for AI’s runaway zero-day discovery and exploitation”.
“AI is now uncovering memory safety bugs at massive scale, including vulnerabilities that have been hiding in production code for over 25 years – the problem isn’t just that these bugs exist, it’s that they’re being found faster than organisations can fix them,” Britton said.
But the question is – are they being found that fast? Davi Ottenheimer, security engineer and president of security consultancy flyingpenguin, has some serious doubts.
“The supposedly huge Anthropic ‘step change’ appears to be little more than a rounding error. The threat narrative so far appears to be ALL marketing and no real results,” Ottenheimer said in a blog post around the time Mythos and Glasswing were announced.
“The Glasswing consortium is regulatory capture dressed up poorly as restraint.”
Ottenheimer based his observations – rather caustic ones, it must be said – on Anthropic’s own Claude Mythos Preview System Card, a “whoppingly inefficient 244-page document that devotes just seven pages to the claim that the model is too dangerous to release”.
According to Ottenheimer, only seven of those pages do not mention the acronyms one might expect: CVSS, CWE or CVE.
“The flagship demonstration document turns out to be like the ending of the Wizard of Oz, a sorry disappointment about a model weaponising two bugs that a different model found, in software the vendor had already patched, in a test environment with the browser sandbox and defence-in-depth mitigations stripped out. Anthropic failed, and somehow the story was flipped into a warning about its success,” he said.
Ottenheimer has many issues with Anthropic’s – and, it must be said, the wider media’s – claims that Mythos found “thousands of zero-day vulnerabilities in every major operating system and every major web browser”, and he pulls no punches.
Referencing that claim, Ottenheimer points out that the word “thousands” is “used once, in reference to transcripts reviewed during the alignment evaluation”.
“It is never used to describe vulnerabilities. The cyber security section (Section 3, pages 47–53) contains no count of zero-days at all,” Ottenheimer said.
“With no CVE list, no CVSS distribution, no severity bucket, no disclosure timeline, no vendor-confirmed-novel table, no false-positive rate, why are you teasing us with the claims about vulnerabilities at all?”
Cyber Daily has reached out to Anthropic for comment.
Want to see more stories from trusted news sources?Make Cyber Daily a preferred news source on Google.
David Hollingworth
David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.