
Microsoft accidentally leaks 38TB of internal data on GitHub

The Redmond giant is having a shocker of a time when it comes to security, with its latest own goal exposing 30,000 internal Teams messages.

David Hollingworth
Tue, 19 Sep 2023

The leak comes after Microsoft revealed last week how Chinese hackers had gotten hold of an internal account signing key, thanks to a series of issues stemming from a “consumer signing system crash”.

The GitHub leak was spotted by researchers at cloud security company Wiz, which regularly scans the internet for “misconfigured storage containers”. In one of these scans, the researchers came across a GitHub repository called robust-models-transfer, which belongs to Microsoft’s artificial intelligence (AI) research group.

In theory, the repository simply offered “pretrained robust ImageNet models” for download – its instructions pointed users to an Azure Storage URL that included a shared access signature (SAS) token – but in fact, that URL exposed a whole lot more than the models.


“... this URL allows access to more than just open-source models,” Wiz said in a blog post. “It was configured to grant permissions on the entire storage account, exposing additional private data by mistake.”

On top of that, the token granted anyone who used it “full control” permissions over the storage account, rather than the usual read-only access.

“Meaning, not only could an attacker view all the files in the storage account, but they could delete and overwrite existing files as well,” Wiz’s researchers discovered.

And considering what the repository was actually intended for – the sharing of AI models – “an attacker could have injected malicious code into all the AI models in this storage account, and every user who trusts Microsoft’s GitHub repository would’ve been infected by it”.

In theory, the whole storage account should still have been private, but the use of SAS tokens – account SAS tokens, in particular – meant that anyone holding the account key could mint new tokens entirely client-side. Azure keeps no server-side record of these tokens, so admins may not even know they exist, and revoking them is a non-trivial matter.

“Revoking a token is no easy task either – it requires rotating the account key that signed the token, rendering all other tokens signed by the same key ineffective as well,” Wiz said. “These unique pitfalls make this service an easy target for attackers looking for exposed data.”
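For context, this is what makes account SAS tokens so hard to govern: anyone holding the storage account key can sign a new token locally, with whatever permissions and expiry they choose, and Azure never sees it happen. Below is a minimal sketch using the azure-storage-blob Python SDK; the account name, key and dates are illustrative placeholders, not details from Microsoft’s repository.

```python
# Minimal sketch: an account SAS token is minted entirely client-side with
# the azure-storage-blob SDK. The account name/key and the far-future expiry
# are hypothetical placeholders, not values from the actual incident.
from datetime import datetime, timezone

from azure.storage.blob import (
    generate_account_sas,
    ResourceTypes,
    AccountSasPermissions,
)

sas_token = generate_account_sas(
    account_name="examplestorageacct",        # hypothetical account
    account_key="<storage-account-key>",      # signing key held by the client
    resource_types=ResourceTypes(service=True, container=True, object=True),
    # Over-broad, "full control"-style permissions, as described by Wiz
    permission=AccountSasPermissions(
        read=True, write=True, delete=True, list=True
    ),
    # A far-future expiry, echoing the token set to expire in 2051
    expiry=datetime(2051, 10, 1, tzinfo=timezone.utc),
)

# The resulting query string can be appended to any URL for the account and
# shared. Because Azure holds no record of it, admins cannot enumerate or
# individually revoke it; only rotating the signing account key kills it.
url = f"https://examplestorageacct.blob.core.windows.net/?{sas_token}"
print(url)
```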

Wiz first discovered the exposed repository in June 2023 and reported the issue to Microsoft two days later. The SAS token – set in 2021 to expire in October 2051 – was invalidated by Microsoft on 24 June and replaced on 7 July.

By 16 August, Microsoft had completed its internal investigation, and both Wiz and Microsoft disclosed the issue.

“The simple step of sharing an AI data set led to a major data leak, containing over 38TB of private data,” Wiz concluded.

“The root cause was the usage of account SAS tokens as the sharing mechanism. Due to a lack of monitoring and governance, SAS tokens pose a security risk, and their usage should be as limited as possible.”
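As an illustration of what “as limited as possible” can look like in practice, a token can be scoped to a single container, restricted to read and list operations, and given an expiry measured in hours rather than decades. The sketch below reuses the same hypothetical storage account as above and the azure-storage-blob SDK; it shows a tighter configuration in general, not Microsoft’s actual remediation.

```python
# Minimal sketch of a narrower alternative: a SAS scoped to one container,
# read/list only, expiring within a day. All names are hypothetical.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import generate_container_sas, ContainerSasPermissions

limited_sas = generate_container_sas(
    account_name="examplestorageacct",   # hypothetical account
    container_name="public-models",      # hypothetical container holding only the models
    account_key="<storage-account-key>",
    permission=ContainerSasPermissions(read=True, list=True),  # no write/delete
    expiry=datetime.now(timezone.utc) + timedelta(hours=24),   # short-lived
)
```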

David Hollingworth

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
