Op-Ed: No, the Gmail ‘data breach’ is not a breach

There are nuances to cyber security reporting that are easy to miss.

When confronted with a truly enormous amount of exposed or compromised personal credentials, it’s easy to assume that a breach has occurred and act accordingly. Headlines get written, the story starts to mutate away from the truth, and before you know it, everyone’s talking about that new giant Gmail data breach in their office chat channels.


That is exactly what happened to me this morning, when a colleague asked whether I’d heard about it.

I had, I told him, and after a little research, I was able to steer him in the right direction. No, this is not a massive Gmail breach, as one headline put it, nor is it a leak, as another article framed the story.

For those not familiar with the issue, Troy Hunt, of HaveIBeenPwned fame, wrote a blog post last week about a particularly large dataset he recently came into possession of. At 3.5 terabytes, comprising 23 billion rows of records, it is nothing to be sneezed at. But it’s also not something to build a dramatic headline around without proper research, and that research is exactly what Hunt sets out in his blog post.

“It’s a vast corpus, and if we were attempting to compete with recent hyperbolic headlines about breach sizes, this would be one of the largest. But I’m not going to play the ‘mine is bigger than yours’ game because it makes no sense once you start analysing the data,” Hunt said in a 22 October blog post.

“Part of what makes the data so large is that we’re actually looking at both stealer logs and credential stuffing lists, so let’s assess them separately, starting with those stealer logs.”

What follows is Hunt’s typically astute research and reporting. Of that “vast corpus”, only 8 per cent represents previously unseen sets of credentials: combinations of website, email address, and password that had not appeared in any earlier leak, breach, stealer log, or combo list.

This is fresh data, in other words, gathered in part by malware that doesn’t target Google or Gmail at all. Instead, it is stealthily installed on an individual’s device to steal the credentials they enter, which is why the data comes in a very particular format.

“Stealer logs are the product of info stealers, that is, malware running on infected machines and capturing credentials entered into websites on input. The output of those stealer logs is primarily three things: website address, email address, and password,” Hunt said.

“Someone logging into Gmail, for example, ends up with their email address and password captured against gmail.com, hence the three parts.”
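To make that three-part format concrete, here is a minimal Python sketch. It assumes a hypothetical tab-separated layout of website, email address, and password on each line (real stealer logs come in many shapes), and the sample records and the helper names `parse_stealer_log` and `host_of` are invented purely for illustration.

```python
from collections import Counter
from typing import NamedTuple
from urllib.parse import urlparse


class Credential(NamedTuple):
    """One stealer-log record: where it was typed, and what was typed."""
    website: str
    email: str
    password: str


def parse_stealer_log(lines):
    """Parse 'website<TAB>email<TAB>password' lines into Credential records.

    The tab-separated layout is a hypothetical format for illustration only.
    """
    records = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines rather than guessing at them
        website, email, password = parts
        records.append(Credential(website, email.lower(), password))
    return records


def host_of(website):
    """Reduce a captured URL to its host name, e.g. 'gmail.com'."""
    return urlparse(website).hostname or website


# Invented sample data: one victim's Gmail address captured on two sites.
sample = [
    "https://gmail.com/login\talice@gmail.com\thunter2",
    "https://shop.example/account\talice@gmail.com\thunter2",
    "https://gmail.com/login\tbob@example.org\tcorrect-horse",
]
records = parse_stealer_log(sample)
by_site = Counter(host_of(r.website) for r in records)
print(by_site.most_common())  # [('gmail.com', 2), ('shop.example', 1)]
```

Note how alice@gmail.com turns up against shop.example as well as gmail.com. A Gmail address appearing in the data tells you where the victim’s email account lives, not which site was compromised, because the malware captured it on the victim’s own machine.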

Hunt’s analysis of the entire dataset found 183 million unique email addresses, with 8 per cent of them – more than 14 million, which is still a lot! – never seen before.
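The “never seen before” figure is essentially a set difference: take the unique addresses in the new corpus, subtract every address already known, and divide what’s left by the total. The sketch below assumes both lists fit in memory and that addresses are compared case-insensitively; `novelty_share` is a hypothetical helper, not how HIBP actually does it. The last line checks the article’s rounded figures: 8 per cent of 183 million is about 14.64 million.

```python
def novelty_share(new_corpus_emails, previously_seen_emails):
    """Count addresses in a new corpus that were never seen before.

    Assumes both inputs fit in memory and that addresses should be
    compared case-insensitively (a simplifying assumption).
    """
    unique = {e.strip().lower() for e in new_corpus_emails}
    seen = {e.strip().lower() for e in previously_seen_emails}
    fresh = unique - seen  # addresses with no earlier appearance
    fraction = len(fresh) / len(unique) if unique else 0.0
    return len(fresh), fraction


# Toy example: duplicates and case differences collapse before comparing.
count, share = novelty_share(
    ["a@x.com", "B@x.com", "c@x.com", "a@x.com"],
    ["a@x.com", "b@x.com"],
)
print(count, round(share, 3))  # 1 0.333

# The article's rounded figures: 8 per cent of 183 million addresses.
print(183_000_000 * 8 // 100)  # 14640000
```

Integer arithmetic is used for the final check so the result isn’t muddied by floating-point rounding.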

Of course, Gmail addresses are going to be in there – the service is free, very popular, and tied into much of how we as users navigate the modern internet. That ubiquity is also why criminals like to collect these credentials: with them, they can take over accounts or gain access to services using someone else’s login.

So, yes, there’s now a whole bunch of new account details in circulation – but this isn’t something that happened to Gmail, such as an intrusion into Google’s systems to steal data. Rather, it’s data stolen directly from users, tied to the login credentials they enter into their browsers and devices every day.

Something to be alarmed about (so change your passwords!), but not a breach, and not a leak. I’ll leave the final word to Hunt, who puts it quite eloquently.

“Something that is becoming more evident as we load more stealer logs is that treating them as a discrete ‘breach’ is not an accurate representation of how these things work,” Hunt said in conclusion.

“The truth is that, unlike a single data breach such as Ashley Madison, Dropbox, or the many other hundreds already in HIBP, stealer logs are more of a firehose of data that’s just constantly spewing personal info all over the place.”

The real difference, at the end of the day, is that while a breach – such as those at Optus or Medibank – may represent a company’s failure to secure its data, these stealer logs represent something else entirely: a problem of personal cyber security, not corporate security.

David Hollingworth

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
