2.6m Duolingo datasets leaked on hacking forum, practically for free

Data from 2.6 million Duolingo users has been dumped onto a clear web hacking forum, for the apparent low, low price of a few dollars for anyone interested.

David Hollingworth
Tue, 29 Aug 2023

And it seems more than a few of the forum’s users are very keen on the data.

The data, however, is not new. It appears to be the same 2.6 million sets of data that were on offer in January 2023, which were initially scraped from an exposed Duolingo API and posted for sale for US$1,500.

However, it appears that the price was too high for most. Currently, the entire dataset can be unlocked for just eight site credits on the Breached hacking forum. Credits can be bought for as little as €120 for 500, or 24 euro cents each. In Australian dollars, the cost to access the 2.6 million lines of data is just $3.25.
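
For readers who want to check the sums, the arithmetic behind that price can be sketched in a few lines of Python. The credit figures come from the forum pricing quoted above; the euro-to-Australian-dollar rate is an illustrative assumption rather than a figure from the forum or this article, so the converted total will only land near the $3.25 quoted.

```python
# Pricing figures quoted in the article.
CREDITS_PER_BUNDLE = 500
BUNDLE_PRICE_EUR = 120.0
CREDITS_TO_UNLOCK = 8

# Assumption: an illustrative EUR-to-AUD exchange rate, not stated in the article.
EUR_TO_AUD = 1.69


def unlock_cost_eur(credits: int = CREDITS_TO_UNLOCK) -> float:
    """Return the cost in euros of spending `credits` site credits."""
    price_per_credit = BUNDLE_PRICE_EUR / CREDITS_PER_BUNDLE  # 0.24 euros each
    return credits * price_per_credit


cost_eur = unlock_cost_eur()
cost_aud = cost_eur * EUR_TO_AUD  # depends entirely on the assumed rate
print(f"EUR {cost_eur:.2f}, roughly AUD {cost_aud:.2f}")
```

Eight credits at 24 euro cents each comes to €1.92, which sits in the neighbourhood of the Australian dollar figure at exchange rates of around 1.7.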


“I saw the sample data and found that the fields were very comprehensive,” said one forum user of the data. “So great to share.”

Other users were just as pleased with the data, though many of them also seem to be very new users on the forum.

The data is largely public-facing and, in the vast majority of cases, already available in other breaches, according to security researcher Troy Hunt of Have I Been Pwned.

“Each address is in *loads* of existing breaches, but looking at the broader set of data, there’s no single one that appears 100 per cent of the time,” Hunt said on Twitter.

“Collection #1 is very prevalent because it’s massive and *very* public, but sometimes it’s only Adobe.

“So ultimately, it’s a bunch of email addresses out there in the big melting pot of data breach land being used to compromise even more of our personal info.”

Data scraping exists in a legal limbo, with legitimate companies offering such services. Crawlbase, for example, has a specific page dedicated to scraping data from Duolingo, complete with a claim that the company uses “high-quality rotating proxies to avoid blocked requests, IP bans, and CAPTCHAs with ease”.

And you know a technically legal practice is frowned upon when even Meta complains about it.

“This industry covertly collects information that people share with their community, family and friends, without oversight or accountability, and in a way that may implicate people’s civil rights,” Meta stated in a blog post earlier in the year.

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
