Reddit might be signing a content deal to help train AI

The mammoth social media company’s user-generated content could be used to train AI models, according to insiders.

David Hollingworth
Mon, 19 Feb 2024

The company that calls itself the “front page of the internet” has reportedly signed a deal with an unnamed AI company to use its content as training material.

Bloomberg is reporting that “people familiar with the matter” are aware of the deal, and that Reddit has already informed investors ahead of launching an initial public offering.

“The San Francisco-based firm told prospective investors in its IPO that it had signed the deal, worth about $60 million on an annualised basis, earlier this year, the people said,” Bloomberg reports.


“Reddit’s agreement with an unnamed large AI company could be a model for future contracts of a similar nature, one of the people said.”

According to Bloomberg’s sources, while Reddit’s revenue last year was US$800 million – a 20 per cent year-on-year increase – the AI deal will “help Reddit tap into investors’ enthusiasm for the technology and boost its IPO”.

However, the deal appears to be subject to change, and Reddit has declined to comment on the apparent partnership.

What’s in it for the AI?

With a huge pool of commentary and discussion on nearly every topic under the sun, this seems like a great deal for whatever AI company might be involved.

But that information is often far from unbiased. Reddit’s relative anonymity fosters a culture of unfiltered expression, resulting in a plethora of biased, offensive, or outright false information. An AI trained on such data could internalise and perpetuate these biases – potentially leading to harmful outcomes.

Plus, Reddit’s voting system can amplify popular opinions over factual ones. As a result, the AI might prioritise regurgitating popular but incorrect notions instead of providing accurate information.

Reddit’s also home to countless niche communities, each with its own jargon, memes, and cultural nuances. Training on such diverse data might result in an AI that struggles to comprehend or communicate effectively outside of Reddit’s ecosystem.

Any AI trained on Reddit content could well be a master of casual, online conversation and an array of trending topics, or it could just lead to the worst of the site’s bad habits getting rolled into one very fractured personality.

It’s certainly a big deal; we’re just not sure if it’s a good one.

David Hollingworth

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
