
Cloudflare outage due to outdated tech, expert says

It wasn’t a DDoS or other malicious act that took down a major internet infrastructure service provider – it was ’80s-era technology. But we shouldn’t overreact, says Gartner.


Cloudflare has proudly protected the web from some of the largest distributed denial-of-service (DDoS) attacks the world has ever seen, but it wasn’t a DDoS that was responsible for an outage of Cloudflare’s own infrastructure overnight.

It was, in the words of Gartner analysts, a simple operational failure.

“Operational errors caused the Cloudflare outage, rather than DDoS or infrastructure failure, as evidenced by reported widespread HTTP 500 errors,” a team of Gartner analysts said of the incident.

“This is consistent with the pattern of hyperscale cloud provider outages that Gartner has observed over the past decade.”

The incident began at about 1am Sydney time, or shortly before midday UTC, as tracked by Cloudflare’s own status page.

“Cloudflare is experiencing an internal service degradation. Some services may be intermittently impacted,” Cloudflare initially said.

“We are focused on restoring service. We will update as we are able to remediate. More updates to follow shortly.”

The outage lasted approximately seven and a half hours and impacted some of the internet’s most popular sites and services, including Elon Musk’s X platform, ChatGPT, and the online game League of Legends. It also disrupted traffic systems in the US state of New Jersey and several city services in New York, with similar disruptions reported elsewhere in the world.

Cloudflare later pointed the finger at a configuration issue.

“The root cause of the outage was a configuration file that is automatically generated to manage threat traffic. The file grew beyond an expected size of entries and triggered a crash in the software system that handles traffic for a number of Cloudflare’s services,” Cloudflare said while it was still resolving the issue.

“There is no evidence that this was the result of an attack or caused by malicious activity. We expect that some Cloudflare services will be briefly degraded as traffic naturally spikes post-incident, but we expect all services to return to normal in the next few hours.

“Given the importance of Cloudflare’s services, any outage is unacceptable. We apologise to our customers and the internet in general for letting you down today. We will learn from today’s incident and improve.”
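The failure mode Cloudflare describes, a generated file outgrowing the size its consumer expects, can be illustrated with a minimal sketch. This is not Cloudflare’s code: the function name, the `LoadResult` type and the 200-entry limit are all hypothetical, chosen only to show one common defence, which is to reject an oversized file and keep running on the last known good version rather than crash.

```python
from dataclasses import dataclass

# Hypothetical limit for illustration; not Cloudflare's actual value.
MAX_ENTRIES = 200


@dataclass
class LoadResult:
    entries: list[str]
    used_fallback: bool


def load_generated_config(new_entries, last_known_good, max_entries=MAX_ENTRIES):
    """Accept a freshly generated config only if it fits the expected size."""
    if len(new_entries) > max_entries:
        # Oversized file: keep serving with the previous config instead of crashing.
        return LoadResult(entries=list(last_known_good), used_fallback=True)
    return LoadResult(entries=list(new_entries), used_fallback=False)


# Usage: a file that has grown past the limit is rejected in favour of the old one.
previous = ["rule-a", "rule-b"]
oversized = [f"rule-{i}" for i in range(MAX_ENTRIES + 1)]
result = load_generated_config(oversized, previous)
```

In this sketch the oversized file is never loaded, so `result.used_fallback` is true and traffic handling would continue with the two earlier rules while operators investigate.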

By 19:28 UTC – about three hours before publication – Cloudflare had marked the incident as ‘resolved’ on its status page.

Jake Moore, ESET’s global cyber security adviser, said outages such as this, and similar ones in recent months, highlight the fragile nature of the technology the internet relies upon.

“The outages we have witnessed this last few months have once again highlighted the reliance on these fragile networks,” Moore told Cyber Daily.

“Companies are often forced to heavily rely on the likes of Cloudflare, Microsoft, and Amazon for hosting their websites and services, as there aren’t many other options. The problems causing these outages have occurred due to DNS (Domain Name System) problems, which are most likely overwhelmed. The technology is based on an outdated, legacy network that redirects words in a web address into computer-friendly numbers.”

Moore said that when DNS falls over, it does so catastrophically, causing widespread outages; however, replacing these outdated systems is no easy matter.

“It may sound risky, but the major cloud providers actually have lots of impressive failsafes in place and usually provide more protection than the lesser well-known cloud providers,” Moore said.

As to how to respond to this outage and prepare for others, Gartner’s analysts warned against overreaction.

“Cloud provider outages are inevitable, and customers must not overreact. If this outage affected you or a partner, don’t let it affect your trust in cloud service providers. Such outages are not unique to Cloudflare, since major cloud providers have experienced similar incidents, highlighting the risk that hyperscale’s inherent complexity creates for customers and the need for a proactive approach to resilience,” Gartner said.

“Avoid the impulse to overreact to such incidents by partitioning applications or providers. These approaches are expensive, complex and may not be valuable in the typical time frame of an outage. If applied, they should prioritise the systems most impactful to the business and be used sparingly.”

Bob Wambach, vice president of portfolio and strategy at Dynatrace, said the incident was a reminder of how dependent our world has become on 24/7 access to services and websites.

“Today’s IT environments are far more complex and interconnected than many realise, so when an outage occurs, the ripple effects can quickly spread across industries and into people’s daily lives,” Wambach said.

“As our reliance on technology grows and AI continues to reshape how we operate, maintaining that visibility across complex digital ecosystems will be essential. The organisations best prepared for the future will be those that can see across their entire environment, anticipate risks, and adapt quickly when the unexpected happens.”

David Hollingworth

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
