Cloudflare, one of the biggest content delivery network (CDN) services, faced technical issues earlier today. Users saw 502 errors when accessing sites that use Cloudflare. As a result, websites and services relying on Cloudflare were affected, including major ones such as Discord and Udemy.
United States – (MFE), Memphis, TN, United States – (MEM), Miami, FL, United States – (MIA), Minneapolis, MN, United States – (MSP), Montgomery, AL, United States – (MGM), Montréal, QC, Canada – (YUL), Nashville, TN, United States – (BNA), Newark, NJ, United States – (EWR), Norfolk, VA, United States – (ORF), Omaha, NE, United States – (OMA), Phoenix, AZ, United States – (PHX), Pittsburgh, PA, United States – (PIT), Portland, OR, United States – (PDX), Queretaro, MX, Mexico – (QRO), Richmond, Virginia – (RIC), Sacramento, CA, United States – (SMF), Salt Lake City, UT, United States – (SLC), San Diego, CA, United States – (SAN), San Jose, CA, United States – (SJC), Saskatoon, SK, Canada – (YXE), Seattle, WA, United States – (SEA), St. Louis, MO, United States – (STL), Tampa, FL, United States – (TPA), Toronto, ON, Canada – (YYZ), Vancouver, BC, Canada – (YVR), Tallahassee, FL, United States – (TLH), Winnipeg, MB, Canada – (YWG)).
– Some Affected Regions
This appears to have been a worldwide outage, as is evident from Cloudflare’s list of affected regions.
Cloudflare has already implemented a fix and attributes the outage to a spike in CPU usage, stating, “Major outage impacted all Cloudflare services globally. We saw a massive spike in CPU that caused primary and secondary systems to fall over. We shut down the process that was causing the CPU spike. Service restored to normal within ~30 minutes. We’re now investigating the root cause of what happened.”
Possible DDoS Attack?
Many Twitter users have speculated, pointing to live DDoS attack trackers, that a DDoS attack from China might have been responsible for the outage. These trackers rely on endpoints that measure the flow and volume of traffic worldwide, so their data can’t be taken as conclusive evidence, although it can point to a possibility.
Cloudflare’s CEO, Matthew Prince, pointed to technical issues behind the outage, stating, “Massive spike in CPU usage caused primary and backup systems to fall over. Impacted all services. No evidence yet attack related. Shut down service responsible for CPU spike and traffic back to normal levels. Digging in to root cause.”
Services have now been restored to normal, but Cloudflare is still investigating the root cause. Cloudflare also had an outage a while back, but that one was caused by Verizon. Today’s outage was a major one because every Cloudflare service was affected, including the primary and all failover systems.
We will update this post when Cloudflare publishes a detailed analysis of the outage on its blog.
Update – Cloudflare has ruled out a DDoS attack and blamed today’s outage on a faulty software deployment, stating:
“For about 30 minutes today, visitors to Cloudflare sites received 502 errors caused by a massive spike in CPU utilization on our network. This CPU spike was caused by a bad software deploy that was rolled back. Once rolled back the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels.
“This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred. Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again.”