
The Recent Amazon Web Services Outage Shuts Down Thousands of Sites, Leaves Hundreds of Millions of Users Offline


When Amazon Web Services went down this week, much of the internet went with it.

The outage began in northern Virginia, in AWS’s largest cloud region, and ultimately affected more than 1,000 organizations and more than 100 million users worldwide. Although most services returned to normal after the 15-hour outage, the failure highlights how a single error at a major tech firm can ripple across global internet operations.

“Day-to-day, people don’t even realize that there are so many services that rely on a single entity’s infrastructure,” said Alan Liu, an assistant professor of computer science at the University of Maryland. “When this happens, the impact is so large.”

The outage affected both Amazon-owned organizations and companies that rely on its technological infrastructure.


“These companies paid computing costs to Amazon, like we pay utility fees to [Baltimore Gas and Electric] to get gas and electricity,” said Liu. “These companies are paying money to Amazon for their computer storage and network services.”

This isn’t the first outage of its kind. In 2021, Meta experienced a nearly six-hour global disruption that yielded similar consequences. And experts say that it probably won’t be the last.

“We all need to be more resilient and robust in the way we plan,” said David Mussington, a UMD public policy professor of the practice. “Practicing incident recovery and response and protecting key data assets is the way forward.”

What happened? 

The Monday outage originated in US-EAST-1, AWS’s largest region, a cluster of data centers in northern Virginia. It affected thousands of organizations globally, including leading airlines, media outlets, real-time communication platforms, and home security systems.

AWS announced the outage at 12:11 AM PDT on Oct. 20, saying “we are investigating increased error rates and latencies for multiple AWS services in the US-EAST-1 Region.”

Amazon updated its statements throughout the day, saying the disruption was caused by a Domain Name System (DNS) malfunction that led to database errors and cascading failures affecting networks worldwide. DNS translates user-friendly domain names, such as “CNSMaryland.org,” into the numerical IP addresses that computers use to communicate with each other.
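For readers curious how a name-resolution failure can knock out services whose servers are still running, the toy Python model below may help. It is a simplified illustration, not a DNS client and not how AWS’s systems work: the record table, the `.test` domain names, and the `resolve` and `fetch` helpers are all hypothetical, and the addresses come from the 192.0.2.0/24 block reserved for documentation. It shows only the basic idea that if a name’s record comes back empty, a client cannot find the server, so every request to it fails.

```python
# Toy model of DNS-style name resolution (NOT a real DNS client).
# All names, addresses and helpers here are hypothetical; 192.0.2.0/24 is
# an address range reserved for documentation examples (RFC 5737).
import ipaddress

# Hypothetical record table: domain name -> list of IPv4 addresses.
RECORDS = {
    "example-shop.test": ["192.0.2.10", "192.0.2.11"],
    "example-db.test": [],  # record exists but is empty, as after a bad update
}


class ResolutionError(Exception):
    """Raised when a name cannot be turned into an address."""


def resolve(name: str) -> ipaddress.IPv4Address:
    """Return the first address recorded for `name`, or raise ResolutionError."""
    addresses = RECORDS.get(name.lower())
    if not addresses:
        raise ResolutionError(f"no address found for {name!r}")
    return ipaddress.IPv4Address(addresses[0])


def fetch(name: str) -> str:
    """Pretend to contact a service; fails whenever its name cannot be resolved."""
    try:
        addr = resolve(name)
    except ResolutionError as err:
        return f"error: {err}"
    return f"connected to {name} at {addr}"


if __name__ == "__main__":
    print(fetch("example-shop.test"))  # resolves, so the "connection" succeeds
    print(fetch("example-db.test"))    # empty record, so every request fails
```

Running it prints a successful connection for the first name and an error for the second. In the toy model, nothing changed about the second service except its name record, which is the kind of failure that can cascade to every system that looks that name up.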

Sites including Delta Air Lines, Netflix, Venmo, Snapchat, Duolingo, Canvas, and many others experienced “increased error rates.” Affected services suffered stalls, delayed communication and posting, and, in some cases, complete shutdowns.

AWS identified the issue and began restoring services by early afternoon on Oct. 20. While most services have now returned to normal, some may still experience residual delays.

What are the consequences? 

This outage resulted in significant financial losses for AWS and impacted numerous companies. Liu said the potential financial impact could surpass $100 billion.

When organizations that depend on web traffic go offline, their revenue stream is interrupted; multiply that loss across the many companies affected, and a figure of that size becomes plausible.

“The issue is risk and recovery of loss of business continuity due to an infrastructure failure, and that is something you can mitigate through planning and appropriate contracting,” said Mussington.

According to Mussington, planning should include the establishment of secondary and tertiary providers outside of the tech market leaders as a contingency.
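One way to picture the secondary and tertiary providers Mussington describes is an ordered failover list: try the primary provider first, and fall back to the next one only if it is unavailable. The Python sketch below is a simplified illustration under stated assumptions. The provider stubs `primary`, `secondary` and `tertiary` and the `with_failover` helper are hypothetical, not any cloud vendor’s API, and real failover also requires copying data to the backup providers ahead of time, which this sketch omits.

```python
# Minimal sketch of ordered failover across providers.
# Provider names and stub functions are hypothetical stand-ins, not real cloud APIs.
from typing import Callable, Sequence, Tuple


class ProviderDown(Exception):
    """Raised by a provider stub when it cannot serve the request."""


def with_failover(providers: Sequence[Tuple[str, Callable[[str], str]]], key: str) -> str:
    """Try each (name, fetch) pair in order and return the first successful result."""
    failures = []
    for name, fetch in providers:
        try:
            return fetch(key)
        except ProviderDown as err:
            failures.append(f"{name}: {err}")  # record the failure, try the next provider
    raise RuntimeError("all providers failed; " + "; ".join(failures))


def primary(key: str) -> str:
    raise ProviderDown("region unavailable")  # simulate an outage at the main provider


def secondary(key: str) -> str:
    return f"secondary served {key}"


def tertiary(key: str) -> str:
    return f"tertiary served {key}"


if __name__ == "__main__":
    chain = [("primary", primary), ("secondary", secondary), ("tertiary", tertiary)]
    print(with_failover(chain, "order-42"))
```

With the primary stub simulating an outage, the call falls through to the secondary provider and returns "secondary served order-42". Only if every provider in the list fails does the request fail outright.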

Will this happen again? 

AWS has multiple industry competitors, including Microsoft Azure, Google Cloud Platform, and IBM Cloud. These companies face the same vulnerabilities and could have experienced similar outages, Mussington said.

“It didn’t happen to them this time, but it could,” he said.

“I think the major thing is usually human error,” Liu said. “Engineers are using some code that was not fully tested, or some code that is causing a configuration error.”

According to its website, AWS has the most extensive, reliable, and secure global cloud infrastructure in the world. Yet it still experienced a large-scale outage.

This raises the question: How reliable are the world’s most reliable platforms?

“Amazon is the leading global player in this field,” Liu said. “But the leading player needs to take bigger responsibility for safeguarding the services that depend on it.”

By RUBY SIEFKEN
Capital News Service
