Problems at Amazon’s key cloud hub caused cascading outages across many online platforms.
Amazon Web Services, one of the world’s most critical cloud infrastructure providers, has begun to recover after an outage earlier on Oct. 20 caused major disruptions across dozens of popular platforms, apps, and games.
The outage, which began early Monday morning, rippled through major consumer and enterprise platforms. About three hours after it began, Amazon reported that service was starting to recover. It wasn’t until around 6 p.m. ET that the company reported “services returned to normal operations.”
Reports on Downdetector throughout the day showed widespread access failures on Amazon, Coinbase, Ring, Snapchat, Reddit, Slack, United Airlines, Zoom, and multiple online gaming networks, including those for Fortnite, Roblox, Pokémon Go, and Epic Games services.
On Monday evening, Amazon said it had addressed the underlying issue of the outage and was close to a resolution, but reported that some users were still experiencing lingering difficulties using services such as Venmo and Zoom.
AWS, Amazon’s $100 billion cloud division, underpins large swaths of global infrastructure, hosting everything from streaming platforms and smartphone apps to financial services and emergency systems.
AWS reported “increased error rates and latencies” beginning shortly after 3:11 a.m. ET, affecting multiple services in its US-East-1 region, a hub that powers much of the global internet.
“These problems are impacting multiple services that depend on AWS infrastructure,” the company said in a statement. “We’re monitoring the situation.”
By 5:01 a.m. ET, AWS said it had identified a “potential root cause,” tracing the disruption to a problem affecting how one of its core database systems connects and communicates. Specifically, Amazon said the issue stemmed from a breakdown in how its DynamoDB service was being reached and accessed across the network, adding that the company’s engineers were pursuing “multiple parallel paths to accelerate recovery.”
AWS said at 5:27 a.m. ET that it was seeing “significant signs of recovery” and that most network access requests “should now be succeeding,” with the company pledging further updates as restoration continued.
At 6:35 a.m. ET, AWS issued another update confirming that the “underlying DNS issue has been fully mitigated,” though it cautioned that some services were still working through backlogs.
The company said requests to bring systems online in the US-East-1 region were continuing to face elevated error rates and advised customers still experiencing issues to flush DNS caches.
“We continue to work toward full resolution,” AWS said, noting lingering throttling across services such as CloudTrail and Lambda.
By Tom Ozimek and Joseph Lord