Facility went down after nor’easter knocked out utility power in the region
*Updated 03/08 with comments from CoreSite
A power outage at an Equinix data center is at least partially to blame for connectivity problems some of Amazon’s cloud customers experienced last Friday. The Amazon Web Services hiccup has been blamed for temporarily silencing the company’s smart assistant Alexa and disrupting operations for AWS customers including Atlassian, Twilio, and Capital One.
In a report on the AWS service health dashboard, Amazon said its Direct Connect service had lost connectivity to Equinix data centers DC1 through DC6 and DC10 through DC12 in Ashburn, Virginia (together these facilities make up a single Equinix campus), and to CoreSite’s VA1 and VA2 data centers in nearby Reston.
“When the weather impacting the entire East Coast affected power at a property we lease, we activated our contingency plans,” an Equinix spokesperson said in an emailed statement sent in response to an inquiry from Data Center Knowledge about the AWS outage. “We regret that these actions didn’t prevent a service interruption for some customers. The data center affected is currently operating on normal utility provided power. We’ve been in touch with our customers who were impacted, and we will investigate this matter, so we can prevent something like it from happening in the future.”
In an emailed statement, Greer Aviv, CoreSite's VP of investor relations and corporate communications, said the CoreSite data centers mentioned did not go down but did not explain why Direct Connect links to the two facilities had dropped.
"CoreSite did not experience a power outage at either VA1 or VA2 in terms of our uptime," Aviv said. "The event was isolated to AWS services in the US East Region. Customers must plan for unscheduled service disruptions, and CoreSite helps them by providing access to AWS services across several geographic Regions. We are proactively investigating a root cause to ensure events like this don’t impact customers going forward."
The region experienced widespread power outages starting Friday as a result of the massive nor’easter that swept through over the weekend. In Loudoun County, home to one of the highest concentrations of data centers on the planet, 15,000 customers were without power Friday, utility Dominion Energy reported. Another nor’easter was expected to slam the East Coast Tuesday night.
Direct Connect allows companies to link to AWS servers via a private network instead of using the public internet. Both cloud providers and data center providers like Equinix and CoreSite pitch such services as a more secure and reliable way for enterprises to use cloud infrastructure. A company like Atlassian, for example, may lease some space at an Equinix data center in Ashburn to house its servers and link those servers via a private network to an AWS data center in the same region. (The example is purely hypothetical; we don't know exactly how Atlassian uses AWS.)
Direct cloud links have been a core business focus for Equinix in recent years. The company has positioned its colocation facilities as hubs where enterprises can get this kind of network access to all the major cloud providers.
It’s unclear whether the power outage affected the entire Equinix campus in Ashburn or a single data center that happened to house critical infrastructure for all Direct Connect links on campus. We’ve asked Equinix to clarify and will update this story once we hear back.
Direct Connect issues in the AWS US-EAST-1 region started around 6:20 am Eastern on Friday, March 2, and went on for close to four hours. During that time, “some customers” in the Equinix and CoreSite data centers lost Direct Connect links to the Amazon data centers in the region, according to the AWS status report.
AWS does not offer a Service Level Agreement (SLA) for Direct Connect, according to an FAQ on the company’s website. It does recommend that customers set up redundant Direct Connect links to prevent outages and that they enable Bidirectional Forwarding Detection (BFD), which ensures that drops in connectivity are detected quickly and traffic shifts to the redundant links.
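To illustrate the idea behind that recommendation, here is a minimal Python sketch of BFD-style liveness detection across redundant links. It is not AWS's or any router vendor's implementation. The `Link` class, the function names, and the 300 ms interval with a multiplier of 3 are assumptions chosen for illustration. The underlying rule is the general BFD one: a link is declared down once no control packet has arrived from the peer for the interval times the multiplier.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Link:
    """A hypothetical private link to the cloud provider."""
    name: str            # illustrative label, e.g. "primary"
    last_heard_ms: int   # time the last BFD control packet arrived from the peer


def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """BFD declares a session down after interval * multiplier of silence."""
    return interval_ms * multiplier


def is_up(link: Link, now_ms: int, interval_ms: int = 300, multiplier: int = 3) -> bool:
    """A link counts as up while the silence is shorter than the detection time.

    The 300 ms / 3 defaults are assumed values for this sketch.
    """
    return now_ms - link.last_heard_ms < detection_time_ms(interval_ms, multiplier)


def pick_active(links: Sequence[Link], now_ms: int) -> Optional[Link]:
    """Return the first link (in preference order) that is still up, else None."""
    for link in links:
        if is_up(link, now_ms):
            return link
    return None


# Example: the primary link went silent at t=0, the secondary was heard at t=900.
primary = Link("primary", last_heard_ms=0)
secondary = Link("secondary", last_heard_ms=900)
active = pick_active([primary, secondary], now_ms=1000)
print(active.name if active else "no link available")
```

With these assumed timers, a dead link is detected in under a second, so traffic can move to the redundant link quickly. With only one link, the same detection would simply confirm the outage.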