What Can We Learn from the Recent Amazon Outage?
By Chris Brunau
Recently, Amazon’s Simple Storage Service (S3) suffered an outage that affected many major companies, including Slack and Nike. According to Business Insider, the affected companies lost an estimated $150 to $160 million as a result.
The outage was due to an error that occurred while debugging the billing system. According to Amazon, “an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.”
According to GeekWire, many AWS services depend on S3, so when S3 is down, other AWS services may not work as expected. The outage occurred in US-East-1, one of the largest AWS regions, so its impact was greater than it would have been in a different region.
Read IT Quik offers tips on avoiding common AWS mistakes so you don't fall victim to a similar disaster. They include defining security responsibilities, paying attention to logs, protecting passwords, and more.
Backupify recently migrated from AWS to Datto’s private cloud. While AWS allowed Backupify to grow rapidly and scale compute and storage resources affordably, the Datto Cloud is more cost-efficient and lets Backupify closely control security and tailor it to our specific needs.