Amazon Web Services Was Taken Down by a Simple Typo
By Jef Cozza / Sci-Tech Today
Published: March 4, 2017
A single typo was apparently responsible for taking down a chunk of the Internet on Tuesday, Feb. 28, costing companies somewhere around $150 million. The revelation came from an online statement released by Amazon after its popular Amazon Web Services (AWS) platform was taken offline Tuesday for about four hours.

The service disruption, which affected AWS' Simple Storage Service (S3), resulted in problems for many of the Internet's most popular Web sites and services, including Trello, IFTTT, Slack and Gizmodo. According to Web site monitoring firm Apica, 54 of the largest online retailers experienced performance impairments on their Web sites, with some slowing down more than 20 percent.

Two Subsystems at Fault

"We want to apologize for the impact this event caused for our customers," Amazon said in the statement. "While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further."

The disruption was apparently caused by an Amazon team member who mistyped a command while attempting to debug the service's billing system.

"At 9:37AM PST [Feb. 28], an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process," Amazon said. "Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems."

The first of the two affected subsystems was an index subsystem that manages the metadata and location information for all S3 objects in the affected region. The second was a placement subsystem that manages the allocation of new storage.

Full Restart Required

Removing a significant portion of the server capacity caused both of those systems to require full restarts, the company said. While they were being restarted, S3 was unable to service requests.

As a result of the outage, Amazon said it is making several changes to the way its systems are managed. "While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly," the company said.

Amazon said it has since modified the tool used in the debugging operation so that it removes capacity more slowly, and has added safeguards that prevent capacity from being removed when doing so would take any subsystem below its minimum required capacity level.
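To make the two safeguards concrete, here is a minimal Python sketch of what such a check might look like. It is not Amazon's tool, whose internals have not been published. The names `Subsystem`, `check_removal`, `remove_servers`, `min_required` and `max_per_step` are all hypothetical, as are the numbers used.

```python
from dataclasses import dataclass


class CapacityError(Exception):
    """Raised when a removal request would violate a safety limit."""


@dataclass
class Subsystem:
    name: str
    active_servers: int
    min_required: int  # hypothetical floor; real values were not published


def check_removal(subsystem: Subsystem, requested: int, max_per_step: int) -> None:
    """Reject a removal that is too large for one step or dips below the floor."""
    if requested <= 0:
        raise CapacityError("requested removal must be a positive number")
    # Safeguard 1: remove capacity slowly by capping each individual step.
    if requested > max_per_step:
        raise CapacityError(
            f"{requested} servers exceeds the per-step limit of {max_per_step}"
        )
    # Safeguard 2: never drop a subsystem below its minimum required capacity.
    remaining = subsystem.active_servers - requested
    if remaining < subsystem.min_required:
        raise CapacityError(
            f"{subsystem.name} would fall to {remaining} servers, "
            f"below its minimum of {subsystem.min_required}"
        )


def remove_servers(subsystem: Subsystem, requested: int, max_per_step: int = 2) -> int:
    """Apply a removal only after both checks pass; return the remaining count."""
    check_removal(subsystem, requested, max_per_step)
    subsystem.active_servers -= requested
    return subsystem.active_servers


if __name__ == "__main__":
    index = Subsystem("index", active_servers=10, min_required=8)
    print(remove_servers(index, 2))   # allowed: small step, stays at the floor
    try:
        remove_servers(index, 50)     # a mistyped, oversized input is rejected
    except CapacityError as err:
        print("refused:", err)
```

In this sketch an oversized input is refused before any server is touched, so a typo produces an error message rather than an outage.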

The company said that change will prevent an incorrect input from triggering a similar failure in the future. Amazon also promised to make changes to improve the recovery time of key S3 subsystems and to audit its other operational tools to ensure they also have similar safety checks.

Tell Us What You Think
Ty:
Posted: 2017-03-08 @ 10:13am PT
I don't believe a word of it.

Kevin R Molloy:
Posted: 2017-03-08 @ 5:27am PT
Amazon Web services down again 3/8/2017 at 5:15 am

Eleanor:
Posted: 2017-03-07 @ 9:37am PT
Amazon (AWS) seriously did not consider this possible problem when it rolled this out? Where are their systems analysts? And while they have addressed this problem for THIS issue, what of other similar potential problems? PS: Visa status for any of the people involved in this is irrelevant. Such an odd interpretation of the actual issues revealed by this failure.

Brian L. Baker:
Posted: 2017-03-07 @ 9:22am PT
Quit blaming immigrants for stealing your job, and school yourself up to be irreplaceable. By simply whining and not taking care of your own business, you come across as a baby.

Matlas:
Posted: 2017-03-04 @ 11:38pm PT
Keep ignoring software quality assurance by filling positions with H1b's and this is what you pay.

© Copyright 2017 NewsFactor Network. All rights reserved. Member of Accuserve Ad Network.