Earlier this week a good chunk of the internet started playing up for millions of users.
Tuesday’s disruption, caused by an issue at website-hosting operation Amazon Web Services (AWS), lasted for about five hours, which was about five hours too long for most of those affected.
Amazon fessed up on Thursday and admitted it was human error that had caused the problem. Specifically, a typo (yes, a careless typo) entered by an Amazon employee led to the cloud-based chaos.
In an explanatory message posted online, the company said its Simple Storage Service (S3, part of AWS) team in northern Virginia had been sorting out an issue that was causing the S3 billing system to progress more slowly than expected.
During the work, a team member “executed a command which was intended to remove a small number of servers” used by the S3 billing process.
But “one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”
When that happened, lots of web users around the world presumably started frowning at their displays, possibly uttering an expletive or two while at the same time wondering, “Is it my end or their end?”
While Amazon Web Services hosts plenty of big-name operations such as Netflix, Spotify, Instagram, Airbnb, and Expedia, these sites apparently escaped any disruption during the incident. Instead, those that temporarily hit the buffers included collaboration tool Trello, sites built with website-building tool Wix, and others such as Lonely Planet, Medium, IFTTT, and Quora.
Amazon’s cloud computing service has grown massively since it launched in 2006 and remains a fast-growing part of the company’s business as more and more companies turn to it for hosting.
That’s why mishaps with the service can quickly cause serious trouble for millions of web users. On Thursday, Amazon apologized for the incident and promised to “do everything we can to learn from this event.”