‘Signs of recovery’ claims Amazon Web Services after internet outage hits many websites and apps – business live


Full story: Amazon Web Services outage hits dozens of websites and apps

Dan Milmo

A major internet outage has hit dozens of websites and apps around the world, with users reporting trouble getting online after problems at Amazon’s cloud computing service.

The affected platforms include Snapchat, Roblox, Signal and Duolingo as well as a host of Amazon-owned operations including its main retail site and the Ring doorbell company.

In the UK, Lloyds bank was affected as well as its subsidiaries Halifax and Bank of Scotland, while there were also reports of problems accessing the HM Revenue and Customs website on Monday morning. Also in the UK, multiple Ring users took to social media to complain that their doorbells were not working.

In the UK alone, reports of problems with individual apps ran into the tens of thousands for each platform.

AWS outage shows perils of relying on US tech giants

By bringing down popular websites, apps and services across the world, the problem with Amazon’s DynamoDB database service has highlighted just how dependent global businesses and users are on the company’s web services.

Cori Crider, executive director of the Future of Technology Institute, has warned that the UK is “dangerously overexposed to foreign Big Tech monopolies”, saying:

“The UK can’t keep leaving its critical infrastructure at the mercy of US tech giants. With Amazon Web Services down, we’ve seen the lights go out across the modern economy – from banking to communications.

This isn’t just an inconvenience; it’s a strategic vulnerability. Britain is dangerously overexposed to foreign Big Tech monopolies that don’t answer to UK regulators or the public. If we want digital resilience, the answer isn’t just better oversight – it’s digital sovereignty. We need to build and back British cloud infrastructure that secures our economy and safeguards our future.”

Britain’s competition watchdog recently conducted an inquiry into cloud computing, which concluded that it could designate both Microsoft and AWS as companies with “strategic market status” in cloud services, which would give the watchdog the power to tackle conduct that could undermine fair competition, or exploit people and businesses.

Some services are restored, others still report problems

Some of the services which were forced offline by the problems at Amazon Web Services are returning to action.

The UK’s tax office, HMRC, is now able to process login requests on its site again.

Canva, the online design and visual communication platform, reports that “the majority of functionality” has been recovered, but also warns that users may still see issues with downloading designs.

[Awkwardly, Amazon has promoted Canva as an innovator which uses AWS “to deliver seamless, personalized design experiences to over 235 million monthly users worldwide”.]

However, internet doorbell service Ring is still reporting a ‘partial outage’ on its website and apps.

Encouragingly, AWS are now rating the severity of today’s outage as “impacted”.

Earlier, when apps and websites across the Internet were stricken, it was rated as “degraded” (a more severe situation).

Expert: Why DynamoDB problems caused global outage

We flagged earlier that the disruption at Amazon Web Services involved DynamoDB, one of its core infrastructure services.

Mike Chapple, IT professor at the University of Notre Dame’s Mendoza College of Business, explains why DynamoDB is important, and why its failure has caused so much disruption today:

“DynamoDB isn’t a term that most consumers know, but it underpins the apps and services that all of us use every single day. It’s a centralized database service that many Internet-based services use to track user information, store key data, and manage their operations. DynamoDB is one of the record-keepers of the modern Internet. It’s fast, it’s cheap, and it’s reliable.

But today it stopped working and we saw the effects of that outage ripple across the Internet. We’ll learn more in the hours and days ahead but early reports indicate that this wasn’t actually a problem with the database itself. The data appears to be safe. Instead, something went wrong with the records that tell other systems where to find their data. Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data.

It’s as if large portions of the Internet suffered temporary amnesia. This episode serves as a reminder of how dependent the world is on a handful of major cloud service providers: Amazon, Microsoft, and Google. When a major cloud provider sneezes, the Internet catches a cold.”

“It’s always DNS”

Marek Szustak, IT Security Officer at online travel agency eSky Group, isn’t surprised to hear that today’s problems relate to the Domain Name System (effectively the internet’s phonebook).

Szustak explains:

Today’s outage in the AWS US-EAST-1 region shows how even the largest cloud environments can be paralysed by a seemingly minor piece of infrastructure. In this case, the problem concerned DNS, the foundation of network communication. When domain name resolution stops working, entire applications and services can stop responding, no matter how well they are designed.

This is a good lesson for companies using the cloud: it is worth designing systems so that a failure in one region or provider does not bring the entire business to a halt. Redundancy, geographical distribution of resources and testing of emergency scenarios should be the norm, not a luxury.

And besides, as engineers say, it’s always DNS…
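Szustak’s advice that a failure in one region should not halt the whole business can be illustrated with a minimal client-side sketch in Python. It assumes the same service is deployed in more than one AWS region and that `request` is a hypothetical function calling the deployment in a given region; the region names are only examples. It covers only the client’s half of failover: a standby region helps only if the data it needs has already been replicated there.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when every configured region has failed."""


def call_with_failover(
    regions: Sequence[str],
    request: Callable[[str], T],  # hypothetical: calls the service in one region
) -> Tuple[str, T]:
    """Try `request` against each region in priority order and return the
    first region that answers, together with its result."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return region, request(region)
        except Exception as exc:  # in real code, catch only transient errors
            errors[region] = exc  # remember why this region failed
    raise AllRegionsFailed(f"all regions failed: {errors}")
```

If a request to `us-east-1` raises, the helper moves on to the next region in the list (say `eu-west-1`) and returns that region’s answer. Catching every `Exception` keeps the sketch short; production code would catch only errors it knows to be transient.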

Although services seem to be coming back online, it appears the problem at AWS isn’t fully fixed yet.

In its latest update, the cloud computing operator says:

We are continuing to work towards full recovery for EC2 launch errors, which may manifest as an Insufficient Capacity Error. Additionally, we continue to work toward mitigation for elevated polling delays for Lambda, specifically for Lambda Event Source Mappings for SQS.

We will provide an update by 5:00 AM PDT [that’s 1pm in the UK].
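AWS’s update describes EC2 launches failing with an Insufficient Capacity Error while recovery continues. A common client-side response to transient errors like this is to retry with capped exponential backoff plus random jitter, so that large numbers of clients do not all retry at the same instant. The Python sketch below assumes a hypothetical `CapacityError` exception standing in for whatever error your SDK actually raises; it is a generic illustration, not AWS’s own retry logic.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Hypothetical stand-in for a transient error such as a capacity
    shortage or a throttled request."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on CapacityError with capped exponential
    backoff and "full jitter" (a random wait between 0 and the cap)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except CapacityError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap grows 0.5s, 1s, 2s, 4s ... but never exceeds max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter spreads clients out so they don't retry in lockstep.
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The `sleep` parameter defaults to `time.sleep`; it is exposed so the delays can be inspected or skipped in tests.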

According to TechRadar, the popular word game Wordle was hit by today’s outage.

Wordle’s working OK now, though* – an indication that the worst of today’s outages may be over, given AWS’s progress in fixing the problem.

(* yes I got it, but it took five guesses, so only just…)

A Lloyds Bank spokesperson has asked customers to ‘bear’ with it, while it works to bring services back online, saying:

“Issues with Amazon Web Services are affecting some of our services right now.

“We’re sorry about this and ask customers to bear with us while we work to bring all our services back online as soon as possible.”

AWS: The underlying DNS issue has been fully mitigated

Another update from Amazon Web Services, who report that the underlying issue causing today’s outage has now been “fully mitigated”.

In an update timestamped at 3:35 AM PDT (or 11.35am UK time), AWS says:

The underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now. Some requests may be throttled while we work toward full resolution. Additionally, some services are continuing to work through a backlog of events such as Cloudtrail and Lambda.

The Domain Name System (DNS) is used to map addresses on the internet, by translating human-readable domain names (such as www.theguardian.com) into the numerical IP addresses that routers use to direct traffic across the web.
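To see this “phonebook” lookup in action, the minimal Python sketch below asks the operating system’s resolver for a hostname’s IP addresses using the standard library’s `socket.getaddrinfo`. It illustrates the general mechanism only, not how AWS’s internal DNS works. When a name cannot be resolved, Python raises `socket.gaierror`, which is broadly the position apps were in today: the services still existed, but they could not be found by name.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Ask the system resolver for the IP addresses of `hostname`.

    Raises socket.gaierror if the name cannot be resolved.
    """
    # Restricting to TCP avoids duplicate entries for each socket type.
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the numerical IPv4 or IPv6 address.
    return sorted({sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in results})
```

`resolve("localhost")` typically returns loopback addresses such as `127.0.0.1`. For a public name like `www.theguardian.com`, the answer depends on your resolver and changes over time, so no output is shown here.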

Today’s outage does not appear to have been caused by a cyber-attack, reports Dr Amro Al-Said Ahmad, a lecturer in computer science at Keele University, who explains:

The issue appears to be related to AWS (Amazon Web Services), which hosts the infrastructure that underpins much of the internet’s services. It allows customers to deploy their own servers, databases, and storage without the need to own physical infrastructure.

According to AWS’s latest update, they have identified the root cause of the outage. It appears to be significant error rates for requests made to their data storage service, DynamoDB, in the US-EAST region. Therefore, the outage was not caused by cyber-related attacks, as was speculated.

Resolving major outages like this presents significant challenges because of the complexity of the cloud and its dependencies. Furthermore, diagnosis needs to establish how dependent third-party platforms are on the AWS cloud. The solution will involve thorough diagnostics, testing, and deployment of a reliable fix, which, based on past incidents in the industry, can take anywhere from hours to several days.
