Internal Documents Show How Amazon Scrambled to Fix Prime Day Glitches

DooKey

Amazon's Prime Day got off to a rough start earlier this week, but they still managed to make a boatload of money. The rough start began with the front page of their website showing a bunch of dog pictures with an error code. They got things straightened out after a few hours and went on to make hundreds of millions of dollars. The behind-the-scenes story, according to internal documents, is that they just didn't have enough servers online to handle the load. Kind of surprising for a company that sells server time and space to others. As the glitches grew, they killed international traffic and added servers manually, since their auto-scaling feature appeared to fail. Just goes to show even the mighty fail once in a while. However, Amazon tends to learn from its mistakes, so next year will probably be pretty smooth.

Caesar says the root cause of the problem may have to do with a failure in Amazon’s auto-scaling feature, which automatically detects traffic fluctuations and adjusts server capacity accordingly. The fact that Amazon cut off international traffic first, rather than increase the number of servers immediately, and added server power manually instead of automatically, is an indication of a breakdown in auto-scaling, a critical component when dealing with unexpected traffic spikes, he said.
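
For anyone unfamiliar with the idea, here is a minimal sketch of what an auto-scaling decision boils down to. This is not Amazon's actual system; the function name, the per-server target, and the min/max limits are all made up for illustration.

```python
import math

def desired_servers(requests_per_sec, target_rps_per_server,
                    min_servers=1, max_servers=1000):
    """Return how many servers are needed to keep per-server load near a target.

    All names and limits here are hypothetical, for illustration only.
    """
    # Servers needed so that each one handles at most the target load.
    needed = math.ceil(requests_per_sec / target_rps_per_server)
    # Clamp to the configured floor and ceiling.
    return max(min_servers, min(max_servers, needed))

# Traffic jumps from 4,500 to 45,000 requests per second.
print(desired_servers(4_500, 100))
print(desired_servers(45_000, 100))
```

If this loop stalls or the ceiling is set too low, the fleet stays at its old size while traffic climbs, and someone has to add capacity by hand.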
 
I wonder if some folks got fired over that. I just had a recruiter send me an unsolicited email about a position on the team that owns the infrastructure behind amazon.com. It was casually mentioned that customers can tangibly see the impact of the work I'd be doing.

Probably not, but funny to think about.
 
It might also be that they identified a need for more human resources?

As I said, probably not. I just found it funny that they're looking for new software engineers to work on infrastructure scaling projects a few days after a very public failure of their infrastructure scaling solutions.
 
At the very least they got screamed at. Amazon is legendary for having a very toxic culture.
 
Ugggh it's amazon. I can assure you people were fired.
 
That's why I politely declined. Though I've heard from people that it's getting better as long as you're not at a location other than Seattle.
 
I thought it funny, I got an email from another vendor that I've signed up to receive "specials" from, and the title started this way - "Tired of looking at dog pictures? Check out our deals!"
 
It's funny how people's perception of an issue is related to how much they love or hate the company :)

I dislike Amazon, but I don't think we should judge based on just the issues, but on the response to the issues. Good on the team for fixing it.
 
i have several internal documents that i could leak which show how i scrambled to not buy a damn thing this time on fakesaleday

uh... i mean prime day.
 
I'm thinking the other way around. I'm guessing the traffic was huge this time, and since there hasn't been anything like this before, managing to add enough resources in a few hours rather than shutting everything down is a success for the team. On a separate topic, it is toxic there, so a little screaming here and there was probably the first reaction. lol
 
You mean, like, last Prime Day? Or the one before?
 
I personally didn't have any troubles. My only issue is my HD10 didn't ship. I assume they ran out of them at $100.
 
I work for Amazon IT in a fulfillment center. Those even thinking of working at Amazon, don't. It's a good place to start and get your chops, but set a goal of 5 years and out. That way you cash in on the benefits, which are amazing, and take the money and run. People like to hire ex-Amazonians because they know you've been through the fire.

As for Prime Day, first of all, its goal is to empty the warehouses of stuff that hasn't sold. I cruise the deals and I see pictures of the stuff that seems to stay on our shelves. In the weeks leading up to Prime I could see the push items as they came in by the pallet load, when usually we just have a smattering of them in the building. Instapot is one of them. We sell more of those than anything else at Amazon. As an aside, I have one and it's better than sliced bread!

As part of IT I can view the trouble tickets. I've read the one for these problems and it's a whopper! It was a war room ticket until the fan blades started turning brown, then it was get out of the way. The fact that we recovered as fast as we did is a testament to the great IT people we have. I'm not going into specifics because I don't relish getting fired. When we find the person that leaked the internal memo and a copy of the trouble ticket to CNBC, and we will, I'd hate to even be on this planet. That person needs to be looking for an island somewhere.

Most everything we do is databases. Huge, globally linked databases that use distributed computing to sync up around the world. When capacity fell on the order side, the warehouse side had to follow. We could not pick or pack in ours for close to 6 hours. When you scan something to pick it, you're poking a database. When you put it in a box and send it down the conveyor, you're poking a database, and on and on. 99% of our computing is done on VMs. Hence the capacity deal.

Yes, it seems the automated system flopped, but watching people add capacity was amazing. Different systems would request an additional percentage of horsepower, and in between 5 and 20 minutes they had new VMs spun up and online. That side worked. We will do a deep dive, we will find the true root cause, and we will fix it. We use very few off-the-shelf products. Most of our software is written in house, and the goal is for it to be all of it. We learn from our mistakes and will learn from this one. Some heads are gonna roll, but that's the corporate way.
 
True
 

Attachments

  • AB762F4B-35AB-4CDE-986C-5A7101267D48.jpeg
Perhaps they should re-evaluate using AWS and consider Azure. :rolleyes:

Well they’ve certainly got capacity...

Tbh I think it’s all just manufactured, just have it fall over. Look how busy we are. Get a bigger stock bump than you would if you made some extra sales.

I’ve had that conversation with a marketing director at a previous company when people wanted to go mental with capacity for a new product launch. The alternative view is that being overrun is great for building hype. Same coin as artificial shortages.
 