AWS Beijing is down: how bad can it get?

I was called at 3 AM today. Great, what did we break now?! "Well, nothing seems to be working," said the engineer who escalated the problem to me. We found that the monitoring system we run for our many customers was not working. The reason: AWS Beijing is down. They will say something different, of course; as of now, 12:06 PM, they have this explanation up. They have posted more explanations, but this is the core of it.

This is BAD stuff. If the web interface does not work, it does not matter that this is JUST one Availability Zone. For instance, the server instances could not be displayed, among other issues. On top of that, we have Multi-AZ RDS instances that are also not working (RDS is their managed DB service). If this affected only one Availability Zone, they should have failed over to the working one. Why have they not failed over? As a result, the company I work for has had customers ringing the phone off the hook ever since, and I would guess the same happened to other customers.

The company where I work is a CMT partner of AWS; to get the CMT partnership, AWS makes you go through a very hard certification process. The point of having CMT certification is to prove that you are able to maintain high-availability services and follow best practices, security, etc. And now they cannot do it themselves?! This is just a horrible business practice.

BTW: This post is my personal opinion and has nothing to do with my job or the company I work for. This is a clarification for the BuzzFeeds of the world :P.

There is a lot to say about this issue, but I will keep it short. To conclude, here is an image of the problem, showing the interface with the error:

AWS Beijing










