
Filed Under: Cloud Computing | Technology

Cloud Computing – Amazon EC2 Zone Reaches Capacity

I’m planning to write a series of posts documenting in detail our experience at OleOle migrating our entire website infrastructure from a traditional managed hosting company to Amazon’s cloud computing services (EC2, S3, etc.). We began scoping out the process at the beginning of ‘09 and completed it just a couple of months ago.

This post jumps right into the middle of things, so to speak: we are already fully entrenched in Amazon’s cloud and dealing with the many issues that crop up from time to time that they don’t tell you about in the marketing material.

So on to the subject of today’s post:

We noticed yesterday that all our new EC2 app server instances were booting up in Amazon’s zone US-east-1d, whereas up to that point we had always used zone US-east-1b with no issues. Several core parts of our system, including db servers, load balancers and memcache servers, are in zone US-east-1b.

What’s going on? Well, apparently zone 1b is at or near capacity, and when we try to force a new instance into that zone we get a message saying “insufficient capacity”. That means any new app server instances that start are unlikely to land in the same zone as our dbs, load balancers and memcache servers.
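One way to make that placement behaviour explicit is to try zones in order of preference and record which one actually accepted the instance. The following is only a minimal sketch, not how RightScale or the EC2 API does it: `InsufficientCapacity`, `launch_with_zone_preference` and the `launch` callable are hypothetical names standing in for whatever tooling really starts the instance, and the lowercase zone names are just illustrative spellings.

```python
from typing import Callable, Sequence, Tuple


class InsufficientCapacity(Exception):
    """Hypothetical error standing in for an "insufficient capacity" response."""


def launch_with_zone_preference(
    launch: Callable[[str], str],
    zones: Sequence[str],
) -> Tuple[str, str]:
    """Try each zone in order and return (zone, instance_id) for the first success.

    `launch` is an assumed callable that starts one instance in the given
    zone and returns its id, or raises InsufficientCapacity.
    """
    for zone in zones:
        try:
            return zone, launch(zone)
        except InsufficientCapacity:
            continue  # this zone is full, fall through to the next preference
    raise InsufficientCapacity("no capacity in any of: " + ", ".join(zones))


if __name__ == "__main__":
    # Fake launcher that simulates the situation in this post: 1b is full.
    def fake_launch(zone: str) -> str:
        if zone == "us-east-1b":
            raise InsufficientCapacity(zone)
        return "i-fake-" + zone

    zone, instance_id = launch_with_zone_preference(
        fake_launch, ["us-east-1b", "us-east-1d"]
    )
    print(zone, instance_id)
```

Returning the zone alongside the instance id means the caller can at least log or alert when an app server ends up away from the db and memcache servers, rather than discovering it later.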

Amazon Zone 1b Over Capacity

Furthermore, because our EC2 app server instances all autoscale via RightScale, they are always short-lived: they are constantly terminated when capacity is not needed, and new ones come online automatically when load spikes at certain times of day. The autoscaling is fantastic, a true thing of beauty to behold when you see it in action, and it really makes “utility computing” a reality for us. However, it also means that all of our app server instances are now in zone 1d and not our preferred zone 1b.

And why does this matter? LATENCY is why! We performed some basic testing of ping times between zones:

1b to 1b – average ping time between two servers in the same zone: 0.45ms
1a or 1d to 1b – average ping time between servers in different zones: 1.9ms

That is a whopping 4 times increase in latency (1.9ms / 0.45ms ≈ 4.2) when going across zones versus having all your servers in the same zone. A single web request from a user involves a multitude of calls between servers (app to db, app to memcache and back again, load balancer to app, etc.), so that 4 times increase becomes very noticeable, even though each individual call differs by only a millisecond or so.
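To see how those per-call milliseconds add up, here is a rough back-of-the-envelope sketch. It assumes the calls within a request happen one after another and that each costs one average round trip from the ping test above. The call counts (10, 50, 100) are made-up illustrative values, not measurements from our system.

```python
# Average round-trip times from the ping test above, in milliseconds.
SAME_ZONE_RTT_MS = 0.45
CROSS_ZONE_RTT_MS = 1.9


def network_wait_ms(round_trips: int, rtt_ms: float) -> float:
    """Total network wait for `round_trips` sequential calls at the given RTT."""
    return round_trips * rtt_ms


def cross_zone_penalty_ms(round_trips: int) -> float:
    """Extra wait per request when every call crosses zones instead of staying in one."""
    return network_wait_ms(round_trips, CROSS_ZONE_RTT_MS) - network_wait_ms(
        round_trips, SAME_ZONE_RTT_MS
    )


if __name__ == "__main__":
    # Hypothetical numbers of server-to-server calls per web request.
    for calls in (10, 50, 100):
        same = network_wait_ms(calls, SAME_ZONE_RTT_MS)
        cross = network_wait_ms(calls, CROSS_ZONE_RTT_MS)
        print(
            f"{calls:>3} calls: same zone {same:.1f}ms, "
            f"cross zone {cross:.1f}ms, extra {cross_zone_penalty_ms(calls):.1f}ms"
        )
```

Under these assumptions the penalty is 1.45ms per call, so it grows linearly with how chatty a page is. Real requests may overlap some calls, which would shrink the total.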

What can we do about it? We could migrate everything over to zone 1d, but that seems like a stopgap solution. There’s no guarantee that 1d won’t run out of capacity. In fact, it’s almost certain to at some point, and when that happens we will have to migrate yet again to keep all our servers in the same zone.

Maybe this is just one of the pitfalls of this type of cloud computing platform, but I can’t get over the feeling that a 4 times increase in latency is just not acceptable.

NOTE: I am well aware that Amazon touts multiple zones as a plus: each zone is in a different data centre (and possibly geography), so spreading your servers across them avoids a single point of failure. And we have all seen recently what can happen when something takes out a Rackspace data centre (someone trips on a power cord, network outage, car crashes into a generator, whatever…). But the cost of this is a lot of added latency, and whether to accept that latency should be our choice to make.

Discussion

4 comments for “Cloud Computing – Amazon EC2 Zone Reaches Capacity”

  1. Spreading out across different availability zones also increases operating cost: you pay for inter-zone networking traffic. EC2 traffic within the same availability zone is not charged.

    Posted by Shlomo | July 16, 2009, 5:10 pm
  2. Good point, Shlomo! Yet another reason not to like having the spread across zones “forced” upon us.

    Posted by David | July 16, 2009, 5:13 pm
  3. Wow, a real eye opener. I haven’t heard anything but glowing reports about Amazon’s hosting up until now. Your posts are gold!

    Won’t they add more capacity to each zone (including zone US-east-1b) soon though?

    Virtual servers do make so many cool things (like autoscaling) possible, but we went through hell at Hcareers migrating to the virtual servers at our parent company. One of the biggest problems I kept running into was the poor clock synchronization. You’d be amazed at how many things that can f-up.

    Posted by Todd Zuccolo | July 16, 2009, 5:18 pm
  4. [...] Amazon’s EC2 recently reaching capacity at certain EC2 zones, I now shake my head in dismay at what to me, is another poor showing by a [...]

    Posted by Wire Turf | Amazon, oh Amazon, You Continue to Disappoint Me | July 30, 2009, 3:15 pm
