<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Wire Turf &#187; Cloud Computing</title>
	<atom:link href="http://www.wireturf.com/category/cloud-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.wireturf.com</link>
	<description></description>
	<lastBuildDate>Sat, 13 Mar 2010 06:06:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>In the aftermath of Intermedia&#8217;s extended outage, an important lesson to be learned for SaaS providers</title>
		<link>http://www.wireturf.com/2010/03/12/in-the-aftermath-of-intermedias-extended-outage-an-important-lesson-to-be-learned/</link>
		<comments>http://www.wireturf.com/2010/03/12/in-the-aftermath-of-intermedias-extended-outage-an-important-lesson-to-be-learned/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 05:53:56 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.wireturf.com/?p=141</guid>
		<description><![CDATA[As a current (and reasonably long-time) customer of SaaS Exchange hosting provider Intermedia.com, we at OleOle were naturally affected to some extent by Intermedia&#8217;s extended system outage on March 5th, 2010. For pretty much the entire morning on that day, we, along with thousands of their other customers, had zero email capability, no sending, [...]]]></description>
			<content:encoded><![CDATA[<p>As a current (and reasonably long-time) customer of SaaS Exchange hosting provider <a href="http://www.Intermedia.com">Intermedia.com</a>, we at OleOle were naturally affected to some extent by Intermedia&#8217;s extended system outage on March 5th, 2010. For pretty much the entire morning that day, we, along with thousands of their other customers, had zero email capability: no sending, no receiving, zilch.</p>
<p>To make matters worse, during a large part of this outage, Intermedia&#8217;s own website was unavailable, so affected customers could not even go onto the Intermedia website to check for status updates or open support tickets. Needless to say, their PBX was being bombarded by thousands of irate customers as well, so getting someone on the line for an update wasn&#8217;t that easy either. Twitter ended up being the best source of updates, first from other customers who tweeted what info they could glean, and then later from Intermedia&#8217;s own Twitter account when they managed to get more caught up and started giving out some official updates.</p>
<p>Today I received their formal RFO (Reasons for Outage) letter via email, which goes into great detail describing why this outage occurred and what steps they are taking to try to prevent a recurrence in future. In a nutshell, there was a hardware failure in one of their EMC SAN devices, and it occurred in a way that prevented the device&#8217;s own built-in fault tolerance mechanisms from keeping the SAN effectively &#8220;up&#8221; &#8211; that is, they are saying this is one of those failures that should not have happened. These devices are designed precisely NOT to fail under such circumstances, but nonetheless this one did.</p>
<p>Intermedia&#8217;s letter goes on to describe the actions they are taking along with the hardware vendor to guard against this in future. All well and good. Now on to the little gem in the letter that I found the most surprising, and from which all technologists with &#8220;uptime&#8221; responsibility for Software as a Service (SaaS) systems would do well to learn.</p>
<p>Here&#8217;s the bit that really caught my attention:</p>
<blockquote><p>&#8220;During the event, our ability to communicate status effectively was hindered by an outage of our corporate communication tools until 9:50 a.m. PST.  The databases for www.Intermedia.net, Intermedia’s client control panel and Intermedia’s trouble ticket system were located on the affected SAN and therefore were not available during the SAN event. These systems were restored as soon as the SAN performance issue was resolved. All available personnel were directed to answer incoming customer calls. Intermedia logged over 2,000 incoming calls to our PBX and effectively answered more than 1,000 of those calls.&#8221;</p></blockquote>
<p>In hindsight it seems pretty obvious, doesn&#8217;t it? Why locate your &#8220;corporate communications tools&#8221; and &#8220;trouble ticket system&#8221; on the same infrastructure as the core service that you provide? In this case, the thinking might have been that the EMC SAN just couldn&#8217;t possibly fail, as it was inherently designed to be fault tolerant, and indeed, EMC SANs are extremely heavy duty devices with very good track records for what they do. Yet fail it did, and with it came down key parts of the foundation, all in one go. Or maybe it was to save on costs. Or maybe it was just a careless oversight. We don&#8217;t really know why, but we do know it was implemented that way and that it was clearly a flawed design decision.</p>
<p>Naturally, Intermedia themselves now intend to fix this:</p>
<blockquote><p>&#8220;As a high priority for completion, no later than Q2, Intermedia will also be isolating corporate communication infrastructure from the same infrastructure that provides our Exchange services,  guaranteeing that we will be able to communicate effectively with clients at all times during a service interruption. &#8220;</p></blockquote>
<p>Again, this revision might seem like the obvious system and network design that should have been implemented from the get-go, especially for a SaaS provider as long in the business and as large as Intermedia. Yet it was not done that way, and it took an outage on this scale to force a change that now seems so obvious in hindsight. Certainly a lesson we can all learn from.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wireturf.com/2010/03/12/in-the-aftermath-of-intermedias-extended-outage-an-important-lesson-to-be-learned/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Surprise! Not all Amazon EC2 compute units are created equal</title>
		<link>http://www.wireturf.com/2010/01/10/surprise-not-all-amazon-ec2-compute-units-are-created-equal/</link>
		<comments>http://www.wireturf.com/2010/01/10/surprise-not-all-amazon-ec2-compute-units-are-created-equal/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 19:11:35 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[MySql]]></category>

		<guid isPermaLink="false">http://www.wireturf.com/?p=116</guid>
		<description><![CDATA[A very interesting discovery made by our sys admin not so long ago: While Amazon EC2 sells its hosting services on the notion of leasing virtualized servers with a guaranteed amount of standard compute units, memory and disk space, it turns out that in fact, not all EC2 compute units are created equal. In other [...]]]></description>
			<content:encoded><![CDATA[<p>A very interesting discovery made by our sys admin not so long ago: While <a href="http://aws.amazon.com/ec2/" rel="nofollow" target="_blank">Amazon EC2</a> sells its hosting services on the notion of leasing virtualized servers with a guaranteed amount of standard compute units, memory and disk space, it turns out that in fact, not all EC2 compute units are created equal. In other words, imagine you boot up 2 separate virtual servers (or instances as they are known in EC2 speak) and these are both <a href="http://aws.amazon.com/ec2/#instance" rel="nofollow" target="_blank">EC2 Extra Large instances</a>. Each instance comes with 8 EC2 compute units &#8211; essentially a measure of the raw CPU processing power available to you: the larger the number of compute units, the more processing power your (server) instance should be giving you.</p>
<p>Now one would expect that, since you are paying Amazon the same amount of money for each server instance of this same type and size, you should be getting the same performance out of each one. Sadly, that is a very wrong assumption, as our sys admin found out.</p>
<p>It turns out that the underlying hardware for each instance impacts the actual performance that the instance gives you, even though the instances are all virtualized and marketed by Amazon as if they are all created equal. In our case, we found that the underlying hardware that the virtual instance sits on has a significant impact on application performance, at least with respect to MySQL database performance. Instances created on machines with AMD&#8217;s Opteron 270 processors (2 GHz, 1 MB L2 cache) showed significantly poorer MySQL performance than instances created on machines with Intel&#8217;s Xeon E5430 processors (2.66 GHz, 6 MB L2 cache). The hardware techies among you might be saying &#8220;well duh&#8230; of course the Xeon will spank the Opteron, tell me something I don&#8217;t already know.&#8221; But that&#8217;s not the point.</p>
<p>The point is that in both cases, the EC2 customer is paying the same for an instance that is marketed as having identical compute units (i.e. processing power), but the reality is very different. Bear in mind that you cannot select the underlying hardware your instances are powered up on &#8211; what we did was simply keep destroying and creating instances until a new one landed on the Xeon-based hardware that we wanted (TIP: from the Linux shell of the new instance, run <code>cat /proc/cpuinfo</code> to see what hardware your instance was created on).</p>
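<p>The check in the tip above can also be scripted. What follows is only a minimal Python sketch, not the tooling we actually used: the function names are hypothetical, and it assumes the usual Linux <code>/proc/cpuinfo</code> layout of one "key : value" pair per line, with the CPU described under the "model name" key.</p>

```python
import os


def cpu_model_names(cpuinfo_text):
    """Return the distinct 'model name' values found in /proc/cpuinfo text."""
    names = []
    for line in cpuinfo_text.splitlines():
        # Each line looks like "model name<TAB>: Some CPU description".
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            name = " ".join(value.split())  # collapse runs of whitespace
            if name not in names:
                names.append(name)
    return names


def wanted_hardware(cpuinfo_text, wanted="Xeon"):
    """True if any CPU model name contains the wanted substring (hypothetical rule)."""
    return any(wanted.lower() in name.lower()
               for name in cpu_model_names(cpuinfo_text))


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # only present on Linux
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
        print(cpu_model_names(text))
        print("keep instance" if wanted_hardware(text) else "terminate and retry")
```

<p>Run on a freshly booted instance, a script like this tells you whether to keep the instance or throw it away and try again. The "Xeon" substring match is just an assumption for illustration; adjust it to whatever your own benchmarks favour.</p>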
<p>Moral: while cloud computing with Amazon (or likely any other vendor of this ilk) has definite pluses, there are hidden gotchas that they don&#8217;t tell you about.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wireturf.com/2010/01/10/surprise-not-all-amazon-ec2-compute-units-are-created-equal/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Amazon, oh Amazon, You Continue to Disappoint Me</title>
		<link>http://www.wireturf.com/2009/07/30/amazon-oh-amazon-you-continue-to-disappoint-me/</link>
		<comments>http://www.wireturf.com/2009/07/30/amazon-oh-amazon-you-continue-to-disappoint-me/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 22:15:17 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Amazon EC2]]></category>

		<guid isPermaLink="false">http://www.wireturf.com/?p=93</guid>
		<description><![CDATA[Following Amazon&#8217;s EC2 recently reaching capacity at certain EC2 zones, I now shake my head in dismay at what, to me, is another poor showing by a service that I would love to love, if only they would let me! So today we get an email from them soliciting feedback on their SimpleDB service, which [...]]]></description>
			<content:encoded><![CDATA[<p>Following Amazon&#8217;s EC2 recently <a href="http://www.wireturf.com/2009/07/16/cloud-computing-part-1-amazon-ec2-zone-reaches-capacity/">reaching capacity</a> at certain EC2 zones, I now shake my head in dismay at what, to me, is another poor showing by a service that I would love to love, if only they would let me!</p>
<p>So today we get an email from them soliciting feedback on their SimpleDB service, which we recently tried out for a few days, but found somewhat lacking for our needs. The email goes like this:</p>
<blockquote><p>
Dear OleOle,<br />
Amazon Web Services is constantly striving to improve our customers&#8217; experience using our products. We particularly want feedback from our new users.<br />
On 6/23/2009, you signed up for Amazon SimpleDB. Please share your experience about getting started with Amazon Web Services by completing the following survey (9 questions): Amazon SimpleDB Getting Started Survey [This last bit being a link to their survey]. etc etc.
</p></blockquote>
<p>So &#8220;Great!&#8221;, thinks me, they are being proactive, and hopefully they will improve this service and make it useful for us. Alas, not so fast. The link to their &#8220;survey&#8221; goes to &#8220;Cannot Find Server&#8221; &#8211; basically that domain for the survey doesn&#8217;t resolve in DNS.</p>
<div id="attachment_95" class="wp-caption aligncenter" style="width: 660px"><img src="http://www.wireturf.com/wp-content/uploads/2009/07/amazon-survey-not-found.gif" alt="Maybe they should have created the sub-domain first?" title="amazon-survey-not-found" width="650" height="293" class="size-full wp-image-95" /><p class="wp-caption-text">Maybe they should have created the sub-domain first?</p></div>
<p>That&#8217;s just so sloppy. Incompetent even. Is Amazon getting too big for its britches?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wireturf.com/2009/07/30/amazon-oh-amazon-you-continue-to-disappoint-me/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter Doc Theft &#8211; Details Revealed: Step By Step To How It Was Done</title>
		<link>http://www.wireturf.com/2009/07/19/twitter-doc-theft-details-revealed-step-by-step-to-how-it-was-done/</link>
		<comments>http://www.wireturf.com/2009/07/19/twitter-doc-theft-details-revealed-step-by-step-to-how-it-was-done/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 04:41:14 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.wireturf.com/?p=76</guid>
		<description><![CDATA[TechCrunch posted a great step by step account this morning that details almost exactly how Frenchman Hacker Croll (HC) was able to steal over 300 sensitive Twitter corporate docs, as well as gain access to numerous online accounts of several Twitter employees. It&#8217;s a long article, but very interesting and if you have any interest [...]]]></description>
			<content:encoded><![CDATA[<p>TechCrunch posted <a href="http://www.techcrunch.com/2009/07/19/the-anatomy-of-the-twitter-attack/" target="_blank">a great step by step account</a> this morning that details almost exactly how Frenchman Hacker Croll (HC) was able to <a href="http://www.techcrunch.com/2009/07/14/in-our-inbox-hundreds-of-confidential-twitter-documents/" target="_blank">steal over 300 sensitive Twitter corporate docs</a>, as well as gain access to numerous online accounts of several <a href="http://twitter.com" rel="nofollow" target="_blank">Twitter</a> employees.</p>
<p>It&#8217;s a long article, but very interesting, and if you have any interest in keeping a tight rein on the security of data that you keep online (email etc), you owe it to yourself to give the TC post a thorough read. Now that we have details as to exactly what occurred and how it was done, my head is spinning with the myriad security issues raised by this incident. I plan to write a series of posts in the coming days discussing these issues in greater detail.</p>
<p>I quote here the TechCrunch summary of the attack:</p>
<blockquote>
<ol>
<li>HC accessed Gmail for a Twitter employee by using the password recovery feature that sends a reset link to a secondary email. In this case the secondary email was an expired Hotmail account, he simply registered it, clicked the link and reset the password. Gmail was then owned.</li>
<li>
HC then read emails to guess what the original Gmail password was successfully and reset the password so the Twitter employee would not notice the account had changed.</li>
<li>
HC then used the same password to access the employee’s Twitter email on Google Apps for your domain, getting access to a gold mine of sensitive company information from emails and, particularly, email attachments.</li>
<li>
HC then used this information along with additional password guesses and resets to take control of other Twitter employee personal and work emails.</li>
<li>
HC then used the same username/password combinations and password reset features to access AT&#038;T, MobileMe, Amazon and iTunes, among other services. A security hole in iTunes gave HC access to full credit card information in clear text. HC now also had control of Twitter’s domain names at GoDaddy.</li>
<li>
Even at this point, Twitter had absolutely no idea they had been compromised.</li>
</ol>
<p><em>Source: <a href="http://www.techcrunch.com/2009/07/19/the-anatomy-of-the-twitter-attack/" target="_blank">TechCrunch</a></em>
</p></blockquote>
<p>WOW! So many things jump right out at me from reading this, including:</p>
<ol>
<li>Sloppy email and password account management by the Twitter employees concerned</li>
<li>Dangers of mixing work and personal email activities</li>
<li>The kind of online footprint you leave through your public participation in social networks, and how vulnerable to attack that can make you (this is hinted at not in the above summary but in other details in the TC post)</li>
</ol>
<p>More to come on this subject in the next few days.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wireturf.com/2009/07/19/twitter-doc-theft-details-revealed-step-by-step-to-how-it-was-done/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloud Computing &#8211; Amazon EC2 Zone Reaches Capacity</title>
		<link>http://www.wireturf.com/2009/07/16/cloud-computing-part-1-amazon-ec2-zone-reaches-capacity/</link>
		<comments>http://www.wireturf.com/2009/07/16/cloud-computing-part-1-amazon-ec2-zone-reaches-capacity/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 23:16:14 +0000</pubDate>
		<dc:creator>David</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Amazon EC2]]></category>

		<guid isPermaLink="false">http://www.wireturf.com/?p=28</guid>
		<description><![CDATA[I&#8217;m planning to write a series of posts documenting in detail the experiences that we have had at OleOle migrating our entire website infrastructure from a traditional managed hosting company to Amazon&#8217;s Cloud Computing services (EC2, S3, etc.). This was a process we began scoping out at the beginning of &#8217;09 and actually completed just [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m planning to write a series of posts documenting in detail the experiences that we have had at OleOle migrating our entire website infrastructure from a traditional managed hosting company to Amazon&#8217;s Cloud Computing services (EC2, S3, etc.). This was a process we began scoping out at the beginning of &#8217;09 and actually completed just a couple of months ago.</p>
<p>This post jumps right into the middle of things, so to speak: we are already fully entrenched in Amazon&#8217;s cloud and dealing with the many issues that crop up from time to time that they don&#8217;t tell you about in the marketing material.</p>
<p>So on to the subject of today&#8217;s post:</p>
<p>We noticed yesterday that all our new EC2 app server instances were booting up in Amazon&#8217;s zone US-east-1d, whereas until then we had always used zone US-east-1b with no issues. Several core parts of our system, including db servers, load balancers and memcache servers, are in zone US-east-1b.</p>
<p>What&#8217;s going on? Well, apparently zone 1b is at or near capacity, and when we try to force a new instance into that zone, we get a message saying “insufficient capacity”. That means any new app server instances that start are unlikely to be in the same zone as our dbs, load balancers and memcache servers.</p>
<div id="attachment_32" class="wp-caption alignleft" style="width: 460px"><img class="size-full wp-image-32" title="amazon-over-capacity" src="http://www.wireturf.com/wp-content/uploads/2009/07/amazon-over-capacity.jpg" alt="Amazon Zone 1b Over Capacity" width="450" height="299" /><p class="wp-caption-text">Amazon Zone 1b Over Capacity</p></div>
<p>Furthermore, because our EC2 app server instances all autoscale via Rightscale, they are always short lived – constantly being terminated when capacity is not needed, with new ones coming online automatically when load spikes at certain times of day. The autoscaling is fantastic, a true thing of beauty to behold when you see it in action, and it really makes &#8220;utility computing&#8221; a reality for us. However, it also means that all of our app server instances are now in zone 1d and not our preferred zone 1b.</p>
<p>And why does this matter? LATENCY is why! We performed some basic testing of ping times between zones:</p>
<p>1b to 1b &#8211; average ping time between a server in each zone: 0.45ms<br />
1a or 1d to 1b &#8211; average ping time between a server in each zone: 1.9ms</p>
<p>That is a whopping increase of more than 4 times in latency when going across zones versus having all your servers in the same zone. When a single web request from a user involves a multitude of calls between servers (app to db, app to memcache and back again, load balancer to app, etc.), that increase becomes very noticeable, even though each individual call differs by only about a millisecond and a half.</p>
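<p>To put rough numbers on this, here is a minimal Python sketch that uses the two ping averages measured above. It assumes, purely for illustration, that a request makes a fixed number of sequential server-to-server round trips; the round-trip counts and function names are hypothetical, not measurements from our system.</p>

```python
SAME_ZONE_MS = 0.45   # measured average ping, 1b to 1b
CROSS_ZONE_MS = 1.9   # measured average ping, 1a or 1d to 1b


def latency_ratio(same_ms=SAME_ZONE_MS, cross_ms=CROSS_ZONE_MS):
    """How many times slower a cross-zone round trip is than a same-zone one."""
    return cross_ms / same_ms


def added_ms_per_request(round_trips, same_ms=SAME_ZONE_MS, cross_ms=CROSS_ZONE_MS):
    """Extra latency per web request if every round trip crosses zones.

    Assumes the round trips happen one after another (no parallelism).
    """
    return round_trips * (cross_ms - same_ms)


if __name__ == "__main__":
    print("cross-zone / same-zone ratio:", round(latency_ratio(), 1))
    for trips in (10, 50, 100):  # hypothetical calls per web request
        extra = round(added_ms_per_request(trips), 1)
        print(trips, "round trips add about", extra, "ms")
```

<p>With 50 sequential calls, for example, the cross-zone penalty works out to about 72.5ms per request under these assumptions, which is why a difference of a millisecond or so per call adds up.</p>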
<p>What can we do about it? We could migrate everything over to zone 1d, but that seems like a stopgap solution. There&#8217;s no guarantee that 1d won’t run out of capacity; in fact, it&#8217;s almost certain to at some point, and when that happens we will have to migrate yet again to keep all our servers in the same zone.</p>
<p>Maybe this is just one of the pitfalls of this type of cloud computing platform, but I can&#8217;t get over the feeling that a 4 times increase in latency is just not acceptable.</p>
<p>NOTE: I am well aware that Amazon touts having multiple zones to put your servers in as a plus to avoid a single point of failure as each zone is in a different data centre (and possibly geography). And we have all seen recently what can happen when something takes out a Rackspace data centre (<a title="rackspace power outage" href="http://www.techcrunch.com/2009/07/07/someone-needs-to-stop-tripping-over-the-power-cord-at-rackspace/" target="_blank">trips on a powercord</a>, <a href="http://www.techcrunch.com/2009/06/29/yes-rackspace-is-down-and-so-are-many-of-your-favorite-sites/" target="_blank">network outage</a>, <a href="http://www.techcrunchit.com/2008/07/10/rackspace-downtime-a-reminder-that-all-are-vulnerable/" target="_blank">car crashes into a generator</a>, whatever&#8230;). But the cost of this is a lot of added latency and it should be a choice that we get to make whether we want to accept that latency or not.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.wireturf.com/2009/07/16/cloud-computing-part-1-amazon-ec2-zone-reaches-capacity/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
