I recently moved some computing jobs over to Amazon’s Elastic Cloud Computing (EC2) in attempt to avoid some odd computing issues/errors I kept encountering on our lab computers (Apple Xserve 3,1).
The big trade off here is that the lab computers are paid for and using EC2 means we’ll be sinking more money into computing resources. With that expense should come faster processing (i.e. less time) to perform various analyses. As they say, time is money…
Let’s look at how things’ve worked out so far.
First, how much did we spend and how did we spend it (click on the image to enlarge)?
(http://eagle.fish.washington.edu/Arabidopsis/20160801_AmazonEC2_costs.jpg)
Of course, it’s easy to see that for the instance I was running, it cost us $0.419/hr. That’s great and all, but you sort of lose sense of what that ends up costing over the long-term. Let’s look at how things break out over a larger time scale.
According to Amazon’s (very useful!) billing breakdown, we spent $187 in the month of July 2016. This doesn’t seem too bad. In fact, this would only cost us ~$2200/yr if we continue to run this instance in this fashion. However, let’s look at it a bit further.
We see that the instance ran for a total of 374 hrs during July 2016. Divide that by 24hrs/day and we see that the instance was running for 15.6 days; just over half the month. That means we would’ve spent ~$374 for the full month, which would equate to $4488/yr. For our lab, that kind of money starts to add up and one starts to wonder if it wouldn’t be better to invest in higher end hardware to use in the lab with a single “sunk” cost that will last us many, many years.
Regardless, with the lab’s current computing hardware, we should compare another factor that’s involved with the expense of using Amazon EC2 instead of our lab computers: time.
I performed a very rough “guestimation” of the time savings that EC2 has provided us.
I compared the length of “real” time for the first step in the PyRad program using the same data set on one of our lab computers (roadrunner) and the Amazon EC2 instance:
roadrunner: 1118 minutes
EC2: 771 minutes
Roadrunner is nearly 1.5x slower than the EC2 instance! To really appreciate what type of impact that has, we should look at the run time for the full PyRad analysis:
roadrunner: 5546 minutes (NOTE: Due to incomplete analysis, roadrunner time is “guestimated” as 1.45 x EC2; see below)
EC2: 3825 minutes
Let’s convert those numbers into something more easily understood - hours and days:
roadrunner: 95hrs
roadrunner: ~4 days
EC2: 63hrs
EC2: ~2.6 days
Of course, these times don’t take into account any technical issues that we might encounter (and I have encountered many technical issues using roadrunner) on either platform, but I can tell you that I’ve not had any headaches using EC2 (other than unintentional, self-imposed ones).
Another potential option is trying out InsideDNA. They offer cloud computing services that are specifically geared towards high-throughput bioinformatics analysis. They have many, many bioinformatics tools already installed and available to use on their platform. Additionally, they have nice tutorials on how to use some of these tools, which goes a long ways in getting started on any analyses using new software. Here are the various pricing tiers that they offer:
(http://eagle.fish.washington.edu/Arabidopsis/20160801_insideDNA_pricing.jpg)
The “Advanced” tier ($100/month) certainly seems like it could potentially be better than using Amazon. However, this tier only offers 500GB of storage. If you look up above at the Amazon pricing breakdown, you’ll notice that I’ve already used 466GB of storage for just that one experiment! Additionally, the 1000 CPU hours seems great, but remember, this is likely divided by the number of CPUs that you end up using. The Amazon EC2 instance was running eight cores. If I were to run a similar set up on InsideDNA, that would amount to 125 CPU hours per core. Again, looking up above, we see that I ran the EC2 instance for 374 hours! That means the “Advanced” tier on InsideDNA wouldn’t be enough to get our jobs done.
Anyway, in the grand scheme of things, using an Amazon EC2 instance periodically as we need it throughout the year isn’t terrible. However, if we start using the University of Washington Hyak computing cluster we may be able to avoid spending on EC2 and be able to have similar time savings (compared to using the lab computers). Need to get cracking on that…