Well, I tackled the storage space issue by expanding the EC2 Instance to 1000GB of storage. Now that that’s no longer a concern, it turns out I’m running up against processing/memory limits!
I’m running the EC2 c4.2xlarge (Ubuntu 14.04 LTS, 8 vCPUs, 16 GiB RAM) instance.
I’m trying to run two programs simultaneously: PyRad and Stacks (specifically, the ustacks “sub” program).
PyRad keeps crashing with memory errors (see the embedded Jupyter Notebook at the end of this post).
Used the following command-line program to visualize what’s happening with the EC2 Instance resources (i.e. processor and RAM utilization):
<code>htop</code>
Downloaded/installed to EC2 Instance using:
<code>sudo apt-get install htop</code>
I see why PyRad is dying. Here are two screen captures that show what resources are being used (click to see detail):
(http://eagle.fish.washington.edu/Arabidopsis/20160718_ec2_ustacks_cpus.png)
(http://eagle.fish.washington.edu/Arabidopsis/20160718_ec2_ustacks_mem.png)
The top image shows that ustacks is using 100% of all eight CPUs!
The second image shows that when ustacks is finishing with one of the files it’s processing, it uses all of the memory (16 GiB)!
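For a non-interactive record of the same memory picture that htop shows, something like the following might be useful. This is only a sketch: the function name `mem_used_pct` and the log file name `mem.log` are made up for illustration, and it assumes a Linux system where `/proc/meminfo` reports `MemTotal`, `MemFree`, `Buffers`, and `Cached` in kB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent of RAM in use, not counting buffers and page cache.
# Takes an optional path to a meminfo-format file (defaults to /proc/meminfo).
mem_used_pct() {
  awk '
    /^MemTotal:/ {total = $2}
    /^MemFree:/  {free = $2}
    /^Buffers:/  {buffers = $2}
    /^Cached:/   {cached = $2}
    END {
      # Integer percentage; prints nothing if MemTotal was not found.
      if (total > 0) printf "%d\n", (total - free - buffers - cached) * 100 / total
    }
  ' "${1:-/proc/meminfo}"
}

# Example use (commented out so that sourcing this file does not loop forever):
# log a timestamped reading every 10 seconds until interrupted.
# while true; do echo "$(date +%T) $(mem_used_pct)%"; sleep 10; done >> mem.log
```

Leaving a loop like this running alongside ustacks would show how quickly memory climbs as each file finishes, without having to watch htop.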
So, I will have to wait until ustacks is finished running before being able to continue with PyRad.
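Rather than checking back by hand, the waiting could be scripted. A minimal sketch, assuming `pgrep` (from the procps package) is available on the instance and that the process shows up under the exact name `ustacks`; the function name `wait_for_exit` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until no process with the given exact name is running.
# Usage: wait_for_exit NAME [POLL_INTERVAL_SECONDS]
wait_for_exit() {
  local name="$1" interval="${2:-60}"
  # pgrep -x matches the process name exactly and exits non-zero when nothing matches.
  while pgrep -x "$name" > /dev/null; do
    sleep "$interval"
  done
}

# Example use (commented out): print a note once ustacks has finished.
# wait_for_exit ustacks && echo "ustacks finished; PyRad can be restarted"
```

The PyRad command itself is deliberately left out of the example; whatever command was used in the notebook would go after the `&&`.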
If I want to run these simultaneously, I have two options (either way, I still have to wait until ustacks completes before I can modify the current EC2 Instance):
Increase the computing resources of this EC2 Instance.
Create an additional EC2 Instance and run PyRad on one and the Stacks programs on the other.
Here’s the Jupyter Notebook with the PyRad errors (see “Step 3: Clustering” section):
https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20160715_ec2_oly_gbs_pyrad.ipynb (GitHub)