At some point, our HPC nodes on Mox will be retired. When that happens, we’ll likely purchase new nodes on the newest UW cluster, Klone. Additionally, the coenv
nodes are no longer available on Mox. One was decommissioned and one was “migrated” to Klone. The primary issue at hand is that the base operating system for Klone appears to be very, very basic. I’d previously attempted to build/install some bioinformatics software on Klone, but could not due to a variety of missing libraries; these libraries are available by default on Mox… Part of this isn’t surprising, as UW IT has been making a concerted effort to get users to switch to containerization - specifically using Apptainer (formerly Singularity) containers.
There are significant benefits to containerization (chiefly, reproducibility; containers can be transferred to other computers and run without the end user needing to modify anything), but the learning curve for creating and using containers can be a bit tricky for those (i.e. members of the Roberts Lab) who haven’t previously worked with them. Combine this with the fact that these containers need to run under the SLURM workload management system in place on Mox/Klone, and the learning process can seem a bit convoluted.
Anyway, in an effort to be proactive, I’ve begun a concerted effort to develop and use containers on Klone (although I had put in a previous effort about 6yrs ago to try to get the lab to start using Docker, which wasn’t really adopted by anyone, including myself). Admittedly, part of the drive is a selfish one, as I really want to be able to use the coenv node that’s there…
I’ve established a GitHub repo subdirectory for storing Apptainer definition files. Although these won’t get modified with any frequency (once the containers are built, people will just use those), the repo will provide a resource for people to refer to. Additionally, I’ll be adding a section about this to the Roberts Lab Handbook. Another decent resource regarding building/using containers is the UW Research Computing website. It provides a decent overview, but is still a bit confusing. Below is a quick overview, specific to the Roberts Lab, of how to build and use containers.
First, we’ll create a “base” image. All subsequent containers will be built using this base image. This helps to keep the definition files for other containers much shorter, cleaner, and easier to read. The ubuntu-22.04-base.def (GitHub) builds a basic Ubuntu 22.04 image and adds the libraries necessary for building/installing other software later on. It also creates the /gscratch/srlab/programs directory. This is to help with the migration from Mox to Klone; Roberts Lab users will know where to look for software within the containers, if they need to find it. Although I’m documenting how to build containers, most of the container building will be one-off jobs. After a container is built, it can be used by anyone. There likely won’t be many (any) reasons to rebuild a container once it’s been built.
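For reference, here is a minimal sketch of what a base definition file along these lines might contain. This is illustrative only (the actual ubuntu-22.04-base.def on GitHub is the authoritative version); the specific package list and %environment contents below are assumptions:

Bootstrap: docker
From: ubuntu:22.04

%post
    # Illustrative package set; the real ubuntu-22.04-base.def installs its own list of build libraries
    apt-get update && apt-get install -y build-essential wget curl git

    # Create the shared programs directory so downstream images know where software lives
    mkdir -p /gscratch/srlab/programs

%environment
    # Put the programs directory on the PATH inside the container
    export PATH="/gscratch/srlab/programs:${PATH}"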
There is one shortcoming to our current usage - we don’t really have a way to link changes in definition files to container images. E.g., if someone updates a definition file, they may forget to rebuild the image with the updated definition file. Additionally, if they do update the container image, how does one indicate that it is different than it used to be? There are ways to do this (using Docker Hub and/or some other container registry, combined with GitHub automated builds), but this is likely beyond our technical expertise, as well as a bit of overkill for our use cases.
Anyway, here’s how to get started. First, build the base image:
Build base container
apptainer build \
--sandbox \
--fakeroot \
/tmp/ubuntu-22.04-base.sandbox ./ubuntu-22.04-base.def \
&& apptainer build \
/tmp/ubuntu-22.04-base.sif /tmp/ubuntu-22.04-base.sandbox \
&& mv /tmp/ubuntu-22.04-base.sif .
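One assumption here: as with the bedtools build further down, the apptainer command on Klone is provided by a module, so it presumably needs to be loaded first if it isn’t already available in your session:

# Load the apptainer module (as is done for the bedtools build below)
module load apptainer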
NOTES:
- The --sandbox option is necessary to allow the persistence of the /gscratch/srlab/programs directory in the container to all subsequent builds that utilize ubuntu-22.04-base.sif as a base. Otherwise, the directory would need to be created in every subsequent image built off of this one.
- The /gscratch/srlab/programs directory is also added to the system $PATH in the container. This means that any programs added there by containers built on this base image will be accessible without the need to specify the full path to the program. E.g. to call bedtools, the user will only need to type bedtools, not /gscratch/srlab/programs/bedtools.
- Building the image in /tmp/ and then moving it to the desired directory is recommended by UW Research Computing.
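If you want a quick sanity check that the PATH addition made it into the finished image, something like the following (assuming ubuntu-22.04-base.sif is in your current directory) should show /gscratch/srlab/programs on the PATH:

# Print the PATH as seen inside the container
apptainer exec ubuntu-22.04-base.sif bash -c 'echo "$PATH"'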
Build bedtools container
For a basic example, we’ll create a new image, based on the base image above, which installs an additional piece of software: bedtools. Here are the contents of bedtools-2.31.0.def (GitHub):
Bootstrap: localimage
From: ubuntu-22.04-base.sif
%post
echo "$PWD"
ls -l
ls -l /gscratch
cd /gscratch/srlab/programs
wget https://github.com/arq5x/bedtools2/releases/download/v2.31.0/bedtools.static
mv bedtools.static bedtools
chmod a+x bedtools
%labels
Author Sam White
Version v0.0.1
%help
This is a definition file for a bedtools-v2.31.0 image.
It’s much, much shorter than the ubuntu-22.04-base.def (GitHub) because it relies on all of the stuff that’s already been installed by ubuntu-22.04-base.def (GitHub). There are some extraneous lines left over from my initial testing (namely the echo, ls, and ls -l commands) which I’ll get rid of later, but it should be relatively easy to see what’s happening:
1. Change to the /gscratch/srlab/programs directory.
2. Download the bedtools program.
3. Use the mv command to rename it to just bedtools (instead of bedtools.static).
4. Make the program executable using chmod.
To actually build the bedtools image, we run the following. The bedtools-2.31.0.def definition file has to be in the same directory as ubuntu-22.04-base.sif.
# Load the apptainer module
module load apptainer
apptainer build \
/tmp/bedtools-2.31.0.sif ./bedtools-2.31.0.def \
&& mv /tmp/bedtools-2.31.0.sif .
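Once that finishes, a quick (optional) check that bedtools is actually callable inside the new image:

# bedtools lives in /gscratch/srlab/programs, which is on the container's PATH
apptainer exec ./bedtools-2.31.0.sif bedtools --version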
Using SLURM with an image
First, we’ll want to create a (bash
) script (e.g. bedtools.sh
) with all of our commands. We’ll also need to ensure the script is executable (chmod a+x bedtools.sh
). Here’s an example of a version of that script which will print (echo
) a statement and then pull up the help menu for bedtools
:
$ cat bedtools.sh
#!/bin/bash
echo "Going to try to execute bedtools..."
bedtools -h
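If you’d like to confirm the script runs through the container before involving SLURM, you can test it directly from the command line (this assumes the image and script live where the SLURM script below expects them):

# Load apptainer and run the script inside the container interactively
module load apptainer
apptainer exec ~/apptainers/bedtools-2.31.0.sif ~/container_test/bedtools.sh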
Next, we’ll need our usual SLURM script with all the header stuff. I’m going to call the script container_test.sh
.
$ cat container_test.sh
#!/bin/bash
## Job Name
#SBATCH --job-name=container_test
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=compute
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=01-00:00:00
## Memory per node
#SBATCH --mem=100G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/mmfs1/home/samwhite/container_test
# Load Apptainer module
module load apptainer
# Execute the bedtools-2.31.0.sif container
# Run the bedtools.sh script
apptainer exec \
--home $PWD \
--bind /mmfs1/home/ \
--bind /mmfs1/gscratch/ \
~/apptainers/bedtools-2.31.0.sif ~/container_test/bedtools.sh
In this SLURM batch script, we have our usual header stuff defining computing accounts/partitions, memory, runtime, etc. Then, we need to load the apptainer
module (provided by UW) so we can actually run the apptainer
command.
The subsequent apptainer exec
command “binds” (i.e. connects) directories on Klone to the specified location within the bedtools-2.31.0.sif
container. Since there’s only a single entry after the --bind
and --home
arguments, it’s connecting that directory path on Klone to the exact same location within the bedtools-2.31.0.sif
container. If we wanted to bind a location on Klone to a different location within the container, we could make it look something like --bind /mmfs1/home/:/container/working_dir/
.
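For example, a purely illustrative version of that alternative bind, using the same bedtools image, would look like the following; inside the container, the contents of /mmfs1/home/ would then show up under /container/working_dir/:

# Bind a Klone directory to a different location inside the container
apptainer exec \
--bind /mmfs1/home/:/container/working_dir/ \
~/apptainers/bedtools-2.31.0.sif ls /container/working_dir/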
Then, we tell apptainer which image to use (bedtools-2.31.0.sif), followed by the command we want it to run. In our case, we want the bedtools-2.31.0.sif container to run our bedtools.sh script.
Finally, to actually submit this to the SLURM scheduler we run the following command:
sbatch container_test.sh
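After submitting, the job can be checked with the usual SLURM tools, and the output of bedtools.sh (the echo statement and the bedtools help menu) will end up in a slurm-<jobID>.out file in the directory specified by --chdir:

# Check the job's status in the queue
squeue -u $USER

# After the job finishes, view the output (file is named slurm-<jobID>.out)
cat slurm-*.out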