Apptainer
Apptainer is a container platform that doesn't elevate the privileges of a user running the container. This is important as UMIACS runs many multi-tenant hosts (such as Nexus) and doesn't provide administrative control to users on them.
Apptainer was previously branded as Singularity. You should still be able to run commands on the system with singularity, but you should start migrating to the apptainer command.
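While scripts are being migrated, one option is to pick whichever command name is available at run time. The following is a minimal sketch, not a UMIACS-provided tool; the function name container_cmd is hypothetical.

```shell
# Print the name of the first container CLI found on PATH.
# Prefers apptainer and falls back to the legacy singularity name.
container_cmd() {
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "neither apptainer nor singularity found on PATH" >&2
    return 1
}

# Example usage (the version check only runs if a CLI was found):
# cmd=$(container_cmd) && "$cmd" --version
```

Preferring apptainer first means a script written this way keeps working unchanged if the singularity name is ever removed.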
Overview
You can find out which version we currently provide by running the apptainer --version command. If this instead says apptainer: command not found, please contact staff and we will ensure that the software is available on the host where you need it.
$ apptainer --version
apptainer version 1.1.0-1.el7
Apptainer can run a variety of images, including its own format and Docker images. Creating images from definition files requires administrative rights. On UMIACS-supported hosts you will need to use Podman to accomplish this; otherwise, build the image on a host that you have full administrative access to (such as a laptop or personal desktop).
If you are going to pull large images, you may run out of space in your home directory. We suggest you run the following commands to set up alternate cache and tmp directories. We are using /scratch0, but you can substitute any large enough network scratch or project directory you would like.
export WORKDIR=/scratch0/$USER
export APPTAINER_CACHEDIR=${WORKDIR}/.cache
export APPTAINER_TMPDIR=${WORKDIR}/.tmp
mkdir -p $APPTAINER_CACHEDIR
mkdir -p $APPTAINER_TMPDIR
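If you switch between several scratch or project directories, the same setup can be wrapped in a function. This is a sketch under the assumption that the directory you pass is one you can write to; the function name setup_apptainer_dirs is hypothetical, while APPTAINER_CACHEDIR and APPTAINER_TMPDIR are the same variables used in the commands above.

```shell
# Create cache and tmp directories under WORKDIR and point Apptainer at them.
# Usage: setup_apptainer_dirs /scratch0/$USER
setup_apptainer_dirs() {
    workdir=${1:-}
    if [ -z "$workdir" ]; then
        echo "usage: setup_apptainer_dirs WORKDIR" >&2
        return 1
    fi
    # Create both directories first, so a failure leaves the environment untouched.
    mkdir -p "$workdir/.cache" "$workdir/.tmp" || return 1
    if [ ! -w "$workdir/.cache" ] || [ ! -w "$workdir/.tmp" ]; then
        echo "cannot write to $workdir" >&2
        return 1
    fi
    export APPTAINER_CACHEDIR="$workdir/.cache"
    export APPTAINER_TMPDIR="$workdir/.tmp"
}
```

The paths are quoted so that a directory name containing spaces does not split into several arguments.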
We suggest you pull images down into an intermediate SIF file, as you then do not have to worry about re-caching the image.
$ apptainer pull cuda10.2.sif docker://nvidia/cuda:10.2-devel
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob d5d706ce7b29 done
Copying blob b4dc78aeafca done
Copying blob 24a22c1b7260 done
Copying blob 8dea37be3176 done
Copying blob 25fa05cd42bd done
Copying blob a57130ec8de1 done
Copying blob 880a66924cf5 done
Copying config db554d658b done
Writing manifest to image destination
Storing signatures
2022/10/14 10:31:17 info unpack layer: sha256:25fa05cd42bd8fabb25d2a6f3f8c9f7ab34637903d00fd2ed1c1d0fa980427dd
2022/10/14 10:31:19 info unpack layer: sha256:24a22c1b72605a4dbcec13b743ef60a6cbb43185fe46fd8a35941f9af7c11153
2022/10/14 10:31:19 info unpack layer: sha256:8dea37be3176a88fae41c265562d5fb438d9281c356dcb4edeaa51451dbdfdb2
2022/10/14 10:31:20 info unpack layer: sha256:b4dc78aeafca6321025300e9d3050c5ba3fb2ac743ae547c6e1efa3f9284ce0b
2022/10/14 10:31:20 info unpack layer: sha256:a57130ec8de1e44163e965620d5aed2abe6cddf48b48272964bfd8bca101df38
2022/10/14 10:31:20 info unpack layer: sha256:d5d706ce7b293ffb369d3bf0e3f58f959977903b82eb26433fe58645f79b778b
2022/10/14 10:31:49 info unpack layer: sha256:880a66924cf5e11df601a4f531f3741c6867a3e05238bc9b7cebb2a68d479204
INFO: Creating SIF file...
$ apptainer inspect cuda10.2.sif
maintainer: NVIDIA CORPORATION <cudatools@nvidia.com>
org.label-schema.build-arch: amd64
org.label-schema.build-date: Friday_14_October_2022_10:32:42_EDT
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.1.0-1.el7
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: nvidia/cuda:10.2-devel
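Because a SIF file is an ordinary file on disk, a script can skip the pull when the file is already there. The sketch below assumes that an existing file with the given name is a complete, valid image; the function name pull_if_missing is hypothetical.

```shell
# Pull an image into a SIF file only if that file does not exist yet.
# Usage: pull_if_missing cuda10.2.sif docker://nvidia/cuda:10.2-devel
pull_if_missing() {
    sif=${1:-}
    uri=${2:-}
    if [ -z "$sif" ] || [ -z "$uri" ]; then
        echo "usage: pull_if_missing SIF_FILE IMAGE_URI" >&2
        return 1
    fi
    if [ -f "$sif" ]; then
        # Assumption: an existing file is a usable image from an earlier pull.
        echo "$sif already exists, skipping pull"
        return 0
    fi
    apptainer pull "$sif" "$uri"
}
```

If a pull is interrupted and leaves a partial file behind, delete it before calling the function again, since the check here only looks at whether the file exists.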
Now you can run the local image with the run command or start a shell with the shell command. Please note that if you are in an environment with GPUs and you want to access them inside the container, you need to specify the --nv flag.
$ apptainer run --nv cuda10.2.sif nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-8e040d17-402e-cc86-4e83-eb2b1d501f1e)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-d681a21a-8cdd-e624-6bf8-5b0234584ba2)
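For scripts that run on both GPU and non-GPU hosts, the flag can be added conditionally. This is a rough sketch that treats a host as having GPUs when nvidia-smi -L prints at least one line starting with "GPU "; the function name nv_flag is hypothetical.

```shell
# Print "--nv" if nvidia-smi is on PATH and lists at least one GPU.
# Prints nothing otherwise, so the flag is simply left out.
nv_flag() {
    command -v nvidia-smi >/dev/null 2>&1 || return 0
    gpu_list=$(nvidia-smi -L 2>/dev/null) || return 0
    case $gpu_list in
        "GPU "*) printf '%s\n' "--nv" ;;
    esac
}

# Example usage; $(nv_flag) is left unquoted on purpose so that an
# empty result adds no argument at all:
# apptainer run $(nv_flag) cuda10.2.sif nvidia-smi -L
```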
Nexus Containers
In our Nexus environment we have some example containers based on our pytorch_docker project. These can be found in /fs/nexus-containers/pytorch.
You can run one of the example images by doing the following (you should have already allocated an interactive job with a GPU in Nexus). It will use the default script found at /srv/tensor.py within the image.
$ hostname && nvidia-smi -L
tron38.umiacs.umd.edu
GPU 0: NVIDIA RTX A4000 (UUID: GPU-4a0a5644-9fc8-84b4-5d22-65d45ca36506)
$ apptainer run --nv /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif
99 984.5538940429688
199 654.1710815429688
299 435.662353515625
399 291.1429138183594
499 195.5575714111328
599 132.3363037109375
699 90.5206069946289
799 62.86213684082031
899 44.56754684448242
999 32.466392517089844
1099 24.461835861206055
1199 19.166893005371094
1299 15.6642427444458
1399 13.347112655639648
1499 11.814264297485352
1599 10.800163269042969
1699 10.129261016845703
1799 9.685370445251465
1899 9.391674041748047
1999 9.19735336303711
Result: y = 0.0022362577728927135 + 0.837898313999176 x + -0.0003857926349155605 x^2 + -0.09065020829439163 x^3
To get data into the container you need to pass some bind mounts (your home directory is bound by default). In this example we bind our Nexus scratch directory and use the exec command, which lets us specify the command to run inside the container. Here that command is bash, which gives us an interactive session.
apptainer exec --nv --bind /fs/nexus-scratch/derek:/fs/nexus-scratch/derek /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif bash
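When several directories need to be mounted, the --bind option also accepts a comma-separated list of src:dest pairs. The helper below is a sketch that builds such a list from directories that actually exist, mounting each at the same path inside the container. It assumes no path contains a comma or colon; the function name bind_spec is hypothetical.

```shell
# Print a comma-separated --bind value for every argument that is an
# existing directory. Each directory maps to the same path in the container.
bind_spec() {
    spec=""
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Append with a leading comma only when spec is already non-empty.
            spec="${spec:+$spec,}$dir:$dir"
        else
            echo "skipping missing directory: $dir" >&2
        fi
    done
    printf '%s\n' "$spec"
}

# Example usage (check that the result is non-empty before passing it on):
# binds=$(bind_spec /fs/nexus-scratch/derek)
# [ -n "$binds" ] && apptainer exec --nv --bind "$binds" \
#     /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif bash
```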
You can now write and run your own PyTorch Python code interactively within the container, or write a Python script that you call directly from the apptainer exec command for batch processing.
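For the batch case, a small wrapper can check its inputs before launching the container. This is a sketch only: it assumes the image provides python3 and that the script lives somewhere visible inside the container (your home directory, or a path you add with --bind). The function name run_in_container is hypothetical.

```shell
# Run a Python script non-interactively inside a SIF image.
# Usage: run_in_container IMAGE.sif SCRIPT.py [script args...]
run_in_container() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_in_container IMAGE SCRIPT [ARGS...]" >&2
        return 1
    fi
    image=$1
    script=$2
    shift 2
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    # Remaining arguments are passed through to the script unchanged.
    apptainer exec --nv "$image" python3 "$script" "$@"
}
```

Failing early on a missing image or script avoids holding a GPU allocation for a job that cannot start.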
Docker Workflow Example
We have a pytorch_docker example workflow using our GitLab as a Docker registry. You can clone the repository and further customize this to your needs. The workflow is:
- Run Docker on a laptop or personal desktop to create the image.
- Tag the image and push it to your repository (this can be any Docker registry).
- Pull the image down onto one of our workstations/clusters and run it with your data.
$ apptainer pull pytorch_docker.sif docker://registry.umiacs.umd.edu/derek/pytorch_docker
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob 85386706b020 done
...
2022/10/14 10:58:36 info unpack layer: sha256:b6f46848806c8750a68edc4463bf146ed6c3c4af18f5d3f23281dcdfb1c65055
2022/10/14 10:58:43 info unpack layer: sha256:44845dc671f759820baac0376198141ca683f554bb16a177a3cfe262c9e368ff
INFO: Creating SIF file...
$ apptainer exec --nv pytorch_docker.sif python3 -c 'from __future__ import print_function; import torch; print(torch.cuda.current_device()); x = torch.rand(5, 3); print(x)'
0
tensor([[0.3273, 0.7174, 0.3587],
        [0.2250, 0.3896, 0.4136],
        [0.3626, 0.0383, 0.6274],
        [0.6241, 0.8079, 0.2950],
        [0.0804, 0.9705, 0.0030]])
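The first two steps of the workflow, which run on your own machine rather than on a UMIACS host, might look roughly like the sketch below. It assumes Docker is installed, that you are already logged in to the registry, and that the build context contains a Dockerfile; the function name build_and_push is hypothetical.

```shell
# Build an image from a build context and push it to a registry.
# Usage: build_and_push registry.umiacs.umd.edu/derek/pytorch_docker [CONTEXT_DIR]
build_and_push() {
    image=${1:-}
    context=${2:-.}
    if [ -z "$image" ]; then
        echo "usage: build_and_push REGISTRY_IMAGE [CONTEXT_DIR]" >&2
        return 1
    fi
    # -t tags the image at build time, so no separate tag step is needed.
    docker build -t "$image" "$context" || return 1
    docker push "$image"
}

# Afterwards, on a UMIACS workstation or cluster node:
# apptainer pull pytorch_docker.sif "docker://registry.umiacs.umd.edu/derek/pytorch_docker"
```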