Apptainer

From UMIACS

Latest revision as of 17:51, 13 August 2024

Apptainer is a container platform that doesn't elevate the privileges of the user running the container. This is important because UMIACS runs many multi-tenant hosts (such as Nexus) and doesn't give users administrative control on them. While Docker is popular, its most typical setups require a daemon with administrative-level privileges, which makes it untenable in this environment.

Apptainer was previously branded as Singularity. The singularity command should still work on our systems, but you should start migrating to the apptainer command.

Overview

To find out which version we currently provide, run the apptainer --version command. If it instead reports apptainer: command not found on a UMIACS-supported host, please contact staff and we will ensure that the software is available on that host.

$ apptainer --version
apptainer version 1.2.5-1.el8
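Because the singularity command is still available under its legacy name, a script that has to work on hosts with either name can choose the command at run time. This is a minimal sketch; container_cli is a hypothetical helper, not part of Apptainer.

```shell
# Hypothetical helper: print the container command available on this host,
# preferring apptainer and falling back to the legacy singularity name.
container_cli() {
  if command -v apptainer >/dev/null 2>&1; then
    echo apptainer
  elif command -v singularity >/dev/null 2>&1; then
    echo singularity
  else
    echo "container_cli: neither apptainer nor singularity found; contact staff" >&2
    return 1
  fi
}
```

For example: cli=$(container_cli) && "$cli" --version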

Apptainer can run a variety of images, including its own format and Docker images. Creating images from definition files requires administrative rights, so you will need to either use Podman on UMIACS-supported hosts or build on a host where you have full administrative access (such as a laptop or personal desktop).

If you are going to pull large images, you may run out of space in your home directory. We suggest running the following commands to set up alternate cache and temporary directories. We use /scratch0 here, but you can substitute any sufficiently large local scratch directory, network scratch directory, or project directory.

export WORKDIR=/scratch0/$USER
export APPTAINER_CACHEDIR=${WORKDIR}/.cache
export APPTAINER_TMPDIR=${WORKDIR}/.tmp
mkdir -p $APPTAINER_CACHEDIR
mkdir -p $APPTAINER_TMPDIR
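Not every host has a writable /scratch0. A slightly more defensive version of the commands above falls back to another location when it is missing. This is a sketch; the fallback to ${TMPDIR:-/tmp} is an assumption rather than a UMIACS recommendation, and on many hosts a network scratch or project directory will be a better choice.

```shell
# Prefer local scratch; fall back to $TMPDIR (or /tmp) if /scratch0 is unusable.
# The fallback location is an assumption; substitute a scratch or project directory.
if [ -d /scratch0 ] && [ -w /scratch0 ]; then
  WORKDIR=/scratch0/${USER:-$(id -un)}
else
  WORKDIR=${TMPDIR:-/tmp}/${USER:-$(id -un)}
fi
export WORKDIR
export APPTAINER_CACHEDIR="${WORKDIR}/.cache"
export APPTAINER_TMPDIR="${WORKDIR}/.tmp"
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
```

Keep in mind that /tmp is usually small and may be cleaned automatically, so it is only a last resort for large images.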


We suggest pulling images into an intermediate file (a SIF file) so that you do not have to worry about re-caching the image each time you use it.

$ apptainer pull cuda12.2.2.sif docker://nvidia/cuda:12.2.2-base-ubi8
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob d5d706ce7b29 done
Copying blob b4dc78aeafca done
Copying blob 24a22c1b7260 done
Copying blob 8dea37be3176 done
Copying blob 25fa05cd42bd done
Copying blob a57130ec8de1 done
Copying blob 880a66924cf5 done
Copying config db554d658b done
Writing manifest to image destination
Storing signatures
2022/10/14 10:31:17  info unpack layer: sha256:25fa05cd42bd8fabb25d2a6f3f8c9f7ab34637903d00fd2ed1c1d0fa980427dd
2022/10/14 10:31:19  info unpack layer: sha256:24a22c1b72605a4dbcec13b743ef60a6cbb43185fe46fd8a35941f9af7c11153
2022/10/14 10:31:19  info unpack layer: sha256:8dea37be3176a88fae41c265562d5fb438d9281c356dcb4edeaa51451dbdfdb2
2022/10/14 10:31:20  info unpack layer: sha256:b4dc78aeafca6321025300e9d3050c5ba3fb2ac743ae547c6e1efa3f9284ce0b
2022/10/14 10:31:20  info unpack layer: sha256:a57130ec8de1e44163e965620d5aed2abe6cddf48b48272964bfd8bca101df38
2022/10/14 10:31:20  info unpack layer: sha256:d5d706ce7b293ffb369d3bf0e3f58f959977903b82eb26433fe58645f79b778b
2022/10/14 10:31:49  info unpack layer: sha256:880a66924cf5e11df601a4f531f3741c6867a3e05238bc9b7cebb2a68d479204
INFO:    Creating SIF file...
$ apptainer inspect cuda12.2.2.sif
...
maintainer: NVIDIA CORPORATION <sw-cuda-installer@nvidia.com>
name: ubi8
org.label-schema.build-arch: amd64
org.label-schema.build-date: Wednesday_24_January_2024_13:53:0_EST
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.2.5-1.el8
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: nvidia/cuda:12.2.2-base-ubi8
...
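Since a SIF file only needs to be pulled once, scripts that run repeatedly can skip the pull when the file is already present. A minimal sketch; pull_sif is a hypothetical helper name.

```shell
# Hypothetical helper: pull SOURCE into SIF only if SIF does not exist yet.
# Usage: pull_sif <file.sif> <source-uri>
pull_sif() {
  local sif="$1" src="$2"
  if [ -f "$sif" ]; then
    echo "Reusing existing $sif"
    return 0
  fi
  apptainer pull "$sif" "$src"
}
```

For example: pull_sif cuda12.2.2.sif docker://nvidia/cuda:12.2.2-base-ubi8. The helper only checks that the file exists, not that it is a valid or current image; delete the SIF file to force a fresh pull.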

Now you can run the local image with the run command or start a shell with the shell command.

  • Please note that if you are in an environment with GPUs and want to access them inside the container, you need to specify the --nv flag. NVIDIA requires a specific driver and set of libraries to run CUDA programs; this flag ensures that all appropriate devices are created inside the container and that these libraries are made available there.
$ apptainer run --nv cuda12.2.2.sif nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-8e040d17-402e-cc86-4e83-eb2b1d501f1e)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-d681a21a-8cdd-e624-6bf8-5b0234584ba2)
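If the same script runs on hosts both with and without GPUs, you can add --nv only when the NVIDIA tooling is present. This sketch treats the presence of nvidia-smi on the host as a proxy for an NVIDIA GPU host, which is an assumption; nv_flag is a hypothetical name.

```shell
# Hypothetical helper: print "--nv" when nvidia-smi is on PATH, otherwise nothing.
nv_flag() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "--nv"
  fi
}
```

Use it unquoted so that an empty result disappears from the argument list, e.g. apptainer exec $(nv_flag) cuda12.2.2.sif <command>.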

Nexus Containers

In our Nexus environment we have some example containers based on our pytorch_docker project. These can be found in /fs/nexus-containers/pytorch.

You can run one of the example images as follows (you should already have allocated an interactive job with a GPU in Nexus). It uses the default script found at /srv/tensor.py within the image.

$ hostname && nvidia-smi -L
tron38.umiacs.umd.edu
GPU 0: NVIDIA RTX A4000 (UUID: GPU-4a0a5644-9fc8-84b4-5d22-65d45ca36506)
$ apptainer run --nv /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif
99 984.5538940429688
199 654.1710815429688
299 435.662353515625
399 291.1429138183594
499 195.5575714111328
599 132.3363037109375
699 90.5206069946289
799 62.86213684082031
899 44.56754684448242
999 32.466392517089844
1099 24.461835861206055
1199 19.166893005371094
1299 15.6642427444458
1399 13.347112655639648
1499 11.814264297485352
1599 10.800163269042969
1699 10.129261016845703
1799 9.685370445251465
1899 9.391674041748047
1999 9.19735336303711
Result: y = 0.0022362577728927135 + 0.837898313999176 x + -0.0003857926349155605 x^2 + -0.09065020829439163 x^3
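To see which example images exist, list the directory (ls /fs/nexus-containers/pytorch). If you want a script to pick the most recent one automatically, the sketch below selects the highest version by sorting file names. It assumes the names embed a version as in pytorch_1.13.0+cu117.sif; latest_sif is a hypothetical helper, and sort -V is a GNU coreutils option.

```shell
# Hypothetical helper: print the .sif file in DIR whose name sorts highest by version.
# Assumes version numbers are embedded in the file names (e.g. pytorch_1.13.0+cu117.sif).
latest_sif() {
  local dir="${1:-/fs/nexus-containers/pytorch}" newest
  newest=$(ls -1 "$dir"/*.sif 2>/dev/null | sort -V | tail -n 1)
  if [ -z "$newest" ]; then
    echo "latest_sif: no .sif files in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

For example: apptainer run --nv "$(latest_sif)". Version sorting places 1.13 after 1.9, which a plain alphabetical sort would not.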

Bind Mounts

To get data into the container, you need to pass bind mounts. Apptainer containers will not automatically mount data from the host operating system other than your home directory, so you must bind any other file paths manually. The --bind flag takes a host path and the path where it should appear inside the container, separated by a colon:

--bind /fs/nexus-scratch/<USERNAME>/<PROJECTNAME>:/mnt
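Several mounts can be passed to --bind as a comma-separated list of source:destination pairs. When you want many host directories to appear at the same path inside the container, a small helper can build that list and catch typos early, since binding a source path that does not exist will fail. A sketch; make_bind_spec is a hypothetical helper, and it does not handle paths that themselves contain commas or colons.

```shell
# Hypothetical helper: build a comma-separated --bind list that mounts each
# existing host path at the same path inside the container.
make_bind_spec() {
  local spec="" p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "make_bind_spec: $p does not exist" >&2
      return 1
    fi
    spec="${spec:+$spec,}$p:$p"
  done
  printf '%s\n' "$spec"
}
```

For example: spec=$(make_bind_spec /fs/nexus-scratch/$USER) && apptainer exec --nv --bind "$spec" /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif bash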

In this example, we use exec, which lets us specify the command to run inside the container, to start an interactive bash session with GPU access and our Nexus scratch directory bound to the same path inside the container.

apptainer exec --nv --bind /fs/nexus-scratch/username:/fs/nexus-scratch/username /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif bash

You can now write and run your own PyTorch Python code interactively within the container, or write a Python script that you call directly from the apptainer exec command for batch processing.
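For batch processing, the same exec invocation can wrap a Python script. This is a minimal sketch: run_in_container and the script name train.py are hypothetical, and setting DRY_RUN prints the command instead of running it, which is a cheap way to check the bind paths first.

```shell
# Image and scratch directory from the example above; adjust to your own paths.
IMAGE=/fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif
SCRATCH=/fs/nexus-scratch/${USER:-$(id -un)}

# Hypothetical wrapper: run a command inside the container with GPUs and the
# scratch directory bound at the same path. If DRY_RUN is non-empty, only print it.
run_in_container() {
  ${DRY_RUN:+echo} apptainer exec --nv --bind "$SCRATCH:$SCRATCH" "$IMAGE" "$@"
}
```

For example, DRY_RUN=1 run_in_container python3 "$SCRATCH/train.py" shows the full apptainer command, and dropping DRY_RUN=1 runs it.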

Shared Containers

Portable images in the Singularity Image Format (.sif files) can be copied and shared. Nexus maintains some shared containers in /fs/nexus-containers, arranged by the application(s) installed in them.
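Because a SIF image is a single file, sharing one amounts to copying it. The sketch below copies an image into a shared directory, verifies the copy byte for byte, and makes it group-readable. The helper name share_sif and the choice of group (rather than world) read permission are assumptions; adjust them to how your project directory is shared.

```shell
# Hypothetical helper: copy a SIF file into DEST_DIR, verify it, and make it group-readable.
share_sif() {
  local src="$1" dest_dir="$2" dest
  dest="$dest_dir/$(basename "$src")"
  cp "$src" "$dest" || return 1
  # Compare the copy against the original before anyone else uses it.
  if ! cmp -s "$src" "$dest"; then
    echo "share_sif: copy of $src does not match the original" >&2
    rm -f "$dest"
    return 1
  fi
  chmod g+r "$dest"
  printf '%s\n' "$dest"
}
```

For example: share_sif pytorch_docker.sif <SHARED_DIR>, where <SHARED_DIR> is a directory your collaborators can read.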

Docker Workflow Example

We have a pytorch_docker example workflow using our GitLab as a Docker registry. You can clone the repository and further customize this to your needs. The workflow is:

  1. Run Docker on a laptop or personal desktop to create the image, or use Podman on a UMIACS-supported system.
  2. Tag the image and push it to your repository (this can be any Docker registry).
  3. Pull the image down onto one of our workstations/clusters and run it with your data.
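Steps 1 through 3 might look like the following. This is a sketch under several assumptions: a Dockerfile in the current directory, the derek/pytorch_docker project path from this example (substitute your own GitLab namespace and project), the latest tag, and a registry that requires you to log in before pushing. Setting ENGINE=podman uses Podman, whose build, tag, login and push subcommands mirror Docker's; setting DRY_RUN prints the commands instead of running them.

```shell
# Assumed values from the pytorch_docker example; replace with your own project.
REGISTRY=registry.umiacs.umd.edu
PROJECT=derek/pytorch_docker
TAG=latest
ENGINE=${ENGINE:-docker}   # or podman on a UMIACS-supported system

# Print the command instead of running it when DRY_RUN is non-empty.
run() { ${DRY_RUN:+echo} "$@"; }

# Steps 1-2: build locally, tag for the registry, log in, and push.
build_and_push() {
  run "$ENGINE" build -t pytorch_docker .
  run "$ENGINE" tag pytorch_docker "$REGISTRY/$PROJECT:$TAG"
  run "$ENGINE" login "$REGISTRY"
  run "$ENGINE" push "$REGISTRY/$PROJECT:$TAG"
}

# Step 3: on a UMIACS workstation or cluster node, convert the image to a SIF file.
pull_on_cluster() {
  run apptainer pull pytorch_docker.sif "docker://$REGISTRY/$PROJECT:$TAG"
}
```

Run build_and_push on the machine with Docker or Podman, then pull_on_cluster on the workstation or cluster where you will use the image.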
$ apptainer pull pytorch_docker.sif docker://registry.umiacs.umd.edu/derek/pytorch_docker
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 85386706b020 done
...
2022/10/14 10:58:36  info unpack layer: sha256:b6f46848806c8750a68edc4463bf146ed6c3c4af18f5d3f23281dcdfb1c65055
2022/10/14 10:58:43  info unpack layer: sha256:44845dc671f759820baac0376198141ca683f554bb16a177a3cfe262c9e368ff
INFO:    Creating SIF file...
$ apptainer exec --nv pytorch_docker.sif python3 -c 'from __future__ import print_function; import torch; print(torch.cuda.current_device()); x = torch.rand(5, 3); print(x)'
0
tensor([[0.3273, 0.7174, 0.3587],
        [0.2250, 0.3896, 0.4136],
        [0.3626, 0.0383, 0.6274],
        [0.6241, 0.8079, 0.2950],
        [0.0804, 0.9705, 0.0030]])