Apptainer: Difference between revisions

From UMIACS

Revision as of 16:30, 1 November 2022

Singularity was rebranded as Apptainer. You should still be able to run commands on the system with singularity; however, you should start migrating to the apptainer command.

Apptainer is a container platform that doesn't elevate the privileges of a user running the container. This is important as UMIACS runs many multi-tenant hosts and doesn't provide administrative control to users on them.

You can find out which version we currently provide by running the apptainer --version command. If this instead reports apptainer: command not found, please contact staff and we will ensure that the software is available on the host where you need it.

# apptainer --version
apptainer version 1.1.0-1.el7
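While you migrate scripts from singularity to apptainer, it can help to pick whichever command a host actually provides. The following is a minimal sketch; the function name container_cmd is hypothetical and not part of either tool.

```shell
# Hypothetical helper: print the name of the container command available
# on this host, preferring apptainer over the legacy singularity name.
container_cmd() {
    if command -v apptainer >/dev/null 2>&1; then
        echo apptainer
    elif command -v singularity >/dev/null 2>&1; then
        echo singularity
    else
        echo "error: neither apptainer nor singularity found in PATH" >&2
        return 1
    fi
}
```

A script could then call `"$(container_cmd)" --version` instead of hard-coding one of the two names.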

Apptainer can run a variety of images including its own format and Docker images. To create images, you need to have administrative rights. Therefore, you will need to do this on a host that you have administrative access to (laptop or personal desktop) rather than a UMIACS-supported host.

If you are going to pull large images, you may run out of space in your home directory. We suggest you run the following commands to set up an alternate cache directory. We are using /scratch0, but you can substitute any sufficiently large network scratch or project directory.

export WORKDIR=/scratch0/$USER
export APPTAINER_CACHEDIR=${WORKDIR}/.cache
mkdir -p $APPTAINER_CACHEDIR
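If you switch between several scratch locations, the same setup can be wrapped in a small function. This is only a sketch: the function name setup_apptainer_cache and the .tmp subdirectory are assumptions, and setting APPTAINER_TMPDIR (the variable Apptainer reads for its temporary build space) goes beyond the commands above.

```shell
# Hypothetical helper: create cache and temp directories under a base
# directory and point Apptainer at them.
setup_apptainer_cache() {
    local base="$1"
    # Create the directories first; do not export anything if this fails
    mkdir -p "$base/.cache" "$base/.tmp" || return 1
    export APPTAINER_CACHEDIR="$base/.cache"
    # Assumption: you also want temporary build files off your home directory
    export APPTAINER_TMPDIR="$base/.tmp"
}
```

For the directory used above, this would be called as `setup_apptainer_cache "/scratch0/$USER"`.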

We suggest you pull images down into an intermediate file (a SIF file), so that you do not have to worry about re-caching the image later.

$ apptainer pull cuda10.2.sif docker://nvidia/cuda:10.2-devel
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob d5d706ce7b29 done
Copying blob b4dc78aeafca done
Copying blob 24a22c1b7260 done
Copying blob 8dea37be3176 done
Copying blob 25fa05cd42bd done
Copying blob a57130ec8de1 done
Copying blob 880a66924cf5 done
Copying config db554d658b done
Writing manifest to image destination
Storing signatures
2022/10/14 10:31:17  info unpack layer: sha256:25fa05cd42bd8fabb25d2a6f3f8c9f7ab34637903d00fd2ed1c1d0fa980427dd
2022/10/14 10:31:19  info unpack layer: sha256:24a22c1b72605a4dbcec13b743ef60a6cbb43185fe46fd8a35941f9af7c11153
2022/10/14 10:31:19  info unpack layer: sha256:8dea37be3176a88fae41c265562d5fb438d9281c356dcb4edeaa51451dbdfdb2
2022/10/14 10:31:20  info unpack layer: sha256:b4dc78aeafca6321025300e9d3050c5ba3fb2ac743ae547c6e1efa3f9284ce0b
2022/10/14 10:31:20  info unpack layer: sha256:a57130ec8de1e44163e965620d5aed2abe6cddf48b48272964bfd8bca101df38
2022/10/14 10:31:20  info unpack layer: sha256:d5d706ce7b293ffb369d3bf0e3f58f959977903b82eb26433fe58645f79b778b
2022/10/14 10:31:49  info unpack layer: sha256:880a66924cf5e11df601a4f531f3741c6867a3e05238bc9b7cebb2a68d479204
INFO:    Creating SIF file...
$ apptainer inspect cuda10.2.sif
maintainer: NVIDIA CORPORATION <cudatools@nvidia.com>
org.label-schema.build-arch: amd64
org.label-schema.build-date: Friday_14_October_2022_10:32:42_EDT
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.1.0-1.el7
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: nvidia/cuda:10.2-devel

Now you can run the local image with the run command or start a shell with the shell command. Please note that if you are in an environment with GPUs and you want to access them inside the container, you need to specify the --nv flag.

$ apptainer run --nv cuda10.2.sif nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-8e040d17-402e-cc86-4e83-eb2b1d501f1e)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-d681a21a-8cdd-e624-6bf8-5b0234584ba2)
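If the same script has to work on hosts with and without GPUs, one option is to add --nv only when the NVIDIA tools are present on the host. The sketch below assumes that finding nvidia-smi in PATH is a good enough signal; the function names gpu_flag and run_in_image are hypothetical.

```shell
# Hypothetical helper: print "--nv" only when nvidia-smi is found on the host.
gpu_flag() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "--nv"
    fi
}

# Hypothetical wrapper: run a command inside an image, adding --nv when a
# GPU appears to be available. $(gpu_flag) is left unquoted on purpose so
# that an empty result adds no argument at all.
run_in_image() {
    local image="$1"
    shift
    apptainer exec $(gpu_flag) "$image" "$@"
}
```

For example, `run_in_image cuda10.2.sif nvidia-smi -L` would run the same check as above on a GPU host. For an interactive session, `apptainer shell --nv cuda10.2.sif` starts a shell inside the image instead.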

Nexus Containers

In our Nexus environment we have some example containers based on our pytorch_docker project. These can be found in /fs/nexus-containers/pytorch.

You can run one of the example images by doing the following (you should have already allocated an interactive job with a GPU in Nexus). It will use the default script found at /srv/tensor.py within the image.

$ hostname && nvidia-smi -L
tron38.umiacs.umd.edu
GPU 0: NVIDIA RTX A4000 (UUID: GPU-4a0a5644-9fc8-84b4-5d22-65d45ca36506)
$ apptainer run --nv /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif
99 984.5538940429688
199 654.1710815429688
299 435.662353515625
399 291.1429138183594
499 195.5575714111328
599 132.3363037109375
699 90.5206069946289
799 62.86213684082031
899 44.56754684448242
999 32.466392517089844
1099 24.461835861206055
1199 19.166893005371094
1299 15.6642427444458
1399 13.347112655639648
1499 11.814264297485352
1599 10.800163269042969
1699 10.129261016845703
1799 9.685370445251465
1899 9.391674041748047
1999 9.19735336303711
Result: y = 0.0022362577728927135 + 0.837898313999176 x + -0.0003857926349155605 x^2 + -0.09065020829439163 x^3

To get data into the container, you need to pass some bind mounts (your home directory is bound by default). In this example we use exec, which lets us specify the command to run inside the container, to start an interactive bash session with our Nexus scratch directory bound.

apptainer exec --nv --bind /fs/nexus-scratch/derek:/fs/nexus-scratch/derek /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif bash

You can now write and run your own PyTorch Python code interactively within the container, or write a Python script that you call directly from the apptainer exec command for batch processing.
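For the batch case, the exec command can be wrapped so that the script path and its arguments are passed straight through. This is a sketch under assumptions: the function name run_batch is hypothetical, the script is assumed to live in the bound scratch directory, and python3 is assumed to be available in the image.

```shell
# Example image from this page; substitute your own if needed.
IMAGE=/fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif

# Hypothetical helper: run a Python script non-interactively in the container.
# Usage: run_batch <scratch_dir> <script> [script arguments...]
run_batch() {
    local scratch="$1" script="$2"
    shift 2
    if [ ! -f "$script" ]; then
        echo "error: script not found: $script" >&2
        return 1
    fi
    # Bind the scratch directory at the same path inside the container
    apptainer exec --nv --bind "$scratch:$scratch" "$IMAGE" python3 "$script" "$@"
}
```

A call such as `run_batch /fs/nexus-scratch/derek /fs/nexus-scratch/derek/train.py` (train.py being a placeholder name) could then go into a batch job script.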

Docker Workflow Example

We have a pytorch_docker example workflow using our GitLab as a Docker registry. You can clone the repository and further customize this to your needs. The workflow is:

  1. Run Docker on a laptop or personal desktop to create the image.
  2. Tag the image and push it to your repository (this can be any Docker registry).
  3. Pull the image down onto one of our workstations/clusters and run it with your data.

$ apptainer pull pytorch_docker.sif docker://registry.umiacs.umd.edu/derek/pytorch_docker
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 85386706b020 done
...
2022/10/14 10:58:36  info unpack layer: sha256:b6f46848806c8750a68edc4463bf146ed6c3c4af18f5d3f23281dcdfb1c65055
2022/10/14 10:58:43  info unpack layer: sha256:44845dc671f759820baac0376198141ca683f554bb16a177a3cfe262c9e368ff
INFO:    Creating SIF file...
$ apptainer exec --nv pytorch_docker.sif python3 -c 'from __future__ import print_function; import torch; print(torch.cuda.current_device()); x = torch.rand(5, 3); print(x)'
0
tensor([[0.3273, 0.7174, 0.3587],
        [0.2250, 0.3896, 0.4136],
        [0.3626, 0.0383, 0.6274],
        [0.6241, 0.8079, 0.2950],
        [0.0804, 0.9705, 0.0030]])
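The first two steps of the workflow happen on your own machine and are not shown above. A minimal sketch follows, assuming a Dockerfile in the build context and the registry path from the pull command; the function name build_and_push and the latest tag are assumptions, and your registry may require a token rather than a password at login.

```shell
# Registry path taken from the pull command above; the tag is an assumption.
REGISTRY_IMAGE=registry.umiacs.umd.edu/derek/pytorch_docker
TAG=latest

# Hypothetical helper: build the image from a directory containing a
# Dockerfile, log in to the registry, and push the tagged image.
build_and_push() {
    local context="${1:-.}"
    docker build -t "$REGISTRY_IMAGE:$TAG" "$context" &&
    docker login registry.umiacs.umd.edu &&
    docker push "$REGISTRY_IMAGE:$TAG"
}
```

Run from a clone of the repository, this would be `build_and_push .`. Because the pull command above names no tag, it fetches latest, which is why the sketch pushes that tag.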