Apptainer

[https://apptainer.org Apptainer] is a container platform that doesn't elevate the privileges of a user running the container.  This is important as UMIACS runs many multi-tenant hosts (such as [[Nexus]]) and doesn't provide administrative control to users on them.  While [https://www.docker.com Docker] is popular, its most typical setups require a daemon running with administrative privileges, which is not tenable in our environment.


'''Apptainer was previously branded as Singularity.  You should still be able to run commands on the system with <code>singularity</code>, however you should start migrating to using the <code>apptainer</code> command.'''
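
For example, the legacy command name should behave identically to the new one (a sketch; the output assumes the Apptainer version shown in the Overview below):
<pre>
$ singularity --version
apptainer version 1.2.5-1.el8
</pre>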


==Overview==
You can find the current version we provide by running the <code>apptainer --version</code> command.  If this instead says <code>apptainer: command not found</code> and you are using a UMIACS-supported host, please [[HelpDesk | contact staff]] and we will ensure the software is available on that host.


<pre>
# apptainer --version
apptainer version 1.2.5-1.el8
</pre>


Apptainer can run a variety of images including its own format and [https://apptainer.org/docs/user/main/docker_and_oci.html Docker images].  To create images from definition files, you need administrative rights.  You will need to either use [[Podman]] on a UMIACS-supported host, or build on a machine where you have full administrative access (laptop or personal desktop).
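
For reference, a minimal definition file and its build command might look like the following sketch (the file name, base image, and packages are illustrative, not a UMIACS-provided recipe):
<pre>
Bootstrap: docker
From: ubuntu:22.04

%post
    # Runs once at build time, inside the image
    apt-get update && apt-get install -y python3

%runscript
    # Default command executed by 'apptainer run'
    exec python3 "$@"
</pre>
<pre>
$ apptainer build mycontainer.sif mycontainer.def
</pre>
Remember that the <code>build</code> step itself requires administrative rights (or [[Podman]]) as described above.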


If you are going to pull large images, you may run out of space in your home directory.  We suggest you run the following commands to set up alternate cache and tmp directories.  We use <code>/scratch0</code> here, but you can substitute any sufficiently large local scratch directory, network scratch directory, or project directory.
<pre>
export WORKDIR=/scratch0/$USER
export APPTAINER_CACHEDIR=${WORKDIR}/.cache
export APPTAINER_TMPDIR=${WORKDIR}/.tmp
mkdir -p $APPTAINER_CACHEDIR
mkdir -p $APPTAINER_TMPDIR
</pre>
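
These variables only last for your current shell session.  To persist them, you can append them to your shell startup file (a sketch assuming <code>bash</code>):
<pre>
echo 'export APPTAINER_CACHEDIR=/scratch0/$USER/.cache' >> ~/.bashrc
echo 'export APPTAINER_TMPDIR=/scratch0/$USER/.tmp' >> ~/.bashrc
</pre>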


We suggest you pull images down into an intermediate file (a '''SIF''' file) so that you do not have to worry about re-caching the image.
<pre>
$ apptainer pull cuda12.2.2.sif docker://nvidia/cuda:12.2.2-base-ubi8
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob d5d706ce7b29 done
Copying blob b4dc78aeafca done
Copying blob 24a22c1b7260 done
Copying blob 8dea37be3176 done
Copying blob 25fa05cd42bd done
Copying blob a57130ec8de1 done
Copying blob 880a66924cf5 done
Copying config db554d658b done
Writing manifest to image destination
Storing signatures
2022/10/14 10:31:17  info unpack layer: sha256:25fa05cd42bd8fabb25d2a6f3f8c9f7ab34637903d00fd2ed1c1d0fa980427dd
2022/10/14 10:31:19  info unpack layer: sha256:24a22c1b72605a4dbcec13b743ef60a6cbb43185fe46fd8a35941f9af7c11153
2022/10/14 10:31:19  info unpack layer: sha256:8dea37be3176a88fae41c265562d5fb438d9281c356dcb4edeaa51451dbdfdb2
2022/10/14 10:31:20  info unpack layer: sha256:b4dc78aeafca6321025300e9d3050c5ba3fb2ac743ae547c6e1efa3f9284ce0b
2022/10/14 10:31:20  info unpack layer: sha256:a57130ec8de1e44163e965620d5aed2abe6cddf48b48272964bfd8bca101df38
2022/10/14 10:31:20  info unpack layer: sha256:d5d706ce7b293ffb369d3bf0e3f58f959977903b82eb26433fe58645f79b778b
2022/10/14 10:31:49  info unpack layer: sha256:880a66924cf5e11df601a4f531f3741c6867a3e05238bc9b7cebb2a68d479204
INFO:    Creating SIF file...
</pre>


<pre>
$ apptainer inspect cuda12.2.2.sif
...
maintainer: NVIDIA CORPORATION <sw-cuda-installer@nvidia.com>
name: ubi8
org.label-schema.build-arch: amd64
org.label-schema.build-date: Wednesday_24_January_2024_13:53:0_EST
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.2.5-1.el8
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: nvidia/cuda:12.2.2-base-ubi8
...
</pre>
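
<code>inspect</code> can also show specific pieces of metadata; for example, the following prints the script that <code>apptainer run</code> will execute (output varies by image):
<pre>
$ apptainer inspect --runscript cuda12.2.2.sif
</pre>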
 
Now you can run the local image with the '''run''' command or start a shell with the '''shell''' command. 
* Please note that if you are in an environment with GPUs and you want to access them inside the container, you need to specify the '''--nv''' flag. NVIDIA requires a very specific driver and set of libraries to run CUDA programs, so this flag ensures that all appropriate devices are created inside the container and that these libraries are made available within it.
 
<pre>
$ apptainer run --nv cuda12.2.2.sif nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-8e040d17-402e-cc86-4e83-eb2b1d501f1e)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-d681a21a-8cdd-e624-6bf8-5b0234584ba2)
</pre>
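
Similarly, you can start an interactive shell inside the container; a sketch (the GPU listing assumes the same host as above):
<pre>
$ apptainer shell --nv cuda12.2.2.sif
Apptainer> nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-8e040d17-402e-cc86-4e83-eb2b1d501f1e)
</pre>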
 
==Nexus Containers==
In our [[Nexus]] environment we have some example containers based on our [https://gitlab.umiacs.umd.edu/derek/pytorch_docker pytorch_docker] project.  These can be found in <code>/fs/nexus-containers/pytorch</code>.
 
You can run one of the example images by doing the following (you should have already allocated an interactive job with a GPU in [[Nexus]]; a sketch of an allocation command is below).  It will use the default [https://gitlab.umiacs.umd.edu/derek/pytorch_docker/-/blob/master/tensor.py script] found at <code>/srv/tensor.py</code> within the image.
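
If you have not yet allocated one, a minimal sketch using SLURM's <code>srun</code> follows; the resource values are illustrative, so consult the [[Nexus]] documentation for appropriate partitions and limits.
<pre>
$ srun --pty --gres=gpu:1 --mem=16G --time=01:00:00 bash
</pre>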
 
<pre>
$ hostname && nvidia-smi -L
tron38.umiacs.umd.edu
GPU 0: NVIDIA RTX A4000 (UUID: GPU-4a0a5644-9fc8-84b4-5d22-65d45ca36506)
</pre>
<pre>
$ apptainer run --nv /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif
99 984.5538940429688
199 654.1710815429688
299 435.662353515625
399 291.1429138183594
499 195.5575714111328
599 132.3363037109375
699 90.5206069946289
799 62.86213684082031
899 44.56754684448242
999 32.466392517089844
1099 24.461835861206055
1199 19.166893005371094
1299 15.6642427444458
1399 13.347112655639648
1499 11.814264297485352
1599 10.800163269042969
1699 10.129261016845703
1799 9.685370445251465
1899 9.391674041748047
1999 9.19735336303711
Result: y = 0.0022362577728927135 + 0.837898313999176 x + -0.0003857926349155605 x^2 + -0.09065020829439163 x^3
</pre>
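
Instead of the default script, you can run your own code against the same image with <code>exec</code>.  A sketch (<code>my_script.py</code> is a hypothetical file in your home directory, which Apptainer mounts automatically):
<pre>
$ apptainer exec --nv /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif python3 ~/my_script.py
</pre>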


===Bind Mounts===
To get data into the container you need to pass some [https://apptainer.org/docs/user/main/bind_paths_and_mounts.html bind mounts].  Apptainer will not automatically mount data from the outside operating system other than your home directory; you need to manually bind any other file paths.
 
<code>--bind /fs/nexus-scratch/<USERNAME>/<PROJECTNAME>:/mnt</code>
 
In this example, we will <code>exec</code> an interactive session with GPUs, binding our [[Nexus]] scratch directory; <code>exec</code> allows us to specify the command we want to run inside the container.


<pre>
apptainer exec --nv --bind /fs/nexus-scratch/username:/fs/nexus-scratch/username /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif bash
</pre>
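
You can also remap a path to a different location inside the container, as in the <code>/mnt</code> form above (the project directory here is hypothetical):
<pre>
$ apptainer exec --bind /fs/nexus-scratch/username/myproject:/mnt /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif ls /mnt
</pre>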


You can now write/run your own PyTorch Python code interactively within the container, or make a Python script that you can call directly from the <code>apptainer exec</code> command for batch processing.
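
A sketch of the batch-processing variant (the <code>train.py</code> script is hypothetical):
<pre>
$ apptainer exec --nv --bind /fs/nexus-scratch/username:/fs/nexus-scratch/username /fs/nexus-containers/pytorch/pytorch_1.13.0+cu117.sif python3 /fs/nexus-scratch/username/train.py
</pre>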
 
<span id="Sif_anchor"></span>
 
===Shared Containers===
Portable images in the '''Singularity Image Format''' ('''.sif''' files) can be copied and shared.  Nexus maintains some shared containers in <code>/fs/nexus-containers</code>.  These are arranged by the application(s) that are installed in them.
 
==Docker Workflow Example==
We have a [https://gitlab.umiacs.umd.edu/derek/pytorch_docker pytorch_docker] example workflow using our [[GitLab]] as a Docker registry.  You can clone the repository and further customize this to your needs. The workflow is:
 
# Run Docker on a laptop or personal desktop to create the image, or use [[Podman]] on a UMIACS-supported system.
# Tag the image and push it to your repository (this can be any Docker registry); a sketch of these first two steps follows this list.
# Pull the image down onto one of our workstations/clusters and run it with your data.
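
A sketch of steps 1 and 2 (substitute your own image name, registry path, and username; with [[Podman]] the same subcommands apply):
<pre>
$ docker build -t pytorch_docker .
$ docker login registry.umiacs.umd.edu
$ docker tag pytorch_docker registry.umiacs.umd.edu/username/pytorch_docker
$ docker push registry.umiacs.umd.edu/username/pytorch_docker
</pre>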


<pre>
$ apptainer pull pytorch_docker.sif docker://registry.umiacs.umd.edu/derek/pytorch_docker
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob 85386706b020 done
...
2022/10/14 10:58:36  info unpack layer: sha256:b6f46848806c8750a68edc4463bf146ed6c3c4af18f5d3f23281dcdfb1c65055
2022/10/14 10:58:43  info unpack layer: sha256:44845dc671f759820baac0376198141ca683f554bb16a177a3cfe262c9e368ff
INFO:    Creating SIF file...
</pre>


<pre>
$ apptainer exec --nv pytorch_docker.sif python3 -c 'from __future__ import print_function; import torch; print(torch.cuda.current_device()); x = torch.rand(5, 3); print(x)'
0
tensor([[0.3273, 0.7174, 0.3587],
        [0.2250, 0.3896, 0.4136],
        [0.3626, 0.0383, 0.6274],
        [0.6241, 0.8079, 0.2950],
        [0.0804, 0.9705, 0.0030]])
</pre>
