Nexus/Apptainer

From UMIACS
Revision as of 18:07, 29 January 2024 by Mbaney

Running containers in a multi-tenant environment raises a number of security considerations. While Docker is popular, its most typical setup requires a daemon running with administrative (root) privileges, which makes it untenable on shared systems. There has been a lot of work in this area, but for HPC environments Apptainer (formerly Singularity) is a solution that brings container workloads to multi-tenant systems: containers run as the invoking user rather than through a privileged daemon.

The one caveat is that building an image from a definition file requires administrative rights on the machine, so you cannot build Apptainer images directly on our supported systems. You can, however, download or pull existing images from other repositories, including Docker registries.
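Pulling from a Docker registry is normally a single shell command of the form "apptainer pull <file>.sif docker://<image>", which converts the Docker image into a .sif file you can run. The sketch below wraps that command for use from a Python script. It assumes the apptainer binary is on your PATH; the image ubuntu:22.04 and the output filename are illustrative placeholders, not images provided by Nexus.

```python
import shlex
import subprocess
from typing import List


def pull_command(docker_ref: str, sif_path: str) -> List[str]:
    """Build the argv that pulls a Docker registry image into a local .sif file.

    The docker:// prefix tells Apptainer to fetch the image from a Docker
    registry and convert it to the Singularity Image Format.
    """
    return ["apptainer", "pull", sif_path, f"docker://{docker_ref}"]


def pull(docker_ref: str, sif_path: str) -> None:
    """Run the pull; requires the apptainer binary on PATH."""
    cmd = pull_command(docker_ref, sif_path)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    # Illustrative image and filename only; substitute the image you need.
    print(shlex.join(pull_command("ubuntu:22.04", "ubuntu_22.04.sif")))
```

No build step or administrative rights are involved: the pull only downloads and converts an image someone else has already built.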

Bind Mounts

Apptainer containers do not automatically mount data from the host operating system other than your home directory. Other file paths must be bound into the container manually with the --bind option, for example:

--bind /fs/nexus-scratch/username/project1:/mnt

In this example, the directory /fs/nexus-scratch/username/project1 on the host (outside the container) is made available at the path /mnt inside the container.
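A bind specification follows Apptainer's src[:dest[:opts]] syntax: if dest is omitted the path is mounted at the same location inside the container, opts may be ro (read-only) or rw (the default), and several binds can be passed as one comma-separated --bind list. The helper below is a hypothetical sketch that formats such specifications when you launch containers from a script; the function names are illustrative, not part of Apptainer.

```python
import os
from typing import Iterable, List, Optional, Tuple


def bind_spec(host_path: str, container_path: Optional[str] = None,
              read_only: bool = False) -> str:
    """Format one bind in Apptainer's src[:dest[:opts]] syntax."""
    if not os.path.isabs(host_path):
        raise ValueError(f"host path must be absolute: {host_path!r}")
    # With no destination given, mount at the same path inside the container.
    dest = container_path if container_path is not None else host_path
    spec = f"{host_path}:{dest}"
    if read_only:
        spec += ":ro"
    return spec


def bind_args(mounts: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Combine several (host, container) pairs into one comma-separated --bind option."""
    specs = [bind_spec(host, container) for host, container in mounts]
    # A comma inside a path would be misread as a separator between binds.
    if any("," in s for s in specs):
        raise ValueError("paths containing commas cannot be joined into one --bind list")
    return ["--bind", ",".join(specs)] if specs else []


if __name__ == "__main__":
    print(bind_args([("/fs/nexus-scratch/username/project1", "/mnt")]))
```

Mounting a shared dataset with read_only=True is a reasonable default when the container only needs to read it.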

Shared Containers

Portable images in the Singularity Image Format (.sif files) can be copied and shared. Nexus maintains some shared containers in /fs/nexus-containers, organized by the application(s) installed in them.
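To see which shared images are available, you can list the .sif files under /fs/nexus-containers. The sketch below groups them by their top-level application directory. It assumes the layout implied by the example path /fs/nexus-containers/pytorch/pytorch_1.10.2+cu113.sif; the function name is hypothetical, and a recursive scan of a large shared filesystem may take a moment.

```python
from pathlib import Path
from typing import Dict, List


def list_sif_images(root: str = "/fs/nexus-containers") -> Dict[str, List[str]]:
    """Group .sif files under root by their top-level subdirectory (application)."""
    base = Path(root)
    images: Dict[str, List[str]] = {}
    if not base.is_dir():
        return images
    for sif in sorted(base.rglob("*.sif")):
        rel = sif.relative_to(base)
        # Files sitting directly in root are grouped under ".".
        app = rel.parts[0] if len(rel.parts) > 1 else "."
        images.setdefault(app, []).append(str(sif))
    return images


if __name__ == "__main__":
    for app, paths in list_sif_images().items():
        print(app)
        for p in paths:
            print("   ", p)
```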

GPUs

NVIDIA GPUs require a specific driver and set of libraries to run CUDA programs. To ensure that the GPU devices are available inside the container and that the host's driver libraries are made available to it, pass the --nv flag when launching your container(s).
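Like other Apptainer options, --nv must appear after the subcommand (exec, run, shell) and before the image path. The hypothetical helper below builds such a command from Python and only launches it if the apptainer binary is on PATH. The image path is the shared PyTorch container from this page; the function names and the gpu parameter are illustrative assumptions.

```python
import shlex
import shutil
import subprocess
from typing import List, Sequence


def exec_command(image: str, command: Sequence[str], gpu: bool = True) -> List[str]:
    """Build an 'apptainer exec' argv; options go before the image path."""
    argv = ["apptainer", "exec"]
    if gpu:
        # Expose the host's NVIDIA devices and driver libraries in the container.
        argv.append("--nv")
    argv.append(image)
    argv.extend(command)
    return argv


def run_in_container(image: str, command: Sequence[str], gpu: bool = True) -> None:
    """Launch the command, failing early if apptainer is unavailable."""
    if shutil.which("apptainer") is None:
        raise RuntimeError("apptainer is not on PATH")
    subprocess.run(exec_command(image, command, gpu), check=True)


if __name__ == "__main__":
    image = "/fs/nexus-containers/pytorch/pytorch_1.10.2+cu113.sif"
    check = ["python3", "-c", "import torch; print(torch.cuda.is_available())"]
    print(shlex.join(exec_command(image, check)))
```

Without --nv, CUDA programs in the container typically cannot find the GPU even on a GPU node.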

Example

Suppose the following example file is saved as test.py in /fs/nexus-scratch/username/apptainer:

#!/usr/bin/env python

import torch

# Report whether CUDA is usable and list every visible GPU by index
print(f'Torch cuda is available: {torch.cuda.is_available()}')
print(f'Torch cuda number of devices: {torch.cuda.device_count()}')
for g in range(torch.cuda.device_count()):
    print(f'Torch cuda device {g}: {torch.cuda.get_device_name(g)}')
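
Run it in the shared PyTorch container, binding the directory to /mnt and enabling GPU support with --nv:

$ apptainer exec --bind /fs/nexus-scratch/username/apptainer:/mnt --nv /fs/nexus-containers/pytorch/pytorch_1.10.2+cu113.sif python3 /mnt/test.py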
Torch cuda is available: True
Torch cuda number of devices: 1
Torch cuda device 0: NVIDIA RTX A4000